Types of regression models

Regression models:
• Simple: 1st order, 2nd order, higher order
• Multiple: 1st order, interaction, 2nd order, higher order

A quadratic (second-order) model

E(Y) = β₀ + β₁x + β₂x²

• Interpretation of model parameters:
• β₀: y-intercept, the value of E(Y) when x = 0;
• β₁: the shift parameter;
• β₂: the rate of curvature.

Example with quadratic terms

The true model, supposedly unknown, is Yᵢ = 2 + xᵢ² + εᵢ, with εᵢ ~ N(0, 2).

Data: (x, y). See SQM.sav

[Figure: scatter plot of y (0 to 100) against x (0 to 10)]

Model 1: E(Y) = β₀ + β₁x

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .973 | .947 | .947 | 6.60994 |

Predictors: (Constant), x

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 80624.915 | 1 | 80624.915 | 1845.332 | .000 |
| Residual | 4500.202 | 103 | 43.691 | | |
| Total | 85125.117 | 104 | | | |

Predictors: (Constant), x. Dependent Variable: y

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -19.959 | 1.483 | | -13.454 | .000 |
| x | 10.744 | .250 | .973 | 42.957 | .000 |

Dependent Variable: y

[Figure: scatter plot of y against x with the fitted line y = -19.96 + 10.74 * x, R-Square = 0.95]

Model 2: E(Y) = β₀ + β₁x²

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .996 | .991 | .991 | 2.68707 |

Predictors: (Constant), XSquare

Compared with Model 1: smaller variance and smaller standard error.

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 84381.422 | 1 | 84381.422 | 11686.632 | .000 |
| Residual | 743.695 | 103 | 7.220 | | |
| Total | 85125.117 | 104 | | | |

Predictors: (Constant), XSquare. Dependent Variable: y

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 2.340 | .417 | | 5.608 | .000 |
| XSquare | .997 | .009 | .996 | 108.105 | .000 |

Dependent Variable: y

[Figure: scatter plot of y against XSquare with the fitted line y = 2.34 + 1.00 * XSquare, R-Square = 0.99]

Model 3: E(Y) = β₀ + β₁x + β₂x²

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .996 | .991 | .991 | 2.66608 |

Predictors: (Constant), XSquare, x

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 84400.103 | 2 | 42200.052 | 5936.999 | .000 |
| Residual | 725.014 | 102 | 7.108 | | |
| Total | 85125.117 | 104 | | | |

Predictors: (Constant), XSquare, x. Dependent Variable: y

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 4.177 | 1.206 | | 3.463 | .001 |
| x | -.830 | .512 | -.075 | -1.621 | .108 |
| XSquare | 1.071 | .046 | 1.069 | 23.046 | .000 |

Dependent Variable: y

A third-order model with 1 IV

E(Y) = β₀ + β₁x + β₂x² + β₃x³

Use with caution, given the numerical problems that could arise.

[Figure: two cubic curves of Y against x₁, one for β₃ > 0 and one for β₃ < 0]

First-order model in k quantitative variables

E(Y) = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ

Interpretation of model parameters:
• β₀: y-intercept, the value of E(Y) when x₁ = x₂ = ... = xₖ = 0;
• β₁: change in E(Y) for a 1-unit increase in x₁ when x₂, ..., xₖ are held fixed;
• β₂: change in E(Y) for a 1-unit increase in x₂ when x₁, x₃, ..., xₖ are held fixed;
• ...

A bivariate model

E(Y) = β₀ + β₁x₁ + β₂x₂

Viewed as a line in x₁, changing x₂ changes only the y-intercept. In the first-order model a 1-unit change in one independent variable has the same effect on the mean value of y regardless of the values of the other independent variables.
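The constant-effect property of the first-order model can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the coefficients in `TRUE_BETA` and the grid of points are hypothetical and are not taken from any data set in these slides.

```python
import numpy as np

# Hypothetical coefficients of E(Y) = b0 + b1*x1 + b2*x2 (illustrative only).
TRUE_BETA = np.array([5.0, 2.0, -3.0])


def mean_response(beta, x1, x2):
    """E(Y) for a first-order model in two quantitative variables."""
    return beta[0] + beta[1] * x1 + beta[2] * x2


# Noise-free grid of (x1, x2) points, so least squares recovers the plane exactly.
x1, x2 = np.meshgrid(np.arange(0.0, 5.0), np.arange(0.0, 4.0))
x1, x2 = x1.ravel(), x2.ravel()
y = mean_response(TRUE_BETA, x1, x2)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Effect of a 1-unit increase in x1 (from 2 to 3) at several fixed levels of x2.
effects = [
    mean_response(beta_hat, 3.0, level) - mean_response(beta_hat, 2.0, level)
    for level in (0.0, 1.0, 10.0)
]
print(beta_hat, effects)
```

Every entry of `effects` equals the fitted coefficient of x₁, whatever level of x₂ is chosen; with an interaction term in the model this would no longer hold.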
A bivariate model

[Figure: response plane E(Y) = β₀ + β₁X₁ᵢ + β₂X₂ᵢ over the (X₁, X₂) plane, with intercept β₀ on the Y axis; the observed value Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + εᵢ lies at vertical distance εᵢ from the plane, above the point (X₁ᵢ, X₂ᵢ)]

Example: executive salaries

• Y = Annual salary (in dollars)
• x₁ = Years of experience
• x₂ = Years of education
• x₃ = Gender: 1 if male; 0 if female
• x₄ = Number of employees supervised
• x₅ = Corporate assets (in millions of dollars)

E(Y) = β₀ + β₁x₁ + β₂x₂ + β₄x₄ + β₅x₅

Data: ExecSal.sav

Do not consider x₃ (Gender) for the moment.

Executive salaries: Computer Output

Model Summary (multiple regression)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .870 | .757 | .747 | 12685.309 |

Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised

Model Summary (simple regression)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .783 | .613 | .609 | 15760.006 |

Predictors: (Constant), Years of Experience

Coefficient of determination

The coefficient R² is computed exactly as in the simple regression case:

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$$\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST\ \text{(Total)}} = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR\ \text{(Regression)}} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE\ \text{(Error)}}$$

A drawback of R²: it never decreases when variables are added, even if these are NOT relevant to the problem.

Adjusted R² and estimate of the variance σ²

A solution: adjusted R².
– Each additional variable reduces adjusted R², unless SSE decreases enough to compensate.

$$R_a^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSE}{SST} \le 1 - \frac{SSE}{SST} = R^2$$

An unbiased estimator of the variance σ² is computed as

$$s^2 = \frac{SSE}{n-k-1} = \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{n-k-1}$$

Executive salaries: Computer Output (2)

Coefficients

| Variable | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -37082.148 | 17052.089 | | -2.175 | .032 |
| Years of Experience | 2696.360 | 173.647 | .785 | 15.528 | .000 |
| Years of Education | 2656.017 | 563.476 | .243 | 4.714 | .000 |
| Number of Employees supervised | 41.092 | 7.807 | .272 | 5.264 | .000 |
| Corporate assets (in million $) | 244.569 | 83.420 | .149 | 2.932 | .004 |

Dependent Variable: Annual salary in $

The columns t and Sig. report the t-tests on the individual coefficients.

Testing overall significance: the F-test

1. Shows whether there is a linear relationship between all X variables together and Y
2. Uses the F test statistic
3. Hypotheses
– H₀: β₁ = β₂ = ... = βₖ = 0 (no linear relationship)
– Hₐ: at least one coefficient is not 0 (at least one X variable affects Y)

The F-test for one single coefficient is equivalent to the t-test.

Anova table

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4.766E10 | 4 | 1.192E10 | 74.045 | .000 |
| Residual | 1.529E10 | 95 | 1.609E8 | | |
| Total | 6.295E10 | 99 | | | |

Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised. Dependent Variable: Annual salary in $

Reading the table:
• Regression df = k, the number of regression slopes
• Total df = n − 1, where n is the number of observations
• F is the F-statistic and Sig. is the p-value of the F-test
• The residual mean square is the MSE (mean square error), the estimate of the variance

Decision: reject H₀, i.e. at least one predictor is useful and the model is retained.

Interaction (second order) model

E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

• Interpretation of model parameters:
• β₀: y-intercept, the value of E(Y) when x₁ = x₂ = 0;
• β₁ + β₃x₂: change in E(Y) for a 1-unit increase in x₁ when x₂ is held fixed;
• β₂ + β₃x₁: change in E(Y) for a 1-unit increase in x₂ when x₁ is held fixed;
• β₃: controls the rate of change of the surface.

Contour lines are not parallel: the effect of one variable depends on the level of the other.

Example: Antique grandfather clocks auction

Clocks are sold at an auction on competitive offers. Data are:
– Y: auction price in dollars
– X₁: age of clocks
– X₂: number of bidders

Model 1: E(Y) = β₀ + β₁x₁ + β₂x₂
Model 2: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

Data: GFCLOCKS.sav

Data summaries

Descriptive Statistics

| Variable | N | Minimum | Maximum | Mean | Std. Deviation | Skewness | Std. Error | Kurtosis | Std. Error |
|---|---|---|---|---|---|---|---|---|---|
| Age | 32 | 108 | 194 | 144.94 | 27.395 | .216 | .414 | -1.323 | .809 |
| Bidders | 32 | 5 | 15 | 9.53 | 2.840 | .420 | .414 | -.788 | .809 |
| Price | 32 | 729 | 2131 | 1326.88 | 393.487 | .396 | .414 | -.727 | .809 |
| Valid N (listwise) | 32 | | | | | | | | |

If data are Normal, skewness is 0.
If data are Normal, (excess) kurtosis is 0.
Note: skewness and kurtosis are not enough to establish Normality.

P-P plot for Normality

If data are Normal, points should lie along the straight line. In this example the situation is fairly good.

[Figure: Normal P-P plot]

Bivariate scatter-plots

[Figure: scatter plots of Price against Age and of Price against Bidders]

Model 1: E(Y) = β₀ + β₁x₁ + β₂x₂

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .945 | .892 | .885 | 133.485 |

Predictors: (Constant), Bidders, Age

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4283062.960 | 2 | 2141531.480 | 120.188 | .000 |
| Residual | 516726.540 | 29 | 17818.157 | | |
| Total | 4799789.500 | 31 | | | |

Predictors: (Constant), Bidders, Age. Dependent Variable: Price

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -1338.951 | 173.809 | | -7.704 | .000 |
| Age | 12.741 | .905 | .887 | 14.082 | .000 |
| Bidders | 85.953 | 8.729 | .620 | 9.847 | .000 |

Dependent Variable: Price

Model 2: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .977 | .954 | .949 | 88.915 |

Predictors: (Constant), AgeBid, Age, Bidders

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4578427.367 | 3 | 1526142.456 | 193.041 | .000 |
| Residual | 221362.133 | 28 | 7905.790 | | |
| Total | 4799789.500 | 31 | | | |

Predictors: (Constant), AgeBid, Age, Bidders. Dependent Variable: Price

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 320.458 | 295.141 | | 1.086 | .287 |
| Age | .878 | 2.032 | .061 | .432 | .669 |
| Bidders | -93.265 | 29.892 | -.673 | -3.120 | .004 |
| AgeBid | 1.298 | .212 | 1.369 | 6.112 | .000 |

Dependent Variable: Price

Interpreting interaction models

The coefficient of the interaction term is significant. If an interaction term is present, the corresponding first-order terms also need to be included in order to interpret the model correctly.

In the example a careless analyst could estimate the effect of Bidders as negative, since b₂ = -93.26.

Since an interaction term is present, the slope estimate for Bidders (x₂) is b₂ + b₃x₁. Note: b denotes the estimate of β.

For x₁ = 150 (age) the estimated slope for Bidders is -93.26 + 1.30(150) = 101.74.

Models with qualitative X's

Regression models can also include qualitative (or categorical) independent variables (QIV).

The categories of a QIV are called levels.

Since the levels of a QIV are not measured on a natural numerical scale, we need a specific type of coding in order to avoid introducing fictitious linear relations into the model.

Coding is done with independent variables that take only two values, 0 or 1. These coded variables are called dummy variables.

Models with QIV

• Suppose we want to model Income (Y) as a function of Sex (x). Use a coded, or dummy, variable:

x = 1 if Male, x = 0 if Female

E(Y) = β₀ + β₁x
E(Y) = β₀ + β₁ if x = 1, i.e. Male
E(Y) = β₀ if x = 0, i.e. Female

β₀ is the base level, i.e. Female is the reference category.
β₁ is the additional effect if Male.

In this simple model, only the means of the two groups are modeled.

QIV with q levels

As a general rule, if a QIV has q levels we need q − 1 dummies for coding. The uncoded level is the reference one.
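The q − 1 coding rule can be sketched in a few lines. This is only an illustration, assuming NumPy is available; the helper `dummy_code`, the group labels and the response values are hypothetical and do not come from any of the data files mentioned in these slides.

```python
import numpy as np


def dummy_code(values, levels, reference):
    """Return an (n, q-1) matrix of 0/1 dummies and the names of the coded levels.

    The reference level gets no column: it is the case where all dummies are 0.
    """
    coded = [level for level in levels if level != reference]
    columns = [[1.0 if v == level else 0.0 for v in values] for level in coded]
    return np.array(columns).T, coded


# Hypothetical data: a QIV with q = 3 levels and a response y.
group = ["A", "A", "B", "B", "C", "C"]
y = np.array([10.0, 12.0, 20.0, 22.0, 5.0, 7.0])

D, coded_levels = dummy_code(group, levels=["A", "B", "C"], reference="C")

# Least-squares fit of E(Y) = b0 + b1*x1 + b2*x2 on the two dummies.
X = np.column_stack([np.ones(len(y)), D])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coded_levels, beta_hat)
```

With only dummies in the model, the fitted intercept is the mean of the reference group and each slope is the difference between a coded group's mean and the reference mean.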
Example: a QIV has three levels, A, B and C.

Define
x₁ = 1 if level A, x₁ = 0 if not
x₂ = 1 if level B, x₂ = 0 if not

Model: E(Y) = β₀ + β₁x₁ + β₂x₂

C is the reference level.

Interpreting β's

β₀ = μ_C (mean for base level C)
β₁ = μ_A − μ_C (additional effect with respect to C if level A)
β₂ = μ_B − μ_C (additional effect with respect to C if level B)

Models with dummies

Even though models that contain only dummy variables in practice just estimate the means of the various groups, the testing machinery of the regression setup can be useful for group comparisons.

Dummies can be combined with any other dummies and with quantitative X's to construct models with first-order effects (or main effects) and interactions, in order to test hypotheses of interest.

To define dummies in SPSS see "Computing dummy vars in SPSS.ppt".

Example: executive salaries

A management consulting firm has developed a regression model in order to analyze executives' salary structure.

• Y = Annual salary (in dollars)
• x₁ = Years of experience
• x₂ = Years of education
• x₃ = Gender: 1 if male; 0 if female
• x₄ = Number of employees supervised
• x₅ = Corporate assets (in millions of dollars)

Data: ExecSal.sav

A simple model: E(Y) = β₀ + β₃x₃

[Figure: annual salary for the Male group and the Female group]

This model estimates the means of the two groups (M, F).

We want to test whether the difference in means is significant, i.e. not due to chance.

Regression Output

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .392 | .153 | .145 | 23320.282 |

Predictors: (Constant), Gender

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 9651865066.845 | 1 | 9651865066.845 | 17.748 | .000 |
| Residual | 53295882433.156 | 98 | 543835535.032 | | |
| Total | 62947747500.001 | 99 | | | |

Predictors: (Constant), Gender. Dependent Variable: Annual salary in $

Coefficients

| Term | B | Std. Error | Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 83847.059 | 3999.395 | | 20.965 | .000 | 75910.389 | 91783.729 |
| Gender | 20739.305 | 4922.915 | .392 | 4.213 | .000 | 10969.940 | 30508.670 |

Dependent Variable: Annual salary in $

• The salary difference between groups is significant.
• The Gender coefficient B is the mean increment for Male.
• The Gender confidence interval is the C.I. for the mean increment.

Model 2: E(Y) = β₀ + β₁x₁ + β₃x₃

[Figure: annual salary against years of experience for the two groups] It seems that the two groups are separated.

Model 2 considers the same slope but different intercepts.

If x₃ = 0 (female) then E(Y) = β₀ + β₁x₁
If x₃ = 1 (male) then E(Y) = β₀ + β₃ + β₁x₁

Computer output for model 2

Model Summary (R square improved greatly)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .860 | .740 | .735 | 12981.615 |

Predictors: (Constant), Years of Experience, Gender

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 46601081714.527 | 2 | 23300540857.264 | 138.264 | .000 |
| Residual | 16346665785.474 | 97 | 168522327.685 | | |
| Total | 62947747500.001 | 99 | | | |

Predictors: (Constant), Years of Experience, Gender. Dependent Variable: Annual salary in $

Coefficients

| Term | B | Std. Error | Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 50614.312 | 3161.279 | | 16.011 | .000 | 44340.048 | 56888.576 |
| Gender | 18894.215 | 2743.253 | .357 | 6.888 | .000 | 13449.618 | 24338.812 |
| Years of Experience | 2633.831 | 177.875 | .767 | 14.807 | .000 | 2280.799 | 2986.863 |

Dependent Variable: Annual salary in $

• The new intercept for Male is significant.
• In this model the effect of experience is assumed equal for the two groups.

Model 3: E(Y) = β₀ + β₁x₁ + β₃x₃ + β₄x₁x₃

With this model we want to test whether gender and experience interact, i.e. whether male salaries tend to grow at a faster (or slower) rate with experience.

If x₃ = 0 (female) then E(Y) = β₀ + β₁x₁
If x₃ = 1 (male) then E(Y) = (β₀ + β₃) + (β₁ + β₄)x₁

Here β₀ + β₃ is the new intercept for male and β₁ + β₄ is the new slope for male.

Remark: running one regression for the two groups together gives more degrees of freedom (larger n) for estimating the parameters and the model variance.
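The way a dummy and its interaction with x₁ translate into group-specific lines can be made concrete with a short sketch. The helper names `group_lines` and `predict` are hypothetical; the numbers are the Model 2 estimates from the output above, for which the interaction coefficient is absent (taken as 0 here).

```python
def group_lines(b0, b1, b3, b4=0.0):
    """Intercept and slope of E(Y) as a function of x1 for each gender group.

    Model: E(Y) = b0 + b1*x1 + b3*x3 + b4*x1*x3, with x3 = 1 for male.
    With b4 = 0 this reduces to Model 2 (same slope, different intercepts).
    """
    return {"female": (b0, b1), "male": (b0 + b3, b1 + b4)}


def predict(line, years):
    """Mean salary on a given (intercept, slope) line after `years` of experience."""
    intercept, slope = line
    return intercept + slope * years


# Model 2 estimates from the SPSS output: constant, experience, gender.
lines = group_lines(b0=50614.312, b1=2633.831, b3=18894.215)

# Under Model 2 the male-female gap is the same at every experience level.
gap_at_0 = predict(lines["male"], 0) - predict(lines["female"], 0)
gap_at_10 = predict(lines["male"], 10) - predict(lines["female"], 10)
print(lines, gap_at_0, gap_at_10)
```

Passing a non-zero `b4` gives the Model 3 situation, where the gap between the two lines changes with experience.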
Model 3: E(Y) = β₀ + β₁x₁ + β₃x₃ + β₄x₁x₃

[Figure: annual salary against years of experience with a separate fitted line for each group] Model 3 considers different slopes and different intercepts.

Computer output for model 3

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .868 | .754 | .746 | 12700.080 |

Predictors: (Constant), ExpGender, Years of Experience, Gender

Coefficients

| Term | B | Std. Error | Beta | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 58049.768 | 4461.179 | | 13.012 | .000 | 49194.397 | 66905.139 |
| Gender | 7798.504 | 5497.470 | .147 | 1.419 | .159 | -3113.888 | 18710.896 |
| Years of Experience | 2044.541 | 308.565 | .595 | 6.626 | .000 | 1432.045 | 2657.036 |
| ExpGender | 864.122 | 373.653 | .301 | 2.313 | .023 | 122.426 | 1605.818 |

Dependent Variable: Annual salary in $

There is evidence that salaries for the two groups grow at different rates with experience.

Estimated lines:
Ŷ = 58049.8 + 2044.5 * (Years of Experience) for female
Ŷ = 65848.3 + 2908.7 * (Years of Experience) for male

A complete second order model

E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄x₁² + β₅x₂²

• Interpretation of model parameters:
• β₀: y-intercept, the value of E(Y) when x₁ = x₂ = 0;
• β₁ and β₂: shifts along the x₁ and x₂ axes;
• β₃: rotation of the surface;
• β₄ and β₅: control the rate of curvature.

Back to Executive salaries

What if we suspect that the rate of growth changes with experience and has opposite signs for M and F?
x₁ = Years of experience
x₃ = Gender (1 if Male)

Note: x₃² = x₃ since it is a dummy.

Model 4: E(Y) = β₀ + β₁x₁ + β₂x₃ + β₃x₁x₃ + β₄x₁²
Model 5: E(Y) = β₀ + β₁x₁ + β₂x₃ + β₃x₁x₃ + β₄x₁² + β₅x₃x₁²

Comparing Model 4 and 5

Model 4
If x₃ = 0 (female) then E(Y) = β₀ + β₁x₁ + β₄x₁²
If x₃ = 1 (male) then E(Y) = (β₀ + β₂) + (β₁ + β₃)x₁ + β₄x₁²
Different intercept and slope for M and F, but the same curvature.

Model 5
If x₃ = 0 (female) then E(Y) = β₀ + β₁x₁ + β₄x₁²
If x₃ = 1 (male) then E(Y) = (β₀ + β₂) + (β₁ + β₃)x₁ + (β₄ + β₅)x₁²
Different intercept, slope and curvature for M and F.

Model 5: computer output

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .875 | .766 | .754 | 12507.735 |

Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 4.824E10 | 5 | 9.648E9 | 61.673 | .000 |
| Residual | 1.471E10 | 94 | 1.564E8 | | |
| Total | 6.295E10 | 99 | | | |

Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen. Dependent Variable: Annual salary in $

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 52391.973 | 6497.971 | | 8.063 | .000 |
| Years of Experience | 3373.970 | 1165.248 | .982 | 2.895 | .005 |
| Gender | 21122.152 | 8285.802 | .399 | 2.549 | .012 |
| ExpGen | -2081.897 | 1459.842 | -.724 | -1.426 | .157 |
| ExpSqu | -53.181 | 45.001 | -.422 | -1.182 | .240 |
| Exp2Gen | 112.836 | 54.950 | .904 | 2.053 | .043 |

Dependent Variable: Annual salary in $

Which model is preferable? Model 3 or Model 5?

A test for comparing nested models

Two models are nested if one model contains all the terms of the other model and at least one additional term.

The more complex of the two models is called the complete (or full) model.

The other is called the reduced (or restricted) model.
Example: Model 1 is nested in Model 2

Model 1: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂
Model 2: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄x₁² + β₅x₂²

To compare the two models we are interested in testing
H₀: β₄ = β₅ = 0
vs. H₁: at least one of β₄ and β₅ differs from 0

F-test for comparing nested models

Reduced model: $E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g$

Complete model: $E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g + \beta_{g+1} x_{g+1} + \dots + \beta_k x_k$

To test
H₀: $\beta_{g+1} = \dots = \beta_k = 0$
H₁: at least one of the parameters being tested is not 0

compute

$$F = \frac{(SSE_R - SSE_C)/(k-g)}{MSE_C}$$

Reject H₀ when F > F_α, where F_α is the level-α critical point of an F distribution with (k − g, n − (k + 1)) d.f.

F-test for nested models

Where:
SSE_R = sum of squared errors for the reduced model;
SSE_C = sum of squared errors for the complete model;
MSE_C = mean square error for the complete model.

Remark:
k − g = number of parameters tested
k + 1 = number of parameters in the complete model
n = total sample size

Compute partial F-tests with SPSS

1. Enter your complete model in the Regression dialog box and choose the Method "Enter"
2. Click on "Next"
3. In the new box for Independent variables, enter those you want to remove (i.e. those you'd like to test) and choose the Method "Remove"
4. In the "Statistics" option select "R squared change"
5. Click OK.

Applying the F-test

Let us use the F-test to compare Model 3 and Model 5 in the executive salaries example. Note that Model 3 is nested in Model 5.

Model 3: E(Y) = β₀ + β₁x₁ + β₂x₃ + β₃x₁x₃
Model 5: E(Y) = β₀ + β₁x₁ + β₂x₃ + β₃x₁x₃ + β₄x₁² + β₅x₃x₁²

Apply the F-test for H₀: β₄ = β₅ = 0

Computer output

Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen (a) | . | Enter |
| 2 | . (a) | Exp2Gen, ExpSqu (b) | Remove |

a. All requested variables entered.
b. All requested variables removed.
c. Dependent Variable: Annual salary in $

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .875 (a) | .766 | .754 | 12507.735 | .766 | 61.673 | 5 | 94 | .000 |
| 2 | .868 (b) | .754 | .746 | 12700.080 | -.012 | 2.488 | 2 | 94 | .089 |

a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
b. Predictors: (Constant), Gender, Years of Experience, ExpGen

In the Model 2 row, F Change (2.488) is the F-statistic of the partial test and Sig. F Change (.089) is its p-value.

Do NOT reject H₀: β₄ = β₅ = 0, i.e. Model 3 is preferred.

A quadratic model example: Shipping costs

Although a regional delivery service bases the charge for shipping a package on the package weight and distance shipped, its profit per package depends on the package size (the volume of space it occupies) and on the size and nature of the delivery truck.

The company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.

– Y: cost of shipment in dollars
– X₁: package weight in pounds
– X₂: distance shipped in miles

It is suspected that nonlinear effects may be present.

Model: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄x₁² + β₅x₂²

Data: Express.sav

Scatter plots

[Figure: scatter plots of Cost of shipment against Weight of parcel in lbs. and of Cost of shipment against Distance shipped]

Scatter plots in multiple regression often do not show much information.

Model: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄x₁² + β₅x₂²

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .997 | .994 | .992 | .4428 |

Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 449.341 | 5 | 89.868 | 458.388 | .000 |
| Residual | 2.745 | 14 | .196 | | |
| Total | 452.086 | 19 | | | |

Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped. Dependent Variable: Cost of shipment

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | .827 | .702 | | 1.178 | .259 |
| Weight of parcel in lbs. | -.609 | .180 | -.316 | -3.386 | .004 |
| Distance shipped | .004 | .008 | .062 | .503 | .623 |
| Weight squared | .090 | .020 | .382 | 4.442 | .001 |
| Distance squared | 1.51E-005 | .000 | .075 | .672 | .513 |
| Weight*Distance | .007 | .001 | .850 | 11.495 | .000 |

Dependent Variable: Cost of shipment

Distance squared is not significant: try to eliminate it.

Model: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄x₁²

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .997 | .994 | .992 | .4346 |

Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 449.252 | 4 | 112.313 | 594.623 | .000 |
| Residual | 2.833 | 15 | .189 | | |
| Total | 452.086 | 19 | | | |

Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs. Dependent Variable: Cost of shipment

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | .475 | .458 | | 1.035 | .317 |
| Weight of parcel in lbs. | -.578 | .171 | -.300 | -3.387 | .004 |
| Distance shipped | .009 | .003 | .141 | 3.421 | .004 |
| Weight squared | .087 | .019 | .369 | 4.485 | .000 |
| Weight*Distance | .007 | .001 | .842 | 11.753 | .000 |

Dependent Variable: Cost of shipment

Applying the F-test: Shipping costs

A company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.
– Y: cost of shipment in dollars
– X₁: package weight in pounds
– X₂: distance shipped in miles

It is suspected that nonlinear effects may be present. Use the F-test for nested models to decide between

Model 1: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + β₄x₁² + β₅x₂²
Model 2: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂

Data: Express.sav

ANOVA Tables

Full model

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 449.341 | 5 | 89.868 | 458.388 | .000 |
| Residual | 2.745 | 14 | .196 | | |
| Total | 452.086 | 19 | | | |

Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped. Dependent Variable: Cost of shipment

Reduced model

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 445.452 | 3 | 148.484 | 358.154 | .000 |
| Residual | 6.633 | 16 | .415 | | |
| Total | 452.086 | 19 | | | |

Predictors: (Constant), Distance shipped, Weight of parcel in lbs., Weight*Distance. Dependent Variable: Cost of shipment

F-statistic

To test H₀: β₄ = β₅ = 0, from the ANOVA tables we have

$$F = \frac{(SSE_R - SSE_C)/2}{MSE_C} = \frac{(6.633 - 2.745)/2}{0.196} = 9.92$$

The critical value F_α (at the 5% level) for an F distribution with 2 and 14 d.f. is 3.74.

Since F (9.92) > F_α (3.74), the null hypothesis is rejected at the 5% significance level, i.e. the model with quadratic terms is preferred over the reduced one.

Computer output

Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped (a) | . | Enter |
| 2 | . (a) | Distance squared, Weight squared (b) | Remove |

a. All requested variables entered.
b. All requested variables removed.
c. Dependent Variable: Cost of shipment

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .997 (a) | .994 | .992 | .4428 | .994 | 458.388 | 5 | 14 | .000 |
| 2 | .993 (b) | .985 | .983 | .6439 | -.009 | 9.917 | 2 | 14 | .002 |

a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Predictors: (Constant), Weight*Distance, Weight of parcel in lbs., Distance shipped

In the Model 2 row, F Change (9.917) is the F-statistic of the partial test and Sig. F Change (.002) is its p-value.

Reject H₀: β₄ = β₅ = 0

Executive salaries: a final model (?)

• Y = Annual salary (in dollars)
• x₁ = Years of experience
• x₂ = Years of education
• x₃ = Gender: 1 if male; 0 if female
• x₄ = Number of employees supervised
• x₅ = Corporate assets (in millions of dollars)

Try adding other variables to Model 3:

Model 6: E(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₁x₃ + β₅x₄ + β₆x₅

Computer Output: Model 6

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .963 | .927 | .922 | 7020.089 |

Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

ANOVA

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 5.836E10 | 6 | 9.727E9 | 197.384 | .000 |
| Residual | 4.583E9 | 93 | 4.928E7 | | |
| Total | 6.295E10 | 99 | | | |

Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

Coefficients

| Term | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -38331.331 | 9533.238 | | -4.021 | .000 |
| Years of Experience | 2178.964 | 171.979 | .634 | 12.670 | .000 |
| Gender | 13203.101 | 3137.775 | .249 | 4.208 | .000 |
| ExpGender | 669.546 | 209.042 | .233 | 3.203 | .002 |
| Years of Education | 2689.594 | 311.914 | .246 | 8.623 | .000 |
| Number of Employees supervised | 53.239 | 4.470 | .353 | 11.910 | .000 |
| Corporate assets (in million $) | 180.310 | 46.600 | .110 | 3.869 | .000 |

Dependent Variable: Annual salary in $

Executive salaries: comparison of models

| Mod. | Predictors | Adj. R² | Standard error | F-stat |
|---|---|---|---|---|
| 1 | x₁, x₂, x₄, x₅ | 0.747 | 12685.31 | 74.05 |
| 2 | x₁, x₃ | 0.735 | 12981.62 | 138.26 |
| 3 | x₁, x₃, x₁∙x₃ | 0.746 | 12700.08 | 98.09 |
| 6 | x₁, x₂, x₃, x₁∙x₃, x₄, x₅ | 0.922 | 7020.09 | 197.38 |
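The partial F-statistic for the shipping-cost comparison and the adjusted R² values in the comparison table can be reproduced by hand from the printed output. The sketch below is one way to do that arithmetic in plain Python; the function names `partial_f` and `adjusted_r2` are hypothetical, and the inputs are the rounded figures shown in the SPSS tables, so the last digit may differ slightly from the software's own result.

```python
def partial_f(sse_reduced, sse_complete, n_tested, mse_complete):
    """Partial F-statistic: ((SSE_R - SSE_C) / (k - g)) / MSE_C."""
    return ((sse_reduced - sse_complete) / n_tested) / mse_complete


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2), with k slopes."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


# Shipping costs: reduced SSE = 6.633, complete SSE = 2.745, two tested
# parameters (the coefficients of the squared terms), MSE_C = 0.196.
f_shipping = partial_f(6.633, 2.745, n_tested=2, mse_complete=0.196)

# Critical value quoted in the slides for an F(2, 14) distribution at 5%.
F_CRITICAL_5PCT = 3.74
reject_h0 = f_shipping > F_CRITICAL_5PCT

# Executive salaries (n = 100): R Square and number of slopes per model.
adj_model_1 = adjusted_r2(0.757, n=100, k=4)
adj_model_6 = adjusted_r2(0.927, n=100, k=6)

print(round(f_shipping, 2), reject_h0)
print(round(adj_model_1, 3), round(adj_model_6, 3))
```

Rounded to the precision used in the slides, these computations agree with the reported F-statistic (9.92) and with the adjusted R² of Model 1 (0.747) and Model 6 (0.922).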