Types of regression models
Regression Models
Simple
1° order
Multiple
1° order
2° order
Higher order
Interaction
2° order
Higher order
A quadratic (second-order) model
E(Y) = β0 + β1x + β2x²
• Interpretation of model parameters:
• β0: y-intercept, i.e. the value of E(Y) when x = 0;
• β1: the shift parameter;
• β2: the rate of curvature.
Example with quadratic terms
The true model, supposedly unknown, is
Yi = 2 + xi² + εi, with εi ~ N(0, 2)
Data: (x, y). See SQM.sav
[Figure: scatter plot of y (0 to 100) against x (2 to 10)]
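The comparison of the three candidate models can be sketched on simulated data. This is a minimal illustration, not the SQM.sav data: the sample size, the range of x, the seed, and the reading of N(0, 2) as variance 2 are all assumptions, and `solve` and `ols` are hypothetical helper names written for this sketch.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Returns (coefficients, r_squared), using the normal equations X'X b = X'y.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - sse / sst

random.seed(0)
n = 105  # assumed sample size (the ANOVA tables report 104 total d.f.)
x = [random.uniform(1.0, 10.0) for _ in range(n)]
# Assumption: N(0, 2) means variance 2, i.e. standard deviation sqrt(2).
y = [2.0 + xi ** 2 + random.gauss(0.0, 2.0 ** 0.5) for xi in x]
x_sq = [xi ** 2 for xi in x]

beta_lin, r2_lin = ols([x], y)           # Model 1: x only
beta_sq, r2_sq = ols([x_sq], y)          # Model 2: x squared only
beta_full, r2_full = ols([x, x_sq], y)   # Model 3: x and x squared

print("Model 1 R2:", round(r2_lin, 3))
print("Model 2 R2:", round(r2_sq, 3), "coefficients:", beta_sq)
print("Model 3 R2:", round(r2_full, 3))
```

On data generated this way, Model 2 should recover an intercept near 2 and a slope near 1, and adding the linear term in Model 3 can only leave R² unchanged or raise it slightly.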
Model 1: E(Y) = β0 + β1x
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .973   .947       .947                6.60994
a. Predictors: (Constant), x
ANOVA
Model 1      Sum of Squares   df    Mean Square   F          Sig.
Regression   80624.915        1     80624.915     1845.332   .000
Residual     4500.202         103   43.691
Total        85125.117        104
a. Predictors: (Constant), x
b. Dependent Variable: y
Coefficients
Model 1      Unstd. B   Std. Error   Std. Beta   t         Sig.
(Constant)   -19.959    1.483                    -13.454   .000
x            10.744     .250         .973        42.957    .000
a. Dependent Variable: y
[Figure: scatter plot of y against x with the fitted line y = -19.96 + 10.74 * x; R-Square = 0.95]
Model 2: E(Y) = β0 + β1x²
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .996   .991       .991                2.68707
a. Predictors: (Constant), XSquare
Smaller variance and SE than in Model 1.
ANOVA
Model 1      Sum of Squares   df    Mean Square   F           Sig.
Regression   84381.422        1     84381.422     11686.632   .000
Residual     743.695          103   7.220
Total        85125.117        104
a. Predictors: (Constant), XSquare
b. Dependent Variable: y
Coefficients
Model 1      Unstd. B   Std. Error   Std. Beta   t         Sig.
(Constant)   2.340      .417                     5.608     .000
XSquare      .997       .009         .996        108.105   .000
a. Dependent Variable: y
[Figure: scatter plot of y against XSquare with the fitted line y = 2.34 + 1.00 * XSquare; R-Square = 0.99]
Model 3: E(Y) = β0 + β1x + β2x²
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .996   .991       .991                2.66608
a. Predictors: (Constant), XSquare, x
ANOVA
Model 1      Sum of Squares   df    Mean Square   F          Sig.
Regression   84400.103        2     42200.052     5936.999   .000
Residual     725.014          102   7.108
Total        85125.117        104
a. Predictors: (Constant), XSquare, x
b. Dependent Variable: y
Coefficients
Model 1      Unstd. B   Std. Error   Std. Beta   t        Sig.
(Constant)   4.177      1.206                    3.463    .001
x            -.830      .512         -.075       -1.621   .108
XSquare      1.071      .046         1.069       23.046   .000
a. Dependent Variable: y
A third-order model with one independent variable
E(Y) = β0 + β1x + β2x² + β3x³
Use with caution, given the numerical problems that could arise.
[Figure: two panels of Y against X1, one for β3 > 0 and one for β3 < 0]
First-order model in k quantitative variables
E(Y) = β0 + β1x1 + β2x2 + ... + βkxk
Interpretation of model parameters:
β0: y-intercept, i.e. the value of E(Y) when x1 = x2 = ... = xk = 0;
β1: change in E(Y) for a 1-unit increase in x1 when x2, ..., xk are held fixed;
β2: change in E(Y) for a 1-unit increase in x2 when x1, x3, ..., xk are held fixed;
...
A bivariate model
E(Y)=β0+β1x1+β2 x2
Changing x2 changes only the y-intercept.
In the first order model a 1-unit change in one independent
variable will have the same effect on the mean value of y
regardless of the other independent variables.
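The two statements above can be checked numerically. The sketch below uses made-up coefficients (b0 = 1, b1 = 2, b2 = 3 are illustrative assumptions, not estimates from any dataset) and a hypothetical helper `mean_response`.

```python
def mean_response(x1, x2, b0=1.0, b1=2.0, b2=3.0):
    """E(Y) for a first-order model in two predictors (illustrative coefficients)."""
    return b0 + b1 * x1 + b2 * x2

x2_levels = (0.0, 10.0, 20.0)

# Effect of a 1-unit increase in x1 (from 4 to 5) at each level of x2.
effects = [mean_response(5.0, x2) - mean_response(4.0, x2) for x2 in x2_levels]

# Intercept of the line in x1 at each level of x2.
intercepts = [mean_response(0.0, x2) for x2 in x2_levels]

print(effects)     # [2.0, 2.0, 2.0]: always b1, whatever x2 is
print(intercepts)  # [1.0, 31.0, 61.0]: only the intercept moves with x2
```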
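A bivariate model
[Figure: response plane E(Y) = β0 + β1X1i + β2X2i over the (X1, X2) plane, with intercept β0 on the Y axis. The observed value Yi = β0 + β1X1i + β2X2i + εi lies at vertical distance εi from the plane, above the point (X1i, X2i).]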
Example: executive salaries
• Y = Annual salary (in dollars)
• x1 = Years of experience
• x2 = Years of education
• x3 = Gender: 1 if male; 0 if female
• x4 = Number of employees supervised
• x5 = Corporate assets (in millions of dollars)
E(Y) = β0 + β1x1 + β2x2 + β4x4 + β5x5
Data: ExecSal.sav
Do not consider x3 (Gender) for the moment.
Executive salaries: computer output
Multiple regression
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .870   .757       .747                12685.309
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
Simple regression
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .783   .613       .609                15760.006
a. Predictors: (Constant), Years of Experience
Coefficient of determination
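The coefficient R² is computed exactly as in the simple regression case:
\[ R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]
where
\[ \underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST\ \text{(Total)}} = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR\ \text{(Regression)}} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE\ \text{(Error)}} \]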
A drawback of R²: it never decreases when variables are added to the model, even if these are NOT relevant to the problem.
Adjusted R² and estimate of the variance σ²
A solution: adjusted R²
– Each additional variable reduces adjusted R², unless SSE decreases enough to compensate
\[ R_a^2 = 1 - \left( \frac{n-1}{n-k-1} \right) \frac{SSE}{SST} \le 1 - \frac{SSE}{SST} = R^2 \]
An unbiased estimator of the variance σ² is computed as
\[ s^2 = \frac{SSE}{n-k-1} = \frac{\sum_{i} \hat{\varepsilon}_i^2}{n-k-1} \]
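As a worked check of these formulas, the sketch below plugs in the sums of squares that the ANOVA table for the executive salaries model reports (SSE = 1.529E10, SST = 6.295E10, with n = 100 and k = 4). The function name `fit_summary` is a hypothetical helper; because the inputs are rounded to four significant digits, the results match the Model Summary only approximately.

```python
import math

def fit_summary(sse, sst, n, k):
    """Return (R squared, adjusted R squared, s squared) from the sums of squares."""
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * (sse / sst)
    s2 = sse / (n - k - 1)  # unbiased estimate of the error variance
    return r2, adj_r2, s2

r2, adj_r2, s2 = fit_summary(sse=1.529e10, sst=6.295e10, n=100, k=4)

print(round(r2, 3))      # 0.757
print(round(adj_r2, 3))  # 0.747
print(math.sqrt(s2))     # close to the reported standard error 12685.309
```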
Executive salaries: computer output (2)
Coefficients
Variables                         Unstd. B      Std. Error   Std. Beta   t        Sig.
(Constant)                        -37082.148    17052.089                -2.175   .032
Years of Experience               2696.360      173.647      .785        15.528   .000
Years of Education                2656.017      563.476      .243        4.714    .000
Number of Employees supervised    41.092        7.807        .272        5.264    .000
Corporate assets (in million $)   244.569       83.420       .149        2.932    .004
a. Dependent Variable: Annual salary in $
The t and Sig. columns report the t-tests on the individual coefficients.
Testing overall significance: the F-test
• 1. Shows whether there is a linear relationship between all X variables together and Y
• 2. Uses the F test statistic
• 3. Hypotheses
– H0: β1 = β2 = ... = βk = 0
• No linear relationship
– Ha: at least one coefficient is not 0
• At least one X variable affects Y
The F-test for a single coefficient is equivalent to the t-test.
ANOVA table
ANOVA
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4.766E10         4    1.192E10      74.045   .000
Residual     1.529E10         95   1.609E8
Total        6.295E10         99
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
b. Dependent Variable: Annual salary in $
Reading the table:
– Regression df = k: number of regression slopes
– Total df = n - 1, where n is the number of observations
– Residual Mean Square = MSE (mean square error), the estimate of the variance
– F is the F-statistic and Sig. is the p-value of the F-test
– Decision: reject H0, i.e. the model as a whole is useful
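The F-statistic in the table can be recomputed from the sums of squares. The sketch below is a minimal illustration with a hypothetical helper `overall_f`; it uses the rounded values printed in the output, so it reproduces 74.045 only approximately, and it does not compute the p-value (that would need the F distribution, e.g. from a statistics library).

```python
def overall_f(ssr, sse, n, k):
    """Overall F-statistic: mean square regression over mean square error."""
    msr = ssr / k            # regression mean square, k slopes
    mse = sse / (n - k - 1)  # error mean square
    return msr / mse

f_stat = overall_f(ssr=4.766e10, sse=1.529e10, n=100, k=4)
print(round(f_stat, 1))  # 74.0
```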
Interaction (second order) model
E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2
• Interpretation of model parameters:
• β0: y-intercept. The value of E(Y) when x1 = x2 = 0
• β1+ β3 x2 : change in E(Y) for a 1-unit increase in x1
when x2 is held fixed;
• β2 + β3 x1 : change in E(Y) for a 1-unit increase in x2
when x1 is held fixed;
• β3: controls the rate of change of the surface.
Interaction (second order) model
E(Y)=β0+ β1x1+ β2 x2 + β3 x1x2
Contour lines are not parallel
The effect of one variable depends on the level of the other
Example: Antique grandfather clocks auction
Clocks are sold at an auction on competitive offers.
Data are:
– Y : auction price in dollars
– X1: age of clocks
– X2: number of bidders
Model 1: E(Y) = β0 + β1x1 + β2x2
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Data: GFCLOCKS.sav
Data summaries
Descriptive Statistics
Variable   N    Minimum   Maximum   Mean      Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
Age        32   108       194       144.94    27.395           .216 (.414)             -1.323 (.809)
Bidders    32   5         15        9.53      2.840            .420 (.414)             -.788 (.809)
Price      32   729       2131      1326.88   393.487          .396 (.414)             -.727 (.809)
Valid N (listwise): 32
If the data are Normal, skewness is 0.
If the data are Normal, (excess) kurtosis is 0.
Note: skewness and kurtosis are not enough to establish Normality.
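For reference, the two shape measures can be sketched with plain moment formulas. This is an assumption-laden illustration: SPSS applies small-sample corrections, so its values differ somewhat from these simple versions, and the two samples below are invented.

```python
def shape_measures(data):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]

skew_sym, kurt_sym = shape_measures(symmetric)
skew_right, kurt_right = shape_measures(right_skewed)

print(skew_sym, kurt_sym)  # zero skewness; negative excess kurtosis (flat sample)
print(skew_right)          # positive: long right tail
```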
P-P plot for Normality
If the data are Normal, the points should lie along the straight line.
In this example the situation is fairly good.
Bivariate scatter-plots
[Figure: scatter plots of Price (800 to 2000) against Age (120 to 180) and of Price against Bidders (6 to 14)]
Model 1: E(Y) = β0 + β1x1 + β2x2
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .945   .892       .885                133.485
a. Predictors: (Constant), Bidders, Age
ANOVA
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   4283062.960      2    2141531.480   120.188   .000
Residual     516726.540       29   17818.157
Total        4799789.500      31
a. Predictors: (Constant), Bidders, Age
b. Dependent Variable: Price
Coefficients
Model 1      Unstd. B    Std. Error   Std. Beta   t        Sig.
(Constant)   -1338.951   173.809                  -7.704   .000
Age          12.741      .905         .887        14.082   .000
Bidders      85.953      8.729        .620        9.847    .000
a. Dependent Variable: Price
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .977   .954       .949                88.915
a. Predictors: (Constant), AgeBid, Age, Bidders
ANOVA
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   4578427.367      3    1526142.456   193.041   .000
Residual     221362.133       28   7905.790
Total        4799789.500      31
a. Predictors: (Constant), AgeBid, Age, Bidders
b. Dependent Variable: Price
Coefficients
Model 1      Unstd. B   Std. Error   Std. Beta   t        Sig.
(Constant)   320.458    295.141                  1.086    .287
Age          .878       2.032        .061        .432     .669
Bidders      -93.265    29.892       -.673       -3.120   .004
AgeBid       1.298      .212         1.369       6.112    .000
a. Dependent Variable: Price
Interpreting interaction models
The coefficient for the interaction term is significant.
If an interaction term is present, the corresponding first-order terms must also be included in order to interpret the model correctly.
In the example a careless analyst could estimate the effect of Bidders as negative, since b2 = -93.26.
Since an interaction term is present, the slope estimate for Bidders (x2) is
b2 + b3x1
Note: b denotes the estimate of β.
For x1 = 150 (age) the estimated slope for Bidders is
-93.26 + 1.3 (150) = 101.74
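The same calculation can be repeated for any age. The sketch below uses the unrounded coefficients from the Model 2 output (b2 = -93.265, b3 = 1.298), so at age 150 it gives about 101.4 rather than the 101.74 obtained above with b3 rounded to 1.3. The function name and the chosen ages (the sample minimum, 150, and the sample maximum) are illustrative choices.

```python
B2_BIDDERS = -93.265    # coefficient of Bidders in Model 2
B3_AGE_BIDDERS = 1.298  # coefficient of the Age*Bidders interaction

def bidders_slope(age):
    """Estimated change in price per additional bidder for a clock of the given age."""
    return B2_BIDDERS + B3_AGE_BIDDERS * age

for age in (108, 150, 194):
    print(age, round(bidders_slope(age), 1))
```

Over the whole observed age range (108 to 194 years) the estimated slope is positive, which is why reading b2 alone is misleading.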
Models with qualitative X’s
Regression models can also include qualitative (or categorical) independent variables (QIV).
The categories of a QIV are called levels.
Since the levels of a QIV are not measured on a natural numerical scale, we need a specific type of coding to avoid introducing fictitious linear relations in the model.
Coding is done by using IVs that take only two values: 0 or 1.
These coded IVs are called dummy variables.
Models with QIV
• Suppose we want to model Income (Y) as a function of
Sex (x) -> use coded, or dummy, variables
x = 1 if Male, x = 0 if Female
E(Y) = β0+ β1x
E(Y) = β0+ β1 if x =1, i.e. Male
E(Y) = β0 if x =0, i.e. Female
β0 is the mean for the base level, i.e. Female is the reference category
β1 is the additional effect if Male
In this simple model, only the means for the two groups
are modeled
QIV with q levels
As a general rule, if a QIV has q levels we need q-1 dummies
for coding. The uncoded level is the reference one.
Example: a QIV has three levels, A, B and C
Define
x1 = 1 if level A, x1 = 0 if not
x2 = 1 if level B, x2 = 0 if not
Model: E(Y) = β0 + β1x1 + β2x2 (C is the reference level)
Interpreting β’s
β0 = μC (mean for base level C)
β1 = μA - μC (additional effect with respect to C if level A)
β2 = μB - μC (additional effect with respect to C if level B)
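The interpretation of the β's can be verified on a toy dataset. In the sketch below the data are invented; the coefficients are built from the group means as stated above, and the check is that the resulting residuals are orthogonal to every column of the design matrix, which is the condition (the normal equations) that identifies the least-squares solution.

```python
levels = ["A", "A", "B", "B", "B", "C", "C"]  # invented QIV with q = 3 levels
y = [10.0, 12.0, 20.0, 21.0, 22.0, 5.0, 7.0]  # invented responses

def dummy_row(level):
    """Intercept plus q - 1 = 2 dummies; C is the uncoded reference level."""
    return [1.0, 1.0 if level == "A" else 0.0, 1.0 if level == "B" else 0.0]

def group_mean(level):
    values = [yi for lv, yi in zip(levels, y) if lv == level]
    return sum(values) / len(values)

design = [dummy_row(level) for level in levels]

# Coefficients as interpreted in the text: base mean and differences from it.
beta = [
    group_mean("C"),
    group_mean("A") - group_mean("C"),
    group_mean("B") - group_mean("C"),
]

residuals = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(design, y)]
# Normal equations: X'(y - X beta) must be zero for the least-squares solution.
normal_eqs = [sum(row[j] * e for row, e in zip(design, residuals)) for j in range(3)]

print(beta)        # [6.0, 5.0, 15.0]
print(normal_eqs)  # [0.0, 0.0, 0.0]
```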
Models with dummies
Even if models which consider only dummy variables do in
practice estimate the means of various groups, the
testing machinery of the regression setup can be useful
for group comparisons.
Dummies can be used in combination with any other
dummies and quantitative X’s to construct models with
first order effects (or main effects) and interactions to
test hypotheses of interest.
In order to define dummies in SPSS see
“Computing dummy vars in SPSS.ppt”
Example: executive salaries
A management consulting firm has developed a regression model in order to analyze the salary structure of executives.
• Y = Annual salary (in dollars)
• x1 = Years of experience
• x2 = Years of education
• x3 = Gender: 1 if male; 0 if female
• x4 = Number of employees supervised
• x5 = Corporate assets (in millions of dollars)
Data: ExecSal.sav
A simple model: E(Y) = β0 + β3x3
[Figure: salaries for the Male group and the Female group]
This model estimates the means of the two groups (M, F).
We want to test whether the difference in means is significant, i.e. not due to chance.
Regression Output
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .392   .153       .145                23320.282
a. Predictors: (Constant), Gender
ANOVA
Model 1      Sum of Squares     df   Mean Square       F        Sig.
Regression   9651865066.845     1    9651865066.845    17.748   .000
Residual     53295882433.156    98   543835535.032
Total        62947747500.001    99
a. Predictors: (Constant), Gender
b. Dependent Variable: Annual salary in $
The salary difference between groups is significant.
Coefficients
Model 1      Unstd. B    Std. Error   Std. Beta   t        Sig.   95% Confidence Interval for B
(Constant)   83847.059   3999.395                 20.965   .000   75910.389 to 91783.729
Gender       20739.305   4922.915     .392        4.213    .000   10969.940 to 30508.670
a. Dependent Variable: Annual salary in $
The Gender coefficient is the mean increment for Male, and its interval is the C.I. for the mean increment.
Model 2: E(Y) = β0 + β1x1 + β3x3
[Figure: salary against years of experience by gender] It seems that the two groups are separated.
Model 2 considers the same slope but different intercepts:
If x3 = 0 (female) then E(Y) = β0 + β1x1
If x3 = 1 (male) then E(Y) = (β0 + β3) + β1x1
Computer output for model 2
R square improved greatly.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .860   .740       .735                12981.615
a. Predictors: (Constant), Years of Experience, Gender
ANOVA
Model 1      Sum of Squares     df   Mean Square        F         Sig.
Regression   46601081714.527    2    23300540857.264    138.264   .000
Residual     16346665785.474    97   168522327.685
Total        62947747500.001    99
a. Predictors: (Constant), Years of Experience, Gender
b. Dependent Variable: Annual salary in $
Coefficients
Model 1               Unstd. B    Std. Error   Std. Beta   t        Sig.   95% Confidence Interval for B
(Constant)            50614.312   3161.279                 16.011   .000   44340.048 to 56888.576
Gender                18894.215   2743.253     .357        6.888    .000   13449.618 to 24338.812
Years of Experience   2633.831    177.875      .767        14.807   .000   2280.799 to 2986.863
a. Dependent Variable: Annual salary in $
The new intercept for Male is significant.
In this model the effect of experience is assumed equal for the two groups.
Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3
With this model we want to test whether gender and experience interact, i.e. whether male salaries tend to grow at a faster (or slower) rate with experience.
If x3 = 0 (female) then E(Y) = β0 + β1x1
If x3 = 1 (male) then E(Y) = (β0 + β3) + (β1 + β4)x1
Here β0 + β3 is the new intercept for males and β1 + β4 is the new slope for males.
Remark: running a single regression for the two groups together gives more degrees of freedom (larger n) for estimating the parameters and the model variance.
Model 3: E(Y) = β0 + β1x1 + β3x3 + β4x1x3
[Figure: salary against years of experience with a fitted line for each gender] Model 3 considers different slopes and different intercepts.
Computer output for model 3
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .868   .754       .746                12700.080
a. Predictors: (Constant), ExpGender, Years of Experience, Gender
Coefficients
Model 1               Unstd. B    Std. Error   Std. Beta   t        Sig.   95% Confidence Interval for B
(Constant)            58049.768   4461.179                 13.012   .000   49194.397 to 66905.139
Gender                7798.504    5497.470     .147        1.419    .159   -3113.888 to 18710.896
Years of Experience   2044.541    308.565      .595        6.626    .000   1432.045 to 2657.036
ExpGender             864.122     373.653      .301        2.313    .023   122.426 to 1605.818
a. Dependent Variable: Annual salary in $
There is evidence that salaries for the two groups grow at different rates with experience.
Estimated lines:
Ŷ = 58049.8 + 2044.5*(Years of Experience) for female
Ŷ = 65848.3 + 2908.7*(Years of Experience) for male
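The two estimated lines follow directly from the Model 3 coefficients: the male intercept is b0 + b3 and the male slope is b1 + b4. A minimal sketch, with the coefficient values copied from the output above and hypothetical names for the dictionary keys and the helper:

```python
# Unstandardized coefficients from the Model 3 output.
coef = {
    "const": 58049.768,
    "gender": 7798.504,
    "experience": 2044.541,
    "exp_gender": 864.122,
}

def salary_line(male):
    """Return (intercept, slope) of the fitted line; male is 1 for male, 0 for female."""
    intercept = coef["const"] + coef["gender"] * male
    slope = coef["experience"] + coef["exp_gender"] * male
    return intercept, slope

female_line = salary_line(0)
male_line = salary_line(1)

print([round(v, 1) for v in female_line])  # [58049.8, 2044.5]
print([round(v, 1) for v in male_line])    # [65848.3, 2908.7]
```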
A complete second order model
E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
• Interpretation of model parameters:
• β0: y-intercept, i.e. the value of E(Y) when x1 = x2 = 0;
• β1 and β2: shifts along the x1 and x2 axes;
• β3: rotation of the surface;
• β4 and β5: control the rate of curvature.
Back to Executive salaries
What if we suspect that the rate of growth changes and has opposite signs for M and F?
x1 = Years of experience
x3 = Gender (1 if Male)
Note: x3² = x3 since it is a dummy
Model 4: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1²
Model 5: E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1² + β5x3x1²
Comparing Model 4 and 5
Model 4
If x3 = 0 (female) then E(Y) = β0 + β1x1 + β4x1²
If x3 = 1 (male) then E(Y) = (β0 + β2) + (β1 + β3)x1 + β4x1²
Different intercept and slope for M and F but same curvature
Model 5
If x3 = 0 (female) then E(Y) = β0 + β1x1 + β4x1²
If x3 = 1 (male) then E(Y) = (β0 + β2) + (β1 + β3)x1 + (β4 + β5)x1²
Different intercept, slope and curvature for M and F
Model 5: computer output
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .875   .766       .754                12507.735
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
ANOVA
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4.824E10         5    9.648E9       61.673   .000
Residual     1.471E10         94   1.564E8
Total        6.295E10         99
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
b. Dependent Variable: Annual salary in $
Coefficients
Model 1               Unstd. B    Std. Error   Std. Beta   t        Sig.
(Constant)            52391.973   6497.971                 8.063    .000
Years of Experience   3373.970    1165.248     .982        2.895    .005
Gender                21122.152   8285.802     .399        2.549    .012
ExpGen                -2081.897   1459.842     -.724       -1.426   .157
ExpSqu                -53.181     45.001       -.422       -1.182   .240
Exp2Gen               112.836     54.950       .904        2.053    .043
a. Dependent Variable: Annual salary in $
Which model is preferable? Model 3 or Model 5?
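To see what the Model 5 estimates imply for each group, the coefficients can be combined as in the comparison of Models 4 and 5: females get (b0, b1, b4) and males get (b0 + b2, b1 + b3, b4 + b5). The sketch below copies the estimates from the output; the dictionary keys and helper name are hypothetical.

```python
# Unstandardized coefficients from the Model 5 output.
coef = {
    "const": 52391.973,
    "experience": 3373.970,
    "gender": 21122.152,
    "exp_gen": -2081.897,
    "exp_squ": -53.181,
    "exp2_gen": 112.836,
}

def salary_curve(male):
    """Return (intercept, slope, curvature) of the fitted quadratic; male is 1 or 0."""
    intercept = coef["const"] + coef["gender"] * male
    slope = coef["experience"] + coef["exp_gen"] * male
    curvature = coef["exp_squ"] + coef["exp2_gen"] * male
    return intercept, slope, curvature

female_curve = salary_curve(0)
male_curve = salary_curve(1)

print([round(v, 3) for v in female_curve])  # [52391.973, 3373.97, -53.181]
print([round(v, 3) for v in male_curve])    # [73514.125, 1292.073, 59.655]
```

The estimated curvatures have opposite signs for the two groups, but the individual quadratic terms are only weakly significant, which motivates a formal comparison with Model 3.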
A test for comparing nested models
Two models are nested if one model contains all the terms
of the other model and at least one additional term.
The more complex of the two models is called the
complete (or full) model.
The other is called the reduced (or restricted) model.
Example: model 1 is nested in model 2
Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
To compare the two models we are interested in testing
H0: β4 = β5 = 0 vs. H1: at least one of β4, β5 differs from 0
F-test for comparing nested models
Reduced model:
E(Y) = β0 + β1x1 + … + βgxg
Complete model:
E(Y) = β0 + β1x1 + … + βgxg + βg+1xg+1 + … + βkxk
To test
H0: βg+1 = … = βk = 0
H1: at least one of the parameters being tested is not 0
Compute
\[ F = \frac{(SSE_R - SSE_C)/(k - g)}{MSE_C} \]
Reject H0 when F > Fα, where Fα is the level α critical point of an F distribution with (k-g, n-(k+1)) d.f.
F-test for nested models
Where:
SSER = Sum of squared errors for the reduced model;
SSEC = Sum of squared errors for the complete model;
MSEC = Mean square error for the complete model;
Remark:
k – g = number of parameters tested
k +1 = number of parameters in the complete model
n = total sample size
Compute partial F-tests with SPSS
1. Enter your complete model in the Regression dialog box
– choose the Method “Enter”
2. Click on “Next”
3. In the new box for Independent variables, enter those
you want to remove (i.e. those you’d like to test)
– choose the Method “Remove”
4. In the “Statistics” option select “R squared change”
5. Ok.
Applying the F-test
Let us use the F-test to compare Model 3 and Model 5 in
the executive salaries example.
Note that Model 3 is nested in Model 5
Model 3:
E(Y) = β0 + β1x1 + β2x3 + β3x1x3
Model 5:
E(Y) = β0 + β1x1 + β2x3 + β3x1x3 + β4x1² + β5x3x1²
Apply the F-test for H0: β4 = β5 = 0
Computer output
Variables Entered/Removed (Dependent Variable: Annual salary in $)
Model 1: variables entered: Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen (a); variables removed: none; method: Enter
Model 2: variables entered: none (a); variables removed: Exp2Gen, ExpSqu (b); method: Remove
a. All requested variables entered.
b. All requested variables removed.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .875   .766       .754                12507.735
2       .868   .754       .746                12700.080
Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .766              61.673     5     94    .000
2       -.012             2.488      2     94    .089
a. Predictors (Model 1): (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
b. Predictors (Model 2): (Constant), Gender, Years of Experience, ExpGen
For Model 2, F Change (2.488) is the F-statistic and Sig. F Change (.089) is its p-value.
Do NOT reject H0: β4 = β5 = 0, i.e. Model 3 is preferred.
A quadratic model example: Shipping costs
Although a regional delivery service bases the charge for shipping a
package on the package weight and distance shipped, its profit per
package depends on the package size (volume of space it occupies) and
the size and nature of the delivery truck.
The company conducted a study to investigate the relationship
between the cost of shipment and the variables that control the
shipping charge: weight and distance.
– Y : cost of shipment in dollars
– X1: package weight in pounds
– X2: distance shipped in miles
It is suspected that nonlinear effects may be present.
Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
Data: Express.sav
Scatter plots
[Figure: scatter plots of Cost of shipment (4.0 to 16.0) against Weight of parcel in lbs. (0.00 to 8.00) and against Distance shipped (50 to 250)]
Scatter plots in multiple regression often do not show much information.
Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .997   .994       .992                .4428
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
ANOVA
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Dependent Variable: Cost of shipment
Coefficients
Model 1                    Unstd. B    Std. Error   Std. Beta   t        Sig.
(Constant)                 .827        .702                     1.178    .259
Weight of parcel in lbs.   -.609       .180         -.316       -3.386   .004
Distance shipped           .004        .008         .062        .503     .623
Weight squared             .090        .020         .382        4.442    .001
Distance squared           1.51E-005   .000         .075        .672     .513
Weight*Distance            .007        .001         .850        11.495   .000
a. Dependent Variable: Cost of shipment
Distance squared is not significant: try to eliminate it.
Model: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1²
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .997   .994       .992                .4346
a. Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.
ANOVA
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   449.252          4    112.313       594.623   .000
Residual     2.833            15   .189
Total        452.086          19
a. Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.
b. Dependent Variable: Cost of shipment
Coefficients
Model 1                    Unstd. B   Std. Error   Std. Beta   t        Sig.
(Constant)                 .475       .458                     1.035    .317
Weight of parcel in lbs.   -.578      .171         -.300       -3.387   .004
Distance shipped           .009       .003         .141        3.421    .004
Weight squared             .087       .019         .369        4.485    .000
Weight*Distance            .007       .001         .842        11.753   .000
a. Dependent Variable: Cost of shipment
Applying the F-test: Shipping costs
A company conducted a study to investigate the relationship
between the cost of shipment and the variables that control the
shipping charge: weight and distance.
– Y : cost of shipment in dollars
– X1: package weight in pounds
– X2: distance shipped in miles
It is suspected that nonlinear effects may be present;
use the F-test for nested models to decide between
Model 1: E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1² + β5x2²
Model 2: E(Y) = β0 + β1x1 + β2x2 + β3x1x2
Data: Express.sav
ANOVA Tables
Full model
ANOVA
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Dependent Variable: Cost of shipment
Reduced model
ANOVA
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   445.452          3    148.484       358.154   .000
Residual     6.633            16   .415
Total        452.086          19
a. Predictors: (Constant), Distance shipped, Weight of parcel in lbs., Weight*Distance
b. Dependent Variable: Cost of shipment
F-statistic
To test H0: β4 = β5 = 0, from the ANOVA tables we have
\[ F = \frac{(SSE_R - SSE_C)/2}{MSE_C} = \frac{(6.633 - 2.745)/2}{0.196} = 9.92 \]
The critical value Fα (at the 5% level) for an F-distribution with 2 and 14 d.f. is 3.74.
Since F (9.92) > Fα (3.74), the null hypothesis is rejected at the 5% significance level, i.e. the model with quadratic terms is preferred over the reduced one.
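The partial F-test is easy to script once the two error sums of squares are known. The sketch below is a minimal illustration with a hypothetical helper `partial_f`; it takes the sums of squares from the two ANOVA tables (n = 20 observations, k = 5 slopes in the complete model, g = 3 in the reduced one) and the critical value 3.74 quoted above rather than computing it.

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """F-statistic for H0: the k - g extra parameters of the complete model are all 0."""
    mse_complete = sse_complete / (n - (k + 1))
    return ((sse_reduced - sse_complete) / (k - g)) / mse_complete

F_CRITICAL_5PCT = 3.74  # quoted critical value for (2, 14) d.f.

f_stat = partial_f(sse_reduced=6.633, sse_complete=2.745, n=20, k=5, g=3)
reject_h0 = f_stat > F_CRITICAL_5PCT

print(round(f_stat, 2), reject_h0)
```

The result agrees with the hand calculation up to rounding of MSE, and the same function applies to any pair of nested models.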
Computer output
Variables Entered/Removed (Dependent Variable: Cost of shipment)
Model 1: variables entered: Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped (a); variables removed: none; method: Enter
Model 2: variables entered: none (a); variables removed: Distance squared, Weight squared (b); method: Remove
a. All requested variables entered.
b. All requested variables removed.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .997   .994       .992                .4428
2       .993   .985       .983                .6439
Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .994              458.388    5     14    .000
2       -.009             9.917      2     14    .002
a. Predictors (Model 1): (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Predictors (Model 2): (Constant), Weight*Distance, Weight of parcel in lbs., Distance shipped
For Model 2, F Change (9.917) is the F-statistic and Sig. F Change (.002) is its p-value.
Reject H0: β4 = β5 = 0
Executive salaries: a final model (?)
• Y = Annual salary (in dollars)
• x1 = Years of experience
• x2 = Years of education
• x3 = Gender: 1 if male; 0 if female
• x4 = Number of employees supervised
• x5 = Corporate assets (in millions of dollars)
Try adding other variables to Model 3:
Model 6: E(Y) = β0 + β1x1 + β2x2 + β3x3 + β4x1x3 + β5x4 + β6x5
Computer output: Model 6
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .963   .927       .922                7020.089
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender
ANOVA
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   5.836E10         6    9.727E9       197.384   .000
Residual     4.583E9          93   4.928E7
Total        6.295E10         99
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender
Coefficients
Model 1                           Unstd. B     Std. Error   Std. Beta   t        Sig.
(Constant)                        -38331.331   9533.238                 -4.021   .000
Years of Experience               2178.964     171.979      .634        12.670   .000
Gender                            13203.101    3137.775     .249        4.208    .000
ExpGender                         669.546      209.042      .233        3.203    .002
Years of Education                2689.594     311.914      .246        8.623    .000
Number of Employees supervised    53.239       4.470        .353        11.910   .000
Corporate assets (in million $)   180.310      46.600       .110        3.869    .000
a. Dependent Variable: Annual salary in $
Executive salaries: comparison of models
Mod.   Predictors                  Adj. R²   Standard error   F-stat
1      x1, x2, x4, x5              0.747     12685.31         74.05
2      x1, x3                      0.735     12981.62         138.26
3      x1, x3, x1∙x3               0.746     12700.08         98.09
6      x1, x2, x3, x1∙x3, x4, x5   0.922     7020.09          197.38