Quantitative Methods for Economics, Finance and Management
Lecture no. 8
Using factor analysis to build a multiple linear regression model
Factor analysis
How many components should be retained?
1. Eigenvalue > 1 method
2. Ratio between the number of components and the number of variables (about 1/3)
3. Percentage of variance explained (at least 60%)
4. The SCREE PLOT (plot of the eigenvalues against the number of factors)
If the plot shows an "elbow", it is plausible to assume that a latent structure exists; if its shape is almost a straight line, the factors are merely a transformation of the manifest variables. The relevant factors are those above the elbow (the one at the elbow may also be included, at the analyst's discretion). If there are no predominant factors, this criterion is unsuitable.
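The three numerical criteria above (eigenvalue > 1, about one component per three variables, at least 60% of variance explained) can be sketched in Python with NumPy. This is a minimal illustration, assuming `X` is a numeric matrix with observations in rows; the function name `eigen_criteria` and the simulated two-factor data are hypothetical. The scree plot remains a visual judgement and is not automated here.

```python
import numpy as np


def eigen_criteria(X, min_share=0.60, ratio=1 / 3):
    """Numerical rules of thumb for how many components to retain.

    X is an (n_obs, p) data matrix. The eigenvalues come from the correlation
    matrix, so every standardized variable contributes 1 to the total variance p.
    """
    R = np.corrcoef(X, rowvar=False)              # p x p correlation matrix
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues, largest first
    p = len(eig)
    share = np.cumsum(eig) / p                    # cumulative proportion of variance
    return {
        "eigenvalues": eig,
        "kaiser": int(np.sum(eig > 1)),                        # 1. eigenvalue > 1
        "ratio": int(round(p * ratio)),                        # 2. about p/3 components
        "variance": int(np.argmax(share >= min_share) + 1),    # 3. first k reaching 60%
    }


# Hypothetical data: 6 observed variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                     [0.0, 0.9], [0.0, 0.8], [0.0, 0.85]])
X = latent @ loadings.T + 0.4 * rng.normal(size=(n, 6))

res = eigen_criteria(X)
print(np.round(res["eigenvalues"], 2))
print(res["kaiser"], res["ratio"], res["variance"])
```

On data with a clear two-factor structure all three rules agree; on real survey data they often disagree, which is why the slides combine them with the scree plot and the communalities.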
5. Communalities:
- compare the communalities of the different solutions
- the share of each variable's variance explained by the chosen solution must be satisfactory
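A variable's communality is the sum of its squared loadings on the retained components, so comparing solutions amounts to recomputing these sums for different numbers of components. The sketch below assumes unrotated principal-component loadings computed from the correlation matrix; `pca_communalities` and the random data are hypothetical. Because the first k components are nested inside the first k+1, each communality can only grow when a component is added, and with all p components every communality equals 1.

```python
import numpy as np


def pca_communalities(X, k):
    """Communality of each variable in a k-component principal component solution."""
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)             # ascending order
    order = np.argsort(eigval)[::-1]               # sort components, largest first
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Loadings = correlations between variables and components
    loadings = eigvec[:, :k] * np.sqrt(eigval[:k])
    return np.sum(loadings ** 2, axis=1)           # h_i^2 = sum of squared loadings


# Hypothetical correlated data with 8 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 8))

h5 = pca_communalities(X, 5)
h6 = pca_communalities(X, 6)
print("k=5:", np.round(h5, 2), "share:", round(h5.sum() / 8, 3))
print("k=6:", np.round(h6, 2), "share:", round(h6.sum() / 8, 3))
```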
How should the components be interpreted?
1. Rotation of the components
An orthogonal rotation in the factor space does not affect the validity of the model: we exploit this property to obtain factors that are easier to interpret.
– The Varimax rotation method, proposed by Kaiser, aims to minimize the number of variables with high loadings (correlations) on each factor.
– The Quartimax method attempts to minimize the number of factors strongly correlated with each variable.
– The Equimax method is a compromise between Varimax and Quartimax.
2. Correlations between the principal components and the original variables
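The three rotation methods above belong to the orthomax family, which differ only in a weight γ (1 for Varimax, 0 for Quartimax, k/2 for Equimax, with k factors). The following is a sketch of the widely used SVD-based orthomax iteration, not the exact SPSS routine: SPSS additionally applies Kaiser normalization (rows scaled to unit length before rotating), which is omitted here. The function name `orthomax` and the toy loading matrix are hypothetical.

```python
import numpy as np


def orthomax(L, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthogonal rotation of a p x k loading matrix L.

    gamma = 1 gives Varimax, gamma = 0 Quartimax, gamma = k / 2 Equimax.
    Returns the rotated loadings and the orthogonal rotation matrix T.
    """
    p, k = L.shape
    T = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient of the orthomax criterion with respect to the rotation
        G = L.T @ (Lr ** 3 - (gamma / p) * Lr * np.sum(Lr ** 2, axis=0))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                          # nearest orthogonal matrix to G
        crit = s.sum()
        if crit_old > 0 and crit < crit_old * (1 + tol):
            break                           # criterion no longer improving
        crit_old = crit
    return L @ T, T


# Hypothetical example: a clean two-factor structure hidden by a 30-degree rotation
A = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
L = A @ Q                                   # "unrotated" loadings, harder to read
L_varimax, T = orthomax(L, gamma=1.0)
print(np.round(L, 3))
print(np.round(L_varimax, 3))               # close to A, up to column order and sign
```

Because T is orthogonal, each variable's communality (row sum of squared loadings) is unchanged by the rotation, which is why the rotation does not affect the validity of the model.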
Examples of factor analysis from past group projects
Example: Importance of information and ways of acquiring it
The aim of the research is to understand which are the main information media, how much each of them is appreciated, and which topics are of greatest interest.
Factor analysis:
The variables considered are 14 parameters: those that influence the choice of channel and those that influence the choice of source type.
What drives your choice of channel? On a scale from 1 to 10 (where 1 = not at all and 10 = very much), rate the importance of:
- simplicity
- cost
- speed of acquisition
- convenience
- update time
What drives your choice of sources? On a scale from 1 to 10 (where 1 = not at all and 10 = very much), rate the importance of:
- political orientation
- topics covered
- geographic area of interest
- editor-in-chief
- format/style
- who you live with
- editorial staff
- journalists/presenters
- quality of services
Eigenvalues of the correlation matrix: Total = 14, Average = 1
Factor   Eigenvalue   Difference   Proportion   Cumulative
1        3.16944223   0.52227941   0.2264       0.2264
2        2.64716281   1.35701039   0.1891       0.4155
3        1.29015243   0.02599489   0.0922       0.5076
4        1.26415754   0.2604549    0.0903       0.5979
5        1.00370264   0.20036187   0.0717       0.6696
6        0.80334077   0.01216326   0.0574       0.727
7        0.79117751   0.13231428   0.0565       0.7835
8        0.65886322   0.03460029   0.0471       0.8306
9        0.62426293   0.12396136   0.0446       0.8752
10       0.50030158   0.09138333   0.0357       0.9109
11       0.40891825   0.04258591   0.0292       0.9401
12       0.36633234   0.09276211   0.0262       0.9663
13       0.27357023   0.07495472   0.0195       0.9858
14       0.19861552   0            0.0142       1
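As a quick arithmetic check of the table, the sketch below recomputes the Proportion, Cumulative and Difference columns from the listed eigenvalues and applies the numerical criteria from the start of the lecture. It uses only the values in the table; with these numbers the eigenvalue > 1 rule and the 60% rule both point to 5 factors, and 14/3 ≈ 4.7 is also close to 5.

```python
import numpy as np

# Eigenvalues of the 14 x 14 correlation matrix, copied from the table above
eig = np.array([3.16944223, 2.64716281, 1.29015243, 1.26415754, 1.00370264,
                0.80334077, 0.79117751, 0.65886322, 0.62426293, 0.50030158,
                0.40891825, 0.36633234, 0.27357023, 0.19861552])
p = len(eig)                                   # 14 variables, total variance = 14
proportion = eig / p                           # "Proportion" column
cumulative = np.cumsum(proportion)             # "Cumulative" column
difference = -np.diff(eig, append=eig[-1])     # "Difference" column (last one is 0)

kaiser = int(np.sum(eig > 1))                          # eigenvalue > 1
by_variance = int(np.argmax(cumulative >= 0.60) + 1)   # at least 60% explained
by_ratio = p / 3                                       # about 1/3 of the variables
print(kaiser, by_variance, round(by_ratio, 2))
```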
[Figure: SCREE PLOT, eigenvalues (vertical axis, 0 to 3.5) against factors 1 to 14 (horizontal axis)]
Comparison of final communalities (n = number of factors)
Variable   Description                   Communality n=5   Communality n=6
D_17_s     simplicity                    0.70              0.70
D_17_c     cost                          0.73              0.73
D_17_v     speed of acquisition          0.82              0.84
D_17_com   convenience                   0.66              0.71
D_17_tda   update time                   0.73              0.81
D_20_orp   political orientation         0.80              0.81
D_20_tt    topics covered                0.65              0.70
D_20_ag    geographic area of interest   0.68              0.84
D_20_d     editor-in-chief               0.72              0.73
D_20_fs    format/style                  0.56              0.63
D_20_ccv   who you live with             0.54              0.56
D_20_r     editorial staff               0.70              0.73
D_20_gs    journalists/presenters        0.63              0.67
D_20_qs    quality of services           0.46              0.74
Total                                    9.37              10.18
%                                        66.96%            72.70%
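The last two rows of the table follow from the communalities: their sum over the 14 variables, divided by 14, is the share of total variance explained. The sketch below recomputes these totals from the rounded values (so they can differ from 9.37 and 10.18 in the second decimal) and flags variables below a cut-off of 0.5, which is an assumed threshold, not one stated in the slides. With these values only D_20_qs falls below 0.5 with 5 factors, and it is also the variable that gains most from a sixth factor.

```python
import numpy as np

variables = ["D_17_s", "D_17_c", "D_17_v", "D_17_com", "D_17_tda", "D_20_orp", "D_20_tt",
             "D_20_ag", "D_20_d", "D_20_fs", "D_20_ccv", "D_20_r", "D_20_gs", "D_20_qs"]
# Final communalities from the comparison table (rounded to two decimals)
h5 = np.array([0.70, 0.73, 0.82, 0.66, 0.73, 0.80, 0.65, 0.68, 0.72, 0.56, 0.54, 0.70, 0.63, 0.46])
h6 = np.array([0.70, 0.73, 0.84, 0.71, 0.81, 0.81, 0.70, 0.84, 0.73, 0.63, 0.56, 0.73, 0.67, 0.74])

p = len(variables)
for label, h in (("n=5", h5), ("n=6", h6)):
    # Sum of communalities / number of variables = share of total variance explained
    print(label, "total:", round(h.sum(), 2), "share:", round(h.sum() / p, 3))

threshold = 0.5   # hypothetical cut-off for a "satisfactory" communality
poor_5 = [v for v, h in zip(variables, h5) if h < threshold]
poor_6 = [v for v, h in zip(variables, h6) if h < threshold]
gain = h6 - h5
best_gain = variables[int(np.argmax(gain))]
print("below threshold:", poor_5, poor_6)
print("largest gain from the 6th factor:", best_gain, round(gain.max(), 2))
```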
Factor pattern (unrotated)
Variable   Description                   Factor1   Factor2    Factor3   Factor4    Factor5
D_17_s     simplicity                    0.56626   .          .         .          0.46051
D_17_c     cost                          0.35685   .          0.65469   .          0.3875
D_17_v     speed of acquisition          0.75292   .          .         .          .
D_17_com   convenience                   0.68764   -0.36206   .         .          .
D_17_tda   update time                   0.5326    -0.43612   .         .          -0.38524
D_20_orp   political orientation         .         0.54298    .         0.53024    .
D_20_tt    topics covered                0.41299   .          .         0.53419    .
D_20_ag    geographic area of interest   .         .          -0.5248   .          0.38026
D_20_d     editor-in-chief               .         0.74874    .         .          .
D_20_fs    format/style                  0.38261   .          .         -0.43544   .
D_20_ccv   who you live with             .         0.50515    .         .          .
D_20_r     editorial staff               .         0.72899    .         .          .
D_20_gs    journalists/presenters        0.58604   0.49902    .         .          .
D_20_qs    quality of services           0.63683   .          .         .          .
Values smaller than 0.35 are not printed.
As it stands, the 5-factor pattern is hard to interpret; it is therefore advisable to rotate the factors with a suitable method (VARIMAX).
Rotated factor pattern (Varimax)
Variable   Description                   Factor1   Factor2   Factor3   Factor4   Factor5
D_17_v     speed of acquisition          0.8578    .         .         .         .
D_17_tda   update time                   0.7885    .         .         .         .
D_17_com   convenience                   0.70345   .         .         0.39398   .
D_20_qs    quality of services           0.53133   .         .         .         .
D_20_r     editorial staff               .         0.74824   .         .         .
D_20_fs    format/style                  .         0.71171   .         .         .
D_20_ccv   who you live with             .         0.70059   .         .         .
D_20_gs    journalists/presenters        .         0.62098   0.36737   .         .
D_20_orp   political orientation         .         .         0.8923    .         .
D_20_d     editor-in-chief               .         .         0.77647   .         .
D_17_c     cost                          .         .         .         0.83334   .
D_17_s     simplicity                    .         .         .         0.65037   0.45187
D_20_ag    geographic area of interest   .         .         .         .         0.7622
D_20_tt    topics covered                .         .         .         .         0.71198
Values smaller than 0.35 are not printed.
Interpretation of the rotated factors:
- Factor1: Speed of acquisition and quality of the service offered
- Factor2: Presentation of the information
- Factor3: Political/ideological affinity
- Factor4: Accessibility of the service
- Factor5: Appeal of the topics covered
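The factor names above come from reading which variables load most heavily on each rotated factor. One simple way to make that reading explicit, assumed here rather than stated in the slides, is to assign each variable to the factor with its largest absolute loading; the suppressed values (below 0.35) are entered as 0, which is an approximation.

```python
import numpy as np

# Rotated (Varimax) loadings from the table above; "." (below 0.35) entered as 0
rotated = {
    "D_17_v":   [0.8578, 0, 0, 0, 0],
    "D_17_tda": [0.7885, 0, 0, 0, 0],
    "D_17_com": [0.70345, 0, 0, 0.39398, 0],
    "D_20_qs":  [0.53133, 0, 0, 0, 0],
    "D_20_r":   [0, 0.74824, 0, 0, 0],
    "D_20_fs":  [0, 0.71171, 0, 0, 0],
    "D_20_ccv": [0, 0.70059, 0, 0, 0],
    "D_20_gs":  [0, 0.62098, 0.36737, 0, 0],
    "D_20_orp": [0, 0, 0.8923, 0, 0],
    "D_20_d":   [0, 0, 0.77647, 0, 0],
    "D_17_c":   [0, 0, 0, 0.83334, 0],
    "D_17_s":   [0, 0, 0, 0.65037, 0.45187],
    "D_20_ag":  [0, 0, 0, 0, 0.7622],
    "D_20_tt":  [0, 0, 0, 0, 0.71198],
}
names = {1: "Speed of acquisition and quality of service", 2: "Presentation of the information",
         3: "Political/ideological affinity", 4: "Accessibility of the service",
         5: "Appeal of the topics covered"}

groups = {k: [] for k in names}
for var, row in rotated.items():
    k = int(np.argmax(np.abs(row))) + 1       # factor with the largest |loading|
    groups[k].append(var)
for k, members in groups.items():
    print(f"Factor{k} ({names[k]}):", ", ".join(members))
```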
Coffee Consumption in Italy
Factor Analysis
We ran a factor analysis on two numerical questions from the survey that we expected to contain correlated variables: Q15 ("What are your general coffee preferences?") and Q16 ("If you drink your coffee outside (in a bar/coffee place), which are the main factors that, in general, influence your decision on where you drink your coffee?").
• We used the principal components method to address the multicollinearity among our variables and to obtain a smaller number of uncorrelated factors (standardized by construction, with mean 0 and standard deviation 1) that better explain and summarize coffee consumption behavior.
• This is a preliminary step for the cluster analysis and the regression analysis.
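The property mentioned above, that the extracted components are uncorrelated and standardized, can be checked numerically. This is a minimal sketch with unrotated principal components and simulated data (the function `pca_scores` and the data are hypothetical); an orthogonal rotation such as Varimax preserves both properties, since rotating uncorrelated unit-variance scores by an orthogonal matrix leaves their covariance matrix equal to the identity.

```python
import numpy as np


def pca_scores(X, k):
    """Standardized principal-component scores: mean 0, sd 1, mutually uncorrelated."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each variable
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1][:k]           # indices of the k largest components
    return Z @ eigvec[:, order] / np.sqrt(eigval[order])


# Hypothetical data: 6 variables, the last 3 strongly related to the first 3
rng = np.random.default_rng(2)
base = rng.normal(size=(400, 3))
X = np.column_stack([base, base @ rng.normal(size=(3, 3)) + 0.3 * rng.normal(size=(400, 3))])

F = pca_scores(X, 3)
print(np.round(F.mean(axis=0), 6), np.round(F.std(axis=0), 6))
print(np.round(np.corrcoef(F, rowvar=False), 6))
```

Uncorrelated regressors are what removes the multicollinearity problem when the scores are later used in a regression.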
Initial variables used for the analysis
On the right are our initial 21 variables (taken from Q15 and Q16), which we selected for the factor analysis.
Judging by the SPSS correlation matrix (not shown on the slide because of its size; please see the SPSS output), many of our variables are significantly correlated.
Hence the need for factor reduction: on to the actual factor analysis!
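The slide judges the need for factor analysis by inspecting the correlation matrix. A common formal complement, not used in the slides, is Bartlett's test of sphericity, whose null hypothesis is that the correlation matrix is the identity (nothing to summarize); SPSS reports it in its "KMO and Bartlett's Test" table. The statistic is χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom. The sketch below implements it with NumPy and SciPy on simulated data; the function name and the data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 says the correlation matrix is the identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)                    # ln|R|, computed stably
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return stat, df, chi2.sf(stat, df)                  # upper-tail p-value


rng = np.random.default_rng(3)
independent = rng.normal(size=(200, 5))                 # hypothetical uncorrelated items
common = rng.normal(size=(200, 1))
correlated = common + 0.5 * rng.normal(size=(200, 5))   # hypothetical correlated items

res_ind = bartlett_sphericity(independent)
res_cor = bartlett_sphericity(correlated)
print("independent:", res_ind)
print("correlated: ", res_cor)
```

A very small p-value, as for the correlated items, supports the decision to proceed with factor reduction.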
Choosing the right number of factors
1. 1/3 criterion: 21/3 = 7 factors
2. Variance explained (60%-75%): 7, 8, 9 or 10 factors
3. Scree plot: 6, 8 or 10 factors
4. Eigenvalues: 6, 7 or 8 factors
The optimal number seems to be 7 or 8 factors.
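The slide weighs the four criteria informally. A simple tally of how many criteria support each candidate, a hypothetical way of making that reasoning explicit, reproduces the conclusion: 7 and 8 are each supported by three of the four criteria.

```python
from collections import Counter

# Candidate numbers of factors suggested by each criterion on the slide
criteria = {
    "1/3 criterion": [7],
    "variance explained (60%-75%)": [7, 8, 9, 10],
    "scree plot": [6, 8, 10],
    "eigenvalues": [6, 7, 8],
}
votes = Counter(k for candidates in criteria.values() for k in candidates)
top = max(votes.values())
winners = sorted(k for k, v in votes.items() if v == top)
print(dict(sorted(votes.items())))   # support for each candidate
print(winners)                       # [7, 8]
```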
Choosing the right number of factors (continued)
The scree plot on this slide corresponds to criterion 3 of the previous slide.
Factor analysis with 8 factors
After analyzing the communalities table, we identified one variable that is not adequately explained by the 8 selected factors: Price, whose communality of 0.387 is not satisfactory. We consider Price an important variable in our analysis.
Reducing the number of factors to 7 would not improve how well Price is explained (with principal component extraction, a variable's communality can only decrease when components are dropped).
We therefore decided to exclude Price from this factor analysis and, given its high importance from a qualitative point of view, to treat it as a separate factor in the subsequent cluster and regression analyses.
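The plan of excluding Price from the factor analysis and using it next to the factor scores in the later regression can be sketched as follows. Everything here is simulated and hypothetical (six attributes driven by two latent factors, a uniform price, and a response built from the scores); the point is only how the design matrix is assembled: an intercept, the uncorrelated factor scores, and Price as a separate standardized regressor, estimated by ordinary least squares with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
latent = rng.normal(size=(n, 2))
attributes = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n, 6))  # hypothetical items
price = rng.uniform(0.8, 2.0, size=n)                                          # hypothetical price

# Factor scores from the attributes only (Price is kept out of the factor analysis)
Z = (attributes - attributes.mean(axis=0)) / attributes.std(axis=0)
eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
top = np.argsort(eigval)[::-1][:2]
F = Z @ eigvec[:, top] / np.sqrt(eigval[top])

# Hypothetical response depending on the two factors and on Price
price_z = (price - price.mean()) / price.std()
y = 1.0 + 0.5 * F[:, 0] - 0.3 * F[:, 1] + 0.8 * price_z + 0.3 * rng.normal(size=n)

# Design matrix: intercept, factor scores, and Price as a separate regressor
D = np.column_stack([np.ones(n), F, price_z])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
print(np.round(beta, 2))
```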
Factor analysis with 20 variables (after eliminating the Price variable)
1. 1/3 criterion: 20/3 ≈ 6 factors
2. Variance explained (60%-75%): 7, 8 or 9 factors
3. Scree plot: 6, 7 or 9 factors
4. Eigenvalues: 6, 7 or 8 factors
The optimal choice seems to be 7 factors.
Factor analysis with 20 variables (after eliminating the Price variable), continued
The scree plot on this slide corresponds to criterion 3 of the previous slide.
Factor analysis with 7 factors
After analyzing the communalities table, we see that the 7 factors adequately explain the initial variables: all communalities are above 0.400, which is a good result.
We are ready to look at the rotated component matrix to check whether the factors make sense and can be interpreted.
Factors explained
• The rotation method used was Varimax.
• After closely analyzing the rotated component matrix, we tried to give meaning to our 7 factors.
• The factors are named as follows:
1. Socialization factor
2. Internet/Trendiness factor
3. Close meeting place factor
4. Intellectual/Non-smoking factor
5. Familiarity factor
6. Variety/To-go factor
7. Traditionality & Addiction factor
Factors explained (continued)
1. Socialization factor: socializing, sitting down, being with friends, cozy atmosphere
2. Internet/Trendiness factor: Wi-Fi availability, internet, trendy place
3. Close meeting place factor: close to home/work/school, chance to meet people; coffee quality not important
4. Intellectual/Non-smoking factor: non-smokers, usually have a snack, love to read
5. Familiarity factor: go to the same bar, do not like trying new places, concerned about coffee quality
6. Variety/To-go factor: variety and coffee to go, non-traditional Italian coffee, preference for having coffee alone
7. Traditionality/Addiction factor: preference for Italian coffee, addicts
The Consumption of Digital Music and Its Impact on the Music Industry
Factor Analysis
We took into consideration questions no. 4, 9 and 10, which gives 24 variables.
We asked interviewees either to give a score from 1 to 9 (1: "I don't like it", 9: "I love it") or to answer with percentages:
- Question 4: score
- Question 9: score
- Question 10: %
Variables:
1. Home
2. Car
3. Outside in general
4. Office/University
5. Shops
6. Restaurants
7. Bars/discos
8. Record player
9. Cassette player
10. CD player
11. Digital player
12. Car stereo
13. House stereo
14. Radio
15. Mobile phone
16. USE record player
17. USE cassette player
18. USE CD player
19. USE digital player
20. USE car stereo
21. USE house stereo
22. USE radio
23. USE PC
24. USE mobile phone
First hypothesis: 9 factors
Extraction: principal component analysis
Maximum number of iterations: 25
Rotation: Varimax
Total Variance Explained (Extraction Method: Principal Component Analysis)
Comp.   Initial eigenvalues             Extraction sums of sq. loadings   Rotation sums of sq. loadings
        Total      % Var       Cum. %   Total   % Var    Cum. %           Total   % Var    Cum. %
1       3.389      14.120      14.120   3.389   14.120   14.120           2.427   10.114   10.114
2       2.768      11.533      25.653   2.768   11.533   25.653           2.126   8.857    18.972
3       1.970      8.209       33.862   1.970   8.209    33.862           1.991   8.297    27.268
4       1.542      6.425       40.287   1.542   6.425    40.287           1.877   7.820    35.088
5       1.539      6.411       46.698   1.539   6.411    46.698           1.659   6.912    42.000
6       1.388      5.782       52.480   1.388   5.782    52.480           1.647   6.861    48.861
7       1.355      5.646       58.126   1.355   5.646    58.126           1.568   6.535    55.396
8       1.164      4.850       62.976   1.164   4.850    62.976           1.457   6.072    61.469
9       1.058      4.408       67.385   1.058   4.408    67.385           1.420   5.916    67.385
10      0.956      3.985       71.369
11      0.919      3.831       75.201
12      0.823      3.427       78.628
13      0.714      2.975       81.603
14      0.689      2.872       84.475
15      0.600      2.498       86.973
16      0.565      2.353       89.326
17      0.504      2.098       91.424
18      0.464      1.935       93.359
19      0.455      1.894       95.254
20      0.355      1.480       96.733
21      0.321      1.336       98.070
22      0.253      1.053       99.122
23      0.211      0.878       100.000
24      -7.9E-17   -3.31E-16   100.000
Evaluation of the first hypothesis (9 factors)
- Ratio between number of components and number of variables: ADEQUATE. For a set of 17 variables, the ideal number of components is 4-5; in this case, with 24 variables, we considered 9 components.
- Global % of explained variance: OK. About 67% (67.385%); the optimal range is 60%-70%.
- Communalities: ADEQUATE. The values range from 0.456 to 0.917.
- Correlations between components and original variables: NON-OPTIMAL. Looking at the rotated component matrix, we found a problem: the 9th component is problematic.
Second hypothesis: 8 factors
Extraction: principal component analysis
Maximum number of iterations: 25
Rotation: Varimax
Total Variance Explained (Extraction Method: Principal Component Analysis)
The initial eigenvalues of all 24 components are the same as in the first hypothesis; the first 8 components are retained.
Comp.   Extraction sums of sq. loadings   Rotation sums of sq. loadings
        Total   % Var    Cum. %           Total   % Var    Cum. %
1       3.389   14.120   14.120           2.634   10.974   10.974
2       2.768   11.533   25.653           2.339   9.744    20.718
3       1.970   8.209    33.862           1.891   7.880    28.598
4       1.542   6.425    40.287           1.810   7.541    36.139
5       1.539   6.411    46.698           1.776   7.399    43.538
6       1.388   5.782    52.480           1.721   7.171    50.709
7       1.355   5.646    58.126           1.486   6.191    56.900
8       1.164   4.850    62.976           1.458   6.077    62.976
Communalities (8-component solution; Extraction Method: Principal Component Analysis)
Variable              Initial   Extraction
Home                  1.000     0.626
Car                   1.000     0.546
Outside in general    1.000     0.522
Office/University     1.000     0.450
Shops                 1.000     0.623
Restaurants           1.000     0.630
Bars/discos           1.000     0.450
Record player         1.000     0.657
Cassette player       1.000     0.797
CD player             1.000     0.545
Digital player        1.000     0.670
Car stereo            1.000     0.664
House stereo          1.000     0.646
Radio                 1.000     0.516
Mobile phone          1.000     0.736
USE record player     1.000     0.355
USE cassette player   1.000     0.431
USE CD player         1.000     0.632
USE digital player    1.000     0.870
USE car stereo        1.000     0.828
USE house stereo      1.000     0.679
USE radio             1.000     0.685
USE computer          1.000     0.814
USE mobile phone      1.000     0.747
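The communalities can be cross-checked against the variance table: for principal component extraction, the sum of the communalities equals the sum of the retained eigenvalues, and dividing by 24 gives the explained share. The sketch below uses only the values reported for the 8-component solution; the two sums agree to rounding (about 15.12 and 15.115), and the smallest communality is 0.355 (USE record player).

```python
import numpy as np

# Extraction communalities for the 8-component solution (from the table above)
h = np.array([0.626, 0.546, 0.522, 0.450, 0.623, 0.630, 0.450, 0.657, 0.797, 0.545, 0.670, 0.664,
              0.646, 0.516, 0.736, 0.355, 0.431, 0.632, 0.870, 0.828, 0.679, 0.685, 0.814, 0.747])
# First 8 eigenvalues ("Extraction Sums of Squared Loadings", Total column)
eig8 = np.array([3.389, 2.768, 1.970, 1.542, 1.539, 1.388, 1.355, 1.164])

p = len(h)                                             # 24 variables
print(round(h.sum(), 3), round(eig8.sum(), 3))         # equal up to rounding
print("explained share:", round(h.sum() / p, 4))       # close to 62.976%
print("range:", h.min(), h.max())
```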
Evaluation of the second hypothesis (8 factors)
- Ratio between number of components and number of variables: ADEQUATE. For a set of 17 variables, the ideal number of components is 4-5; in this case, with 24 variables, we considered 8 components.
- Global % of explained variance: OK. About 63% (62.976%); the optimal range is 60%-70%.
- Communalities: ACCEPTABLE. The values range from 0.355 to 0.870.
Scree plot: ADEQUATE. From the 9th component onward, each additional component adds little explained variance (a "quite linear slope").
Interpretation
[Table: Rotated Component Matrix (8 components, 24 variables)]
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 10 iterations.
1. The problem with the 9th component is resolved.
2. We chose the Varimax option to minimize the number of variables with high loadings on each factor.
WE CHOOSE THE SECOND
HYPOTHESIS
Interpretation of the factors
[Diagram: the variables grouped under the factor labels]
Variable groups: Office/University, Shops, Restaurants, Bars/discos; Record player, Use record player, Cassette player, Use cassette player; Digital player, Use digital player; Radio, Use radio; Car, Car stereo; CD player, Use CD player; Home, House stereo, Use house stereo
Factor labels: OUTSIDE LISTENING, STEREO, DIGITAL PLAYER, RADIO, CAR LISTENING, HOUSE LISTENING
Outside in general, Use PC, PC, Mobile phone, Use mobile phone: MOBILE PHONE
Factor Analysis