Quantitative Methods for Economics, Finance and Management
Lecture 8
Using factor analysis to build a multiple linear regression model

Factor analysis
How many components should be retained?
1. Eigenvalues greater than 1
2. Ratio between the number of components and the number of variables (about 1/3)
3. Percentage of variance explained (at least 60%)
4. The SCREE PLOT (eigenvalues plotted against the number of factors)
If the plot shows an "elbow", it is plausible to assume that a latent structure exists; if the shape is almost a straight line, the factors are merely a transformation of the manifest variables. The relevant factors are those above the elbow (optionally also the one at the elbow). If there are no predominant factors, the criterion is unsuitable.

Factor analysis
How many components should be retained?
5. Communalities:
- compare the communalities of several solutions
- the share of each variable's variance explained by the chosen solution must be satisfactory

Factor analysis
How should the components be interpreted?
1. Rotation of the components
An orthogonal rotation in the factor space does not affect the validity of the model: we exploit this property to obtain factors that are easier to interpret.
- The Varimax method of rotation, suggested by Kaiser, aims to minimize the number of variables with high loadings (correlations) on each factor.
- The Quartimax method attempts to minimize the number of factors strongly correlated with each variable.
- The Equimax method is a cross between Varimax and Quartimax.
2. Correlations between the principal components and the original variables

Examples of factor analysis from past group projects

Example: Importance of information and how it is acquired
The aim of the research is to understand which information media are the most important, how well each is liked, and which topics attract the most interest.
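Before turning to the example's data, the rotation property stated earlier in this lecture (an orthogonal rotation does not affect the validity of the model) can be checked numerically. The following is a minimal sketch of a common SVD-based Varimax iteration, without Kaiser normalization; the function name `varimax`, its parameters and the 4 x 2 loading matrix are hypothetical and are not taken from the slides or from any statistical package.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw Varimax rotation (no Kaiser normalization).

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # Matrix whose SVD yields the next orthogonal rotation
        # (standard Varimax update step).
        target = rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        R = u @ vt
        new_criterion = s.sum()
        # Stop when the criterion no longer improves appreciably.
        if new_criterion <= criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ R, R

# Hypothetical unrotated loadings: 4 variables on 2 factors.
unrotated = np.array([
    [0.7, 0.5],
    [0.6, 0.6],
    [0.6, -0.5],
    [0.7, -0.4],
])

rotated, R = varimax(unrotated)

# Communalities are the row sums of squared loadings.
communalities_before = np.sum(unrotated**2, axis=1)
communalities_after = np.sum(rotated**2, axis=1)

print(np.round(rotated, 3))
print(np.round(communalities_before, 3))
print(np.round(communalities_after, 3))
```

Because the rotation matrix is orthogonal, each variable's communality is the same before and after rotation; only the way the explained variance is split across the factors changes, which is what makes the rotated solution easier to read.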
Factor analysis: the variables considered are the 14 parameters that influence the choice of channel and the choice of source type.

What determines your choice of channel? On a scale from 1 to 10 (where 1 = not at all and 10 = very much), rate the importance of:
- simplicity
- cost
- speed of acquisition
- convenience
- update time

What determines your choice of sources? On a scale from 1 to 10 (where 1 = not at all and 10 = very much), rate the importance of:
- political orientation
- topics covered
- geographical area of interest
- editor
- format / style
- who you live with
- editorial staff
- journalists / presenters
- quality of services

Example: Importance of information and how it is acquired
Eigenvalues of the correlation matrix: Total = 14, Mean = 1

Factor   Eigenvalue   Difference   Proportion   Cumulative
 1       3.16944223   0.52227941   0.2264       0.2264
 2       2.64716281   1.35701039   0.1891       0.4155
 3       1.29015243   0.02599489   0.0922       0.5076
 4       1.26415754   0.2604549    0.0903       0.5979
 5       1.00370264   0.20036187   0.0717       0.6696
 6       0.80334077   0.01216326   0.0574       0.7270
 7       0.79117751   0.13231428   0.0565       0.7835
 8       0.65886322   0.03460029   0.0471       0.8306
 9       0.62426293   0.12396136   0.0446       0.8752
10       0.50030158   0.09138333   0.0357       0.9109
11       0.40891825   0.04258591   0.0292       0.9401
12       0.36633234   0.09276211   0.0262       0.9663
13       0.27357023   0.07495472   0.0195       0.9858
14       0.19861552   0            0.0142       1.0000

Example: Importance of information and how it is acquired
SCREE PLOT (figure): eigenvalues (vertical axis, 0 to 3.5) against factors (horizontal axis, 1 to 14)
Example: Importance of information and how it is acquired
COMPARISON OF FINAL COMMUNALITIES

VARIABLE   DESCRIPTION                     n=5      n=6
D_17_s     simplicity                      0.70     0.70
D_17_c     cost                            0.73     0.73
D_17_v     speed of acquisition            0.82     0.84
D_17_com   convenience                     0.66     0.71
D_17_tda   update time                     0.73     0.81
D_20_orp   political orientation           0.80     0.81
D_20_tt    topics covered                  0.65     0.70
D_20_ag    geographical area of interest   0.68     0.84
D_20_d     editor                          0.72     0.73
D_20_fs    format/style                    0.56     0.63
D_20_ccv   who you live with               0.54     0.56
D_20_r     editorial staff                 0.70     0.73
D_20_gs    journalists/presenters          0.63     0.67
D_20_qs    quality of services             0.46     0.74
total                                      9.37     10.18
%                                          66.96%   72.70%

Example: Importance of information and how it is acquired
Factor pattern

VARIABLE   DESCRIPTION                     Factor1   Factor2    Factor3   Factor4    Factor5
D_17_s     simplicity                      0.56626   .          .         .          0.46051
D_17_c     cost                            0.35685   .          0.65469   .          0.3875
D_17_v     speed of acquisition            0.75292   .          .         .          .
D_17_com   convenience                     0.68764   -0.36206   .         .          .
D_17_tda   update time                     0.5326    -0.43612   .         .          -0.38524
D_20_orp   political orientation           .         0.54298    .         0.53024    .
D_20_tt    topics covered                  0.41299   .          .         0.53419    .
D_20_ag    geographical area of interest   .         .          -0.5248   .          0.38026
D_20_d     editor                          .         0.74874    .         .          .
D_20_fs    format/style                    0.38261   .          .         -0.43544   .
D_20_ccv   who you live with               .         0.50515    .         .          .
D_20_r     editorial staff                 .         0.72899    .         .          .
D_20_gs    journalists/presenters          0.58604   0.49902    .         .          .
D_20_qs    quality of services             0.63683   .          .         .          .
Values below 0.35 are not printed.

The five-factor pattern, as it stands, is hard to interpret; it is therefore advisable to rotate the factors with a suitable method (VARIMAX).

Example: Importance of information and how it is acquired
ROTATED FACTOR PATTERN

VARIABLE   DESCRIPTION                     Factor1   Factor2   Factor3   Factor4   Factor5
D_17_v     speed of acquisition            0.8578    .         .         .         .
D_17_tda   update time                     0.7885    .         .         .         .
D_17_com   convenience                     0.70345   .         .         0.39398   .
D_20_qs    quality of services             0.53133   .         .         .         .
D_20_r     editorial staff                 .         0.74824   .         .         .
D_20_fs    format/style                    .         0.71171   .         .         .
D_20_ccv   who you live with               .         0.70059   .         .         .
D_20_gs    journalists/presenters          .         0.62098   0.36737   .         .
D_20_orp   political orientation           .         .         0.8923    .         .
D_20_d     editor                          .         .         0.77647   .         .
D_17_c     cost                            .         .         .         0.83334   .
D_17_s     simplicity                      .         .         .         0.65037   0.45187
D_20_ag    geographical area of interest   .         .         .         .         0.7622
D_20_tt    topics covered                  .         .         .         .         0.71198
Values below 0.35 are not printed.

Interpretation of the rotated factors:
Factor1: speed of acquisition and quality of the service offered
Factor2: presentation of the information
Factor3: political/ideological affinity
Factor4: accessibility of the service
Factor5: attractiveness of the topics covered

Coffee Consumption in Italy

Factor Analysis
We ran a factor analysis on two numerical questions from the survey that we felt might contain correlated variables: Q15 ("What are your general coffee preferences?") and Q16 ("If you drink your coffee outside (in a bar/coffee place), which are the main factors that, in general, influence your decision on where you drink your coffee?").
• We used the Principal Components Method to address the multicollinearity among our variables and to obtain a smaller set of uncorrelated factors (standardized by definition, with mean 0 and standard deviation 1) that better explain the specific situation of coffee consumption.
• This is a preliminary phase for cluster analysis and regression analysis.

Initial variables used for the analysis
On the right are the 21 initial variables (taken from Q15 and Q16) that we selected for the factor analysis. Judging by the SPSS Correlation Matrix (not shown on the slide because of its size; please see the output to check), many variables are significantly correlated.
Need for FACTOR REDUCTION! Start the real Factor Analysis!

Choosing the right number of factors
1. 1/3 criterion: 21/3 = 7 factors
2. Variance explained (60%-75%): 7, 8, 9, 10 factors
3. Scree plot: 6, 8, 10 factors
4. Eigenvalues: 6, 7, 8 factors
The optimal values seem to be 7 or 8 factors.
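The numeric checks in the list above (the 1/3 rule, the explained-variance threshold and the eigenvalue rule) can be sketched in a few lines of Python; the scree plot remains a visual judgment and is left out. The coffee slides do not report their eigenvalues, so this illustration reuses the 14 eigenvalues from the information-acquisition example earlier in the lecture. The function names, the default 60% threshold and the choice to round the 1/3 rule down are assumptions made for illustration only.

```python
import numpy as np

# Eigenvalues of the correlation matrix from the information-acquisition
# example (14 variables); used here only as illustrative input.
EIGENVALUES = np.array([
    3.16944223, 2.64716281, 1.29015243, 1.26415754, 1.00370264,
    0.80334077, 0.79117751, 0.65886322, 0.62426293, 0.50030158,
    0.40891825, 0.36633234, 0.27357023, 0.19861552,
])

def eigenvalue_rule(eigenvalues):
    """Number of components whose eigenvalue is greater than 1."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > 1.0))

def one_third_rule(n_variables):
    """About one component per three variables (rounded down: an assumption)."""
    return n_variables // 3

def variance_rule(eigenvalues, threshold=0.60):
    """Smallest number of components whose cumulative share of the
    total variance reaches the threshold."""
    ev = np.asarray(eigenvalues, dtype=float)
    cumulative = np.cumsum(ev) / ev.sum()
    return int(np.argmax(cumulative >= threshold) + 1)

print("eigenvalue > 1:", eigenvalue_rule(EIGENVALUES))
print("1/3 rule for 14 variables:", one_third_rule(len(EIGENVALUES)))
print("at least 60% of variance:", variance_rule(EIGENVALUES))
```

The rules rarely agree exactly, which is why the slides list a range of candidates for each criterion and then pick a value that most of them support.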
Choosing the right number of factors (continued)
The scree plot shown here corresponds to criterion 3 on the previous slide.

Factor Analysis with 8 Factors
After analyzing the Communalities table, we identified one variable that is not properly explained by our 8 selected factors (a communality of 0.387 is not satisfactory). This variable is Price, which we consider important in our analysis. Decreasing the number of factors to 7 will not improve how well Price is explained.
We decided to exclude the Price variable from this factor analysis and to treat it as a separate factor (given its very high importance from our qualitative point of view) in the later cluster and regression analyses.

Factor Analysis with 20 Variables
After elimination of the Price variable
1. 1/3 criterion: 20/3 ≈ 6 factors (rounded down)
2. Variance explained (60%-75%): 7, 8, 9 factors
3. Scree plot: 6, 7, 9 factors
4. Eigenvalues: 6, 7, 8 factors
The optimal choice seems to be 7 factors.

Factor Analysis with 20 Variables
After elimination of the Price variable (continued)
The scree plot shown here corresponds to criterion 3 on the previous slide.

Factor Analysis with 7 Factors
After analyzing the Communalities table, we found that the 7 factors properly explain the initial variables: all communalities are above 0.400, which is a good result. We are ready to look at the Rotated Component Matrix to see whether the factors make sense and can be explained.

Factors explained
• The method used for rotation was Varimax.
• After closely analyzing the Rotated Component Matrix, we tried to give meaning to our 7 factors.
• The names of the respective factors are the following:
1. Socialization factor
2. Internet/Trendiness factor
3. Close meeting place factor
4. Intellectual/Non-smoking factor
5. Familiarity factor
6. Variety/To-go factor
7. Traditionality/Addiction factor

Factors explained (continued)
1. Socialization factor: socializing, sitting down, being with friends, cozy atmosphere
2. Internet/Trendiness factor: Wi-Fi availability, internet, trendy place
3. Close meeting place factor: close to home/work/school, ability to meet people, quality of coffee not important
4. Intellectual/Non-smoking factor: non-smokers, usually have a snack, love to read
5. Familiarity factor: go to the same bar, do not like trying new places, concerned about the quality of coffee
6. Variety/To-go factor: variety and coffee to go, non-traditional Italian coffee, preference for having coffee alone
7. Traditionality/Addiction factor: preference for Italian coffee, addicts

The consumption of Digital Music and its impact on the Music Industry

Factor Analysis
We considered questions 4, 9 and 10, which give 24 variables. We asked interviewees either to give a score from 1 to 9 (1: "I don't like it", 9: "I love it") or to use percentages.
Question 4: score
Question 9: score
Question 10: %
1. Home
2. Car
3. Outside in general
4. Office/University
5. Shops
6. Restaurants
7. Bars/Discotheques
8. Record player
9. Cassette player
10. CD player
11. Digital player
12. Car stereo
13. House stereo
14. Radio
15. Mobile phone
16. USE record player
17. USE cassette player
18. USE CD player
19. USE digital player
20. USE car stereo
21. USE house stereo
22. USE radio
23. USE PC
24. USE mobile phone

Factor Analysis
First hypothesis
Number of factors: 9
Extraction: Principal Component Analysis
Max number of iterations: 25
Rotation: Varimax

Total Variance Explained (Extraction Method: Principal Component Analysis)

Initial Eigenvalues
Component   Total        % of Variance   Cumulative %
 1          3,389        14,120          14,120
 2          2,768        11,533          25,653
 3          1,970        8,209           33,862
 4          1,542        6,425           40,287
 5          1,539        6,411           46,698
 6          1,388        5,782           52,480
 7          1,355        5,646           58,126
 8          1,164        4,850           62,976
 9          1,058        4,408           67,385
10          ,956         3,985           71,369
11          ,919         3,831           75,201
12          ,823         3,427           78,628
13          ,714         2,975           81,603
14          ,689         2,872           84,475
15          ,600         2,498           86,973
16          ,565         2,353           89,326
17          ,504         2,098           91,424
18          ,464         1,935           93,359
19          ,455         1,894           95,254
20          ,355         1,480           96,733
21          ,321         1,336           98,070
22          ,253         1,053           99,122
23          ,211         ,878            100,000
24          -7,9E-017    -3,31E-016      100,000

Extraction Sums of Squared Loadings: equal to the initial eigenvalues for components 1 to 9 (cumulative 67,385%).

Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %
1           2,427   10,114          10,114
2           2,126   8,857           18,972
3           1,991   8,297           27,268
4           1,877   7,820           35,088
5           1,659   6,912           42,000
6           1,647   6,861           48,861
7           1,568   6,535           55,396
8           1,457   6,072           61,469
9           1,420   5,916           67,385

Factor Analysis
Ratio between number of components and number of variables: ADEQUATE
For a set of 17 variables, the ideal number of components is 4-5.
In this case, for a set of 24 variables, we considered 9 components.
% of global explained variance: OK
About 67% (the optimal range is 60%-70%)
Communalities: ADEQUATE
The values range between 0,456 and 0,917

We found a problem when looking at the rotated component matrix:
Correlations between components and original variables: NOT OPTIMAL (problematic 9th component)

Factor Analysis
Second hypothesis
Number of factors: 8
Extraction: Principal Component Analysis
Max number of iterations: 25
Rotation: Varimax

Total Variance Explained (Extraction Method: Principal Component Analysis)
Initial Eigenvalues: the same as under the first hypothesis for components 1 to 23; component 24 is numerically zero (Total -3,1E-016, % of Variance -1,30E-015).

            Extraction Sums of Squared Loadings       Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %      Total   % of Variance   Cumulative %
1           3,389   14,120          14,120            2,634   10,974          10,974
2           2,768   11,533          25,653            2,339   9,744           20,718
3           1,970   8,209           33,862            1,891   7,880           28,598
4           1,542   6,425           40,287            1,810   7,541           36,139
5           1,539   6,411           46,698            1,776   7,399           43,538
6           1,388   5,782           52,480            1,721   7,171           50,709
7           1,355   5,646           58,126            1,486   6,191           56,900
8           1,164   4,850           62,976            1,458   6,077           62,976

Factor Analysis
Communalities (Extraction Method: Principal Component Analysis)

Variable                Initial   Extraction
Home                    1,000     ,626
Car                     1,000     ,546
Outside in general      1,000     ,522
Office/University       1,000     ,450
Shops                   1,000     ,623
Restaurants             1,000     ,630
Bars/Discotheques       1,000     ,450
Record player           1,000     ,657
Cassette player         1,000     ,797
CD player               1,000     ,545
Digital player          1,000     ,670
Car stereo              1,000     ,664
House stereo            1,000     ,646
Radio                   1,000     ,516
Mobile phone            1,000     ,736
USE record player       1,000     ,355
USE cassette player     1,000     ,431
USE CD player           1,000     ,632
USE digital player      1,000     ,870
USE car stereo          1,000     ,828
USE house stereo        1,000     ,679
USE radio               1,000     ,685
USE PC                  1,000     ,814
USE mobile phone        1,000     ,747

Ratio between number of components and number of variables: ADEQUATE
For a set of 17 variables, the ideal number of components is 4-5. In this case, for a set of 24 variables, we considered 8 components.
% of global explained variance: OK
About 63% (the optimal range is 60%-70%)
Communalities: ACCEPTABLE
The values range between 0,431 and 0,870

Factor Analysis
Scree plot: ADEQUATE
From the 9th component onwards, each additional component adds little explained variance.
“Quite linear slope”

Factor Analysis
Interpretation
Rotated Component Matrix (24 variables by 8 components; the individual loadings cannot be recovered from this extract)
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 10 iterations.
1. The problem with the 9th component is resolved.
2. We chose the Varimax option to minimize the number of variables with high loadings on each factor.

WE CHOOSE THE SECOND HYPOTHESIS

Factor Analysis
Interpretation
OUTSIDE LISTENING: Office/University, Shops, Restaurants, Bars/Discotheques
STEREO: Record player, Use record player, Cassette player, Use cassette player
DIGITAL PLAYER: Digital player, Use digital player
RADIO: Radio, Use radio
CAR LISTENING: Car, Car stereo, CD player, Use CD player
HOUSE LISTENING: Home, House stereo, Use house stereo
PC: Outside in general, Use PC
MOBILE PHONE: Mobile phone, Use mobile phone
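The explained-variance figures quoted for the two hypotheses follow directly from the SPSS eigenvalues: with 24 standardized variables the total variance is 24, so each component's share is its eigenvalue divided by 24. The sketch below re-derives the cumulative percentages for 8 and 9 components; the variable names are illustrative, and differences in the third decimal arise because the eigenvalues are taken as printed, rounded to three decimals.

```python
import numpy as np

N_VARIABLES = 24  # total variance of 24 standardized variables

# First nine initial eigenvalues from the Total Variance Explained table
# (decimal commas rewritten as decimal points).
eigenvalues = np.array([3.389, 2.768, 1.970, 1.542, 1.539,
                        1.388, 1.355, 1.164, 1.058])

# Share of total variance per component and running total, in percent.
pct_variance = 100 * eigenvalues / N_VARIABLES
cumulative_pct = np.cumsum(pct_variance)

pct_8_components = cumulative_pct[7]  # second hypothesis
pct_9_components = cumulative_pct[8]  # first hypothesis

print(f"8 components: {pct_8_components:.2f}% of variance")
print(f"9 components: {pct_9_components:.2f}% of variance")
```

Dropping the 9th component therefore costs a little over four percentage points of explained variance while keeping the solution inside the 60%-70% range the slides treat as acceptable.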