Statistica Applicata Vol. 18, n. 3, 2006

AN INTER-MODELS DISTANCE FOR CLUSTERING UTILITY FUNCTIONS¹

Elvira Romano, Carlo Lauro
Dipartimento di Matematica e Statistica, Università di Napoli Federico II, Complesso Universitario di Monte S. Angelo, Via Cinthia, I - 80126 Napoli, Italy

Giuseppe Giordano
Dipartimento di Scienze Economiche e Statistiche, Università di Salerno, Via Ponte Don Melillo, 84084 Fisciano (Sa), Italy

Abstract

Conjoint Analysis is one of the most widely used techniques for assessing consumer behavior. The method estimates the partial utility coefficients (part-worths) through a statistical model that links the overall preference rating to the attribute levels describing the stimuli. Conjoint Analysis results are useful in new-product positioning and market segmentation. This paper proposes a cluster-based segmentation strategy built on a new metric. The proposed distance is a convex linear combination of two Euclidean distances, embedding information on both the estimated parameters and the model fit. Market segments can then be defined according to the proximity of the part-worth coefficients and the explanatory power of the estimated models.

Key words: Multiattribute Preference Data, Conjoint Analysis, Cluster Analysis, Market Segmentation.

¹ This paper was financially supported by MIUR grants: “Multivariate Statistical and Visualization Methods to Analyze, to Summarize, to Evaluate Performance Indicators”, coordinated by prof. M.R. D’Esposito, and “Models for Designing and Measuring Customer Satisfaction”, coordinated by prof. C.N. Lauro.

1. INTRODUCTION

2. THE DATA STRUCTURE AND THE INTER-MODELS DISTANCE

Tab. 1: The data collection.

Utility models   Coefficients                            Model fitting
Model 1          w^1_1, ..., w^1_k, ..., w^1_{K-1}       w^1_K
...              ...                                     ...
Model j          w^j_1, ..., w^j_k, ..., w^j_{K-1}       w^j_K
...              ...                                     ...
Model J          w^J_1, ..., w^J_k, ..., w^J_{K-1}       w^J_K

3. CLUSTERING UTILITY FUNCTIONS

4. SIMULATION STUDY

Tab. 2: Simulation plan for the three classes of models with different coefficients and similar fitting values.

To generate three sets of models with a reasonably good fit, we build the global preference ratings according to model (10), using the coefficients given in Table 2. These ratings serve as dependent variables in a multivariate multiple regression model whose dummy explanatory variables are defined by the orthogonal experimental design in Table 3.

Tab. 3: Experimental design.

Run   Intercept   x1   x2   x3   x4
  1       1        1   -1   -1    1
  2       1       -1   -1   -1    1
  3       1        1    1    0    1
  4       1       -1    1    0    1
  5       1        1    0    1    1
  6       1       -1    0    1    1
  7       1        1   -1   -1   -1
  8       1       -1   -1   -1   -1
  9       1        1    1    0   -1
 10       1       -1    1    0   -1
 11       1        1    0    1   -1
 12       1       -1    0    1   -1

Fig. 1: Box-plot of the adjusted R², the intercept and the four estimated coefficients.

Fig. 2: The tree structure of the simulated models.

Fig. 3: Distribution of the optimum λ-values.

Tab. 4: Simulation plan involving a hidden tree structure.

Class     N.   w0     w1      w2      w3     w4      e_i
Class A   20   6.50   -1.33    1.00   1.25   -1.83   N(0, 1)
Class B   20   6.50   -1.33    1.00   1.25   -1.83   N(0, 3)
Class C   20   6.50   -0.50   -0.25   3.00    0.00   N(0, 1)

5. CONCLUDING REMARKS

Fig. 4: Distribution of the λ-values corresponding to the maximum cophenetic coefficient in 100 replications of the clustering.

Fig. 5: Dendrogram of the models. λ = 1: the poorly fitted models are hidden in the two-cluster structure.

Fig. 6: Dendrogram of the models.
λ = 0.2: the cophenetic coefficient is at its maximum, and a three-cluster structure is more evident.

CLASSIFICATION OF UTILITY FUNCTIONS THROUGH A DISTANCE BETWEEN MODELS

Summary

Conjoint Analysis is one of the most widely used techniques for evaluating consumer behavior. The methodology estimates the partial utility coefficients from a statistical model that links the overall preference rating to the descriptive characteristics of the stimuli (products or services). Conjoint Analysis results are widely applied in market segmentation.
This paper proposes a classification strategy based on a new metric. The proposed distance is defined as a convex combination of two distances. It accounts for two kinds of information about each model: the values of the estimated coefficients and the goodness of fit. Consequently, market segments are differentiated by considering both the proximity of the estimated individual utility models and their predictive capacity.
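
As an illustration of the convex combination described in the abstract, the following minimal Python sketch mixes a Euclidean distance on the part-worth coefficients with a Euclidean distance on the fit measure. It is a sketch under stated assumptions, not the authors' formula, which is not reproduced in this text. The function names (`euclidean`, `inter_model_distance`) and the fit values 0.90 and 0.50 are hypothetical; the coefficients w1..w4 are taken from Tab. 4, and leaving out the intercept w0 is an assumption. Placing the weight λ on the coefficient distance is inferred from the caption of Fig. 5, where λ = 1 hides the poorly fitted models.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def inter_model_distance(model_a, model_b, lam):
    """Convex combination of a coefficient distance and a fit distance.

    Each model is a pair (coefficients, fit). Assumption: lam weights the
    coefficient distance and (1 - lam) weights the fit distance.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    coef_a, fit_a = model_a
    coef_b, fit_b = model_b
    d_coef = euclidean(coef_a, coef_b)
    d_fit = euclidean([fit_a], [fit_b])  # one fit value per model
    return lam * d_coef + (1.0 - lam) * d_fit


# Part-worths w1..w4 from Tab. 4; the fit values are hypothetical.
class_a = ((-1.33, 1.00, 1.25, -1.83), 0.90)
class_b = ((-1.33, 1.00, 1.25, -1.83), 0.50)  # same coefficients, worse fit
class_c = ((-0.50, -0.25, 3.00, 0.00), 0.90)  # other coefficients, same fit

for lam in (1.0, 0.2, 0.0):
    d_ab = inter_model_distance(class_a, class_b, lam)
    d_ac = inter_model_distance(class_a, class_c, lam)
    print(f"lambda={lam}: d(A,B)={d_ab:.3f}  d(A,C)={d_ac:.3f}")
```

With λ = 1 the two models sharing coefficients (A and B) are at distance zero, so a difference in fit cannot separate them. With λ = 0 the two models sharing a fit value (A and C) coincide instead. Intermediate values of λ let both sources of information contribute. In practice the two component distances may need to be brought onto comparable scales before mixing; the sketch does not attempt this.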