Some Issues in Constructing Composite Indicators

DRAFT VERSION

Fabio Aiello
Dipartimento di Metodi Quantitativi per le Scienze Umane, Università di Palermo

Massimo Attanasio
Dipartimento di Metodi Quantitativi per le Scienze Umane, Università di Palermo

This work deals with the process of constructing a composite indicator, given by X = f[T1(x1), T2(x2), …, Tk(xk)], where X is the composite indicator, the xi are the simple indicators, the Ti are the transformations and f is the aggregating function. The work has two aims. The first is to analyse the statistical and mathematical properties of the Ti, in order to identify measures of their performance (e.g. variability, resistance, etc.) conditional on the nature of the data. The second is to give guidance on the adequacy of some non linear Ti, based on ranks and widely used in practice, for constructing a good composite indicator.

Keywords: composite indicator, transformation, aggregating, ranking score.

Introduction

Composite indicators (CIs) are meant to measure a complex underlying concept, usually called the construct, X, which is not directly measurable and is therefore broken down into measurable components, dimensions or items. The term CI is used in the social and educational sciences, in environmental settings, in scientometrics, etc. CIs are useful tools for policy making and public communication, since they convey information on the performance of countries in fields such as environment, economy, society or technological development.
Several definitions, differing in their objective, are reported in the literature. For instance, “CI are calculated by combining well-chosen subindicators into a single index. This is most often achieved by a weighted combinations of normalized subindicators’ values”; alternatively, the process of constructing a CI is described as “a simple linear weighted function of a total of Q normalized subindicators” (Saisana et al., 2005), as a method of establishing weights to combine indicators (Cox, 1992), or as a unique technique of “combination of disparate items or individual constituents” (Fayers & Hand, 2002). The last approach, developed in psychometrics and more recently used extensively by medical researchers, deals mostly with ordinal data coming from questionnaires.

The construction of a CI can be described as a two-way, forward-backward process from the construct X to its empirical representation and vice versa. The forward step decomposes the construct into several components, which have to be observable and measurable quantities, while the backward step aggregates the individual components into one. The process can be repeated several times until an adequate representation of the construct is reached. In general, a CI is a combination of individual simple indicators in a mathematical form, in which each indicator represents a specific dimension of the construct that the measurement process aims to measure. The steps for the construction of a CI can be summarized as follows:

1. the definition of the elements that make up what has to be measured, through the formulation of suitable assumptions;
2. the identification and choice of the empirical variables suitable to represent the simple indicators;
3. the process of making different quantities comparable, i.e. transforming the simple indicators;
4. the identification of the weighting system used to aggregate the transformed indicators;
5. the choice of the aggregating form that puts the transformed information together to obtain the final measure.

Each step rests on several assumptions, choices and selection procedures which are mostly subjective and in some way biased by the authors. As an extreme consequence, several authors express strong disagreement with the very rationale of constructing CIs (Curatolo, 1972).

In the present work we take a closer look at the process of constructing CIs, splitting it into two parts. The first consists in normalizing the raw data (here named transformation) and the second consists in putting them together (here named aggregating). Thus the composite indicator X can be written:

X = f[T1(x1), T2(x2), …, Tk(xk)]   [1.1]

where the variable xi is the i-th simple indicator or item, measured on a metric or on an ordinal scale, Ti is the i-th transformation function and f is the aggregating (linear or non linear) function.

This paper is strongly related to another paper (Aiello & Attanasio, 2004), in which the focus was on linear transformation functions (LTs) and on aggregation (or merging) functions, illustrated through several examples commonly used in practice. Here the focus is on some non linear transformation functions (NLTs) and on the process of aggregating, with the aim of giving clues and warnings to practitioners who construct CIs, relating the raw data (for instance, their scale of measurement and the aim of the study) to the final measure. We attempt to provide a classification of the main characteristics of the functions f and T, extracted by analyzing the mechanical process of constructing real CIs. As already pointed out in our previous paper (2004), we shall attempt to answer some questions: why and where (in what cases) are LTs and/or NLTs widely used?
What properties must a statistical transformation have? What are the most common mathematical functions f that recompose the transformed data into something relevant in practical usage? What is the relationship between the transformation T and the aggregating function f? When is the class of non additive functions f appropriate?

The paper is organized as follows: section 2 deals with the transformation process, section 3 with linear transformations, section 4 with non linear transformations and section 5 with aggregating functions. Sections 2 and 3 are a reduced version of the above cited paper.

2. Transformations to construct Composite Indicators

Before dealing with the transformations we need to introduce briefly some related issues:
− definition of transformations;
− characteristics of xi: direction, units of measure, magnitude;
− statistical properties.

Definition of Transformation. A transformation of the batch x1, x2, …, xk is a function T that replaces each xi by a new value T(xi), so that the transformed values of the batch are T(x1), T(x2), …, T(xk). T is usually elementary, strictly increasing (or decreasing), continuous and differentiable.

Characteristics of the xi’s. Each variable xi is measured with a different direction, magnitude and unit of measure, where:
a. direction concerns the algebraic sign of the i-th variable versus the latent variable X: if high values of xi yield high values of X the direction is concordant (X ∝ xi), while if high values of xi yield low values of X the direction is discordant (X ∝ xi^(−1));
b. magnitude of x is equal to m if x = a·10^m, with a constant;
c. unit of measure is defined as a special fixed and conventional quantity.

Statistical Properties. The properties considered for the selected T’s are those that are handy and able to address practical data analysis problems. We chose a list of mathematical and statistical properties in order to describe T(x):
a. main statistical parameters (mean, variance, range);
b. resistance.
T’s ought to have the following characteristics: smoothness, computational ease, comparability to the original data, and resistance. An estimator is defined as resistant if it is affected only to a limited extent either by a small number of gross errors or by any number of small rounding and grouping errors (Hoaglin et al., Ch. 11, 1983). Likewise, we define a transformation T as resistant if it is affected only to a limited extent by a small number of outlying observations.

3. Linear Transformations

The LTs re-express a value x {x: x ∈ ℜ+} in the form:

T(x) = y = a + bx, with a, b ∈ ℜ+   [2.1]

They make it possible to change the origin, the scale and the unit of measurement of the original data, but they do not change their shape. The most important characteristic of a linear transformation is proportionality.

LT1 and LT2

LT1 is very common in any field of application because it is easy to compute and has a straightforward application and meaning. Dividing by the maximum cancels the physical units of the original quantities and forces the results into a shorter interval. Modifying LT1 into LT2 yields a mapping onto the interval [0, 1], which is attractive for standardization. Both LT1 and LT2 re-scale the data into a shorter interval. Even though proportionality is maintained, LT1 and LT2 are not convenient in the presence of strong asymmetry or of outliers.

LT3, LT4 and LT5

The use of normal scores as conventional numbers is well known in statistics. They are standard deviates, whose main characteristics are a mean equal to zero and a variance equal to one. These values make LT3 very popular, because they are easy to interpret and because they take variability into account. A slight change to LT3 occurs when the aim is to compare the scores of a group to the scores of a normative group (y ~ [M(y), Var(y)]). The resulting LT4 is widely used in psychometric score tests.
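As a minimal sketch of the re-scalings just described, the Python fragment below implements LT1 (division by the maximum), LT2 (mapping onto [0, 1]) and LT3 (standard deviate). The function names, the sample batch and the use of the population standard deviation are assumptions made for illustration; they are not taken from the original analysis.

```python
from statistics import mean, pstdev


def lt1(xs):
    """LT1: divide each value by the batch maximum."""
    mx = max(xs)
    return [x / mx for x in xs]


def lt2(xs):
    """LT2: rescale onto [0, 1] using the batch minimum and maximum."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def lt3(xs):
    """LT3: standard deviate (population standard deviation assumed)."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


# Hypothetical batch of positive values, with and without one outlier.
batch = [2.0, 4.0, 6.0, 8.0]
with_outlier = batch + [100.0]

print(lt1(batch))
print(lt2(batch))
print(lt3(batch))
# With the outlier, LT2 squeezes the four regular values towards 0.
print(lt2(with_outlier))
```

With the outlier present, the four regular values all fall below 0.07 under LT2, which illustrates why LT1 and LT2 are not convenient when outliers occur.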
Both LT3 and LT4 are not very resistant, because their computation involves the mean and the standard deviation. Finally, LT5 is a resistant version of LT4: the median and the MAD (median absolute deviation) are hardly affected by outliers, so LT5 is very resistant.

Table 3.1. Synoptic table of LT1, LT2 and LT3.

Property | LT1 | LT2 | LT3
T(x) | $\dfrac{x}{\mathrm{Max}(x)}$ | $\dfrac{x - \mathrm{Min}(x)}{\mathrm{Max}(x) - \mathrm{Min}(x)}$ | $\dfrac{x - M(x)}{\sqrt{\mathrm{Var}(x)}}$
Range | $\dfrac{\mathrm{Min}(x)}{\mathrm{Max}(x)} \le T(x) \le 1$ | $0 \le T(x) \le 1$ | $-\infty < T(x) < +\infty$
Mean | $\mathrm{Max}(x)^{-1} M(x)$ | $\dfrac{M(x) - \mathrm{Min}(x)}{\mathrm{Max}(x) - \mathrm{Min}(x)}$ | $0$
Variance | $\dfrac{\mathrm{Var}(x)}{(\mathrm{Max}(x))^{2}}$ | $\dfrac{\mathrm{Var}(x)}{(\mathrm{Max}(x) - \mathrm{Min}(x))^{2}}$ | $1$
Variability reduction* | $1 - (\mathrm{Max}^{2}(x))^{-1}$ | $1 - (\mathrm{Max}(x) - \mathrm{Min}(x))^{-2}$ | $1 - \mathrm{Var}(x)^{-1}$
Derivative | $(\mathrm{Max}(x))^{-1}$ | $(\mathrm{Max}(x) - \mathrm{Min}(x))^{-1}$ | $(\sqrt{\mathrm{Var}(x)})^{-1}$

Table 3.2. Synoptic table of LT4 and LT5.

Property | LT4 | LT5
T(x) | $M(y) + \sqrt{\dfrac{\mathrm{Var}(y)}{\mathrm{Var}(x)}}\,[x - M(x)]$ | $\dfrac{x - \mathrm{Med}(x)}{\mathrm{MAD}(x)}$
Range | Domain of Y | $-\infty < T(x) < +\infty$
Mean | $M(y)$ | $\dfrac{M(x) - \mathrm{Med}(x)}{\mathrm{MAD}(x)}$
Variance | $\mathrm{Var}(y)$ | $\mathrm{Var}(x)\,(\mathrm{MAD}(x))^{-2}$
Variability reduction* | $1 - \mathrm{Var}(y)\,\mathrm{Var}(x)^{-1}$ | $1 - (\mathrm{MAD}^{2}(x))^{-1}$
Derivative | $\sqrt{\mathrm{Var}(y)\,\mathrm{Var}(x)^{-1}}$ | $(\mathrm{MAD}(x))^{-1}$

* Variability reduction: $\dfrac{\mathrm{Var}(x) - \mathrm{Var}[T(x)]}{\mathrm{Var}(x)}$.

4. Non Linear Transformations (NLTs)

The NLTs can be defined as the transformations that do not belong to the LTs. Our interest is devoted to the Power and the Rank Transformations, focusing on the latter group.

1. Power Transformations (PTs). The PTs have been extensively used in statistical models for the analysis of experimental data, in order to stabilize variance, restore normality and remove nonadditivity. The general form is given by:

$$T_p(x_i) = \begin{cases} a\,x^{p} + b & \text{if } p \neq 0 \\ a \log(x) + b & \text{if } p = 0 \end{cases}$$

The choice of a suitable value of p depends on the aim of the study and on the nature of the original data; in general, a proper p is found by graphical methods that roughly gauge the appropriate transformation to normality.
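The general form of the PT can be sketched in Python as follows. The function name `power_transform`, the default values a = 1 and b = 0, and the restriction to strictly positive x are assumptions made for this illustration.

```python
import math


def power_transform(x, p, a=1.0, b=0.0):
    """Return a * x**p + b for p != 0, and a * log(x) + b for p == 0.

    Only strictly positive x is accepted (an assumption of this sketch),
    so that the logarithm and negative powers are always defined.
    """
    if x <= 0:
        raise ValueError("x must be strictly positive")
    if p == 0:
        return a * math.log(x) + b
    return a * x ** p + b


# Hypothetical value re-expressed with some common choices of p.
x = 4.0
for p in (-1, 0, 0.5, 2):
    print(p, power_transform(x, p))
```

Non-default values of a and b change only the scale and the origin of the re-expressed data, while p alone governs the change of shape.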
The power p usually varies in the interval [–1, 2] and, for ease of interpretation, only some values of p, such as –1, 0, ½ and 2, are used. Most of the time a PT is chosen in order to change the shape of the original distribution, even though proportionality with the original data no longer holds. While a PT always changes the shape of the original data, a change of origin or of scale additionally requires b ≠ 0 or a ≠ 0, respectively.

Table 4.1. Synoptic table of PTs.

Property | PT1 | PT2 | PT3 | PT4
T(xi) | $x^{1/2}$ | $x^{2}$ | $x^{-1}$ | $\log(x)$
Range | $0 < PT(x_i) < +\infty$ | $0 < PT(x_i) < +\infty$ | $0 < PT(x_i) < +\infty$ | $-\infty < PT(x_i) < +\infty$
Mean | $M(x^{1/2})$ | $M(x)^{2}$ | $M(x)^{-1}$ | $M(\log x)$
Variance (≈)+ | $\dfrac{\mathrm{Var}(x)}{4M(x)}$ | $4M^{2}(x)\,\mathrm{Var}(x)$ | $\mathrm{Var}(x)\,M^{-4}(x)$ | $\mathrm{Var}(x)\,M^{-2}(x)$
Variability reduction* | $1 - \dfrac{1}{4M(x)}$ | $1 - 4M^{2}(x)$ | $1 - M^{-4}(x)$ | $1 - M^{-2}(x)$
Derivative | $\dfrac{1}{2\sqrt{x}}$ | $2x$ | $-\dfrac{1}{x^{2}}$ | $\dfrac{1}{x}$

+ Given by Taylor’s approximation.
* Variability reduction: $\dfrac{\mathrm{Var}(x) - \mathrm{Var}[T(x)]}{\mathrm{Var}(x)}$.

The PTs considered here do not have a bounded codomain, because they take values in ℜ+ or in ℜ. When p ∈ [−1, 1[ there is always a reduction of range and hence of variability, whereas if p ≥ 1 there is an expansion of range and hence of variability. Resistance, as defined in the previous section, does not apply to these transformations because they do not contain any parameter estimated from the data.

In constructing composite indicators, raw data are frequently transformed with the log transformation, p = 0, essentially because it linearizes the data and reduces skewness. One disadvantage arises when the raw data are concentrated in a narrow range between zero and one, because the re-expressed data become large negative numbers. Other commonly used PTs are given by p = –1 and by p = 2.
For instance, in the survey Qualità della Vita on the 103 Italian Provinces, the financial newspaper Il Sole 24ore uses two types of T’s: the first is LT1, the second is a PT with p = –1:

$$T_1(x_i) = \frac{x_i}{\mathrm{Max}\{x_i\}} \cdot 1000 \quad \text{when } x_i \text{ is concordant to } X \qquad [4.1]$$

$$T_2(x_i) = \mathrm{Min}\{x_i\} \cdot x_i^{-1} \cdot 1000 \quad \text{when } x_i \text{ is discordant to } X \qquad [4.2]$$

After transforming the raw data, they sum up 21 proportional quantities such as those given by [4.1] and 15 inversely proportional quantities such as those given by [4.2]. This operation is not mathematically appropriate, because it produces a result whose relationship to the original xi’s is not proportional (Attanasio and Capursi, 1997). A solution to this problem is to modify T2(xi) into:

$$T_2(x_i) = -\frac{x_i}{\mathrm{Max}\{x_i\}} \cdot 1000 \quad \text{when } x_i \text{ is discordant to } X$$

A PT with p = 2 is used in the construction of the Body Mass Index, an empirical tool for indicating weight status, while a PT with p = ⅓ is used in the construction of one Keyword Effectiveness Index, a tool that measures the effectiveness of a keyword in the construction of a website.

2. Rank Transformations (RTs). Here we consider three types of transformations, namely rank, ranking score and categorical scale, in order to describe the most common usages in practical applications.

Rank (RT). The RT is a class of monotone ordinal functions that map data to ordinal data, usually labelled with numbers or letters. It is given by:

T(xi) = rank{xi},   T(xi) ∈ Ο

where Ο is an ordered set. Sorting the data is a quick and easy way to understand and describe them, and its advantages and disadvantages are well known.

Ranking Score (RS). Analogously, the RS can be defined as a class of monotone functions that map data to interval data, given by:

T(xi) = score{xi},   T(xi) ∈ Ι

where Ι is a set whose elements are isomorphic with an interval scale.
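A minimal sketch of the rank and ranking score transformations follows. The function names, the tie rule (tied values share the lowest rank) and the sample race times are hypothetical; the score vector is the 2003 Formula One rule reported in Table 5.2.

```python
def rank_transform(xs, descending=False):
    """Return 1-based ranks; tied values share the lowest rank (assumed)."""
    ordered = sorted(xs, reverse=descending)
    return [ordered.index(x) + 1 for x in xs]


def ranking_score(ranks, scores):
    """Map each rank to its score; ranks beyond the score list get 0."""
    return [scores[r - 1] if r <= len(scores) else 0 for r in ranks]


# Hypothetical race times in minutes: the shortest time takes rank 1.
times = [95.2, 93.1, 97.8, 94.0]
points_2003 = [10, 8, 6, 5, 4, 3, 2, 1]  # RS2 in Table 5.2

ranks = rank_transform(times)
print(ranks)                              # [3, 1, 4, 2]
print(ranking_score(ranks, points_2003))  # [6, 10, 5, 8]
```

The ranks keep only the order of the times, while the score vector superimposes distances between positions that need not be proportional to the time gaps.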
The most frequent application of a ranking score transformation (RST) is the assignment of scores to the ordered answer categories of the items of a questionnaire or evaluation sheet. This operation rests on the strong assumption of superimposing a metric on ordinal data. On the other hand, there are applications in which data measured on a ratio scale are lowered to an interval scale. For instance, the arrival times at each Grand Prix race are lowered to an interval scale, and the points assignment rule is a score function whose steps are not equally spaced.

It is worth checking empirically whether a metric batch of data {xi; i = 1, …, n} can be properly transformed through a RST. For instance, Figure 4.1a shows a scatter plot that is not well fitted by a straight line (R2 = 0.80), so the RST is not convenient. Figure 4.1b instead shows a case in which the RST is well fitted by a straight line (R2 = 0.96). This property becomes crucial when several items are added up.

Figure 4.1. Scatter plots showing different cases of RSTs: RST(xi) versus xi, panel (a) with R2 = 0.80 and panel (b) with R2 = 0.96.

It is very common to find CIs given by the sum of several indicators previously transformed by RSTs:

$$X = \sum_{i=1}^{k} RST(x_i)$$

It is crucial to verify that, for each i-th component, R2(X, xi) is statistically equal to 1, even when the multiple determination coefficient R2(X·x1…xk) indicates an adequate fit.

Categorical Scale (CST). Finally, the CST corresponds to the usual operation of grouping a batch of data into categories, in order to summarize them and display them in an easy and understandable way. It can be seen as a special case of the RT in which there are few categories. In fact, for many ordinal categorical variables it is sensible to imagine the existence of an underlying continuous variable. To approximate the underlying scale, it is often useful to assign a reasonable set of scores to the categories (Agresti, ch. 1, 1984).
In practice two types of CST are used: the first provides scores or ordinals based on percentiles, to ensure adequate representation in each category, while the second provides scores or ordinals based on cut-points selected a priori, whose meaning is familiar. For example, the ECTS (European Credit Transfer System) adopts an evaluation scale based on percentiles to convert grades from one educational system to another, so that ECTS Grade A is given to students whose grade is above the 90th percentile, ECTS Grade B is given to students whose grade is between the 90th and the 65th percentile, and so forth. Absolute categorical scales are instead based on standards or references; their aim is, for instance, to categorize a disease into stages or to classify subjects into groups on the basis of age, physical or clinical characteristics, etc.

5. Aggregating Function

The choice of an appropriate aggregating function (AF), which combines the different dimensions in a meaningful way, is related to the transformation function (TF) previously adopted. Most of the time the functional form of the AFs is additive, as is that of the TFs (Aiello and Attanasio, 2004). Results from a comparison between different TFs (linear, rank and non linear), conducted on the data of the survey Qualità della Vita on the 103 Italian Provinces with 36 simple indicators, show clearly that: LTs provide very close final rankings; the RT final ranking exhibits moderate differences compared with the LT ones; and the NLT final ranking is far from all the others. This shows how important the choice of the TF is (Attanasio and Capursi, 1997). This section is divided into two parts: the first concerns the use of non additive AFs and the second the implications of different score sets in the construction of a composite indicator X.

1. Non additive AF

Even if additive AFs are the most widely used because of their ease of interpretation, non linear AFs are frequent.
Most of the time the choice of the appropriate non linear function arises empirically, or simply because specific non linear functions explain the construct under study better. For instance, Discomfort or Heat-Related Stress Indexes are empirical tools used in physics that combine, in a non additive way, air temperature, humidity, wind, direct sunlight, etc. More recently, new indicators such as the Keyword Effectiveness Index have been proposed to measure how effective a keyword is in identifying a website. Their mathematical forms are non linear and arise empirically. Moreover, in clinical epidemiology the ROC (Receiver Operating Characteristic) curve is used to assess the overall performance of a screening test; it is an empirical tool whose functional form is an integral. Finally, a well known medical diagnostic tool for weight status is the Body Mass Index, which is given by the ratio of the weight to the square of the height.

2. Different Score Sets in the construction of CIs.

Here the main focus is the construction of ranking scores for data coming from questionnaires or from evaluation sheets. This can be done in two directions: vertically, i.e. the aim is to measure single items according to the answers of several respondents, or horizontally, i.e. the aim is to assess the total score given by the same respondent. These two patterns are analogous from a methodological point of view: the first corresponds to assigning weights to items, while the second corresponds to assigning scores to answer categories. Our applications are confined to the latter case. We analyze how different ranking score vectors affect the resulting composite indicator X through two examples based on real data.

Educational Data. This example refers to the Survey on Teaching Activities according to the Students Opinions conducted in the Italian universities.
The questionnaire contains multiple choice items with four ordered response categories. The usual ranking score RS1 assigns equally spaced scores (1, 2, 3, 4), where the higher the score, the higher the degree of agreement with the item. Another ranking score, RS2, introduced by a national steering committee, uses the alternative set of scores (2, 5, 7, 10). The straight line through the origin fitted to the points (1; 2), (2; 5), (3; 7) and (4; 10) fits well; in fact:

RS2 = 2.4333⋅RS1,   R2 = 0.989.   [4.3]

Here the meaning of R2 is purely mathematical. Thus, for each item it is possible to sum the scores and then calculate the overall score given by all the students or by a subgroup. Conversely, for each student it is possible to sum the scores and then calculate the overall score given to all the items of the questionnaire or to the items of a section. Considering the first case, we analyze different distributions for items I1, I2 and I3 with a sample of 55 respondents from the 2001-02 Survey conducted at the University of Palermo (Capursi and Librizzi, 2006).

Table 5.1. Distribution of RS1 and RS2.

RS1 | RS2 | I1 (n1) | I2 (n2) | I3 (n3)
1 | 2 | 2 | 1 | 26
2 | 5 | 2 | 9 | 17
3 | 7 | 26 | 29 | 8
4 | 10 | 25 | 16 | 4
OS(1) | | 184 | 170 | 100
OS(2) | | 446 | 410 | 233
ES | | 447.7 | 413.6 | 243.3
ES/OS(2) | | 1.004 | 1.009 | 1.044

where, for each item:

$$OS(i) = \sum_{k=1}^{4} n_k\, RS_k(i), \qquad i = 1, 2,$$

and ES = 2.4333⋅OS(1). The quantities OS(2) and ES are very close for each item distribution, which shows that RS1 and RS2 do not produce different results even with asymmetrical distributions. Both the R2 value and the values of OS and ES clearly show that RS2 is just a rescaled version of RS1. This is because RS2 replaces RS1 without any strong imbalance towards any category.

Formula One World Championship data. Scores are assigned according to the race position at each GP through a RST, which is not proportional to the race time.
The rule introduced in 2003 keeps the championship attractive until the last races, with the aim of rewarding drivers with regular performances. Neither RS1 (until 2002) nor RS2 (from 2003) is an equally spaced scoring, even if the second is closer to an equally spaced one; in fact, the distance between the first and the second place was shortened (Table 5.2).

Table 5.2. Score rules by race position (F1 Champ.).

Race position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
RS1 | 10 | 6 | 4 | 3 | 2 | 1 | - | -
RS1(EQ) | 6 | 5 | 4 | 3 | 2 | 1 | - | -
RS2 | 10 | 8 | 6 | 5 | 4 | 3 | 2 | 1
RS2(EQ) | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1

Comparing RS1 with RS1(EQ) and RS2 with RS2(EQ) by fitting different least squares curves gives the following results:

RS1 = 1.657⋅RS1(EQ) – 1.467,   R2 = 0.901
RS1 = 0.743 exp(0.43⋅RS1(EQ)),   R2 = 0.985   [4.4]

RS2 = 1.226⋅RS2(EQ) – 0.643,   R2 = 0.973
RS2 = 0.089⋅RS2(EQ)^2 + 0.423⋅RS2(EQ) + 0.696,   R2 = 0.994   [4.5]

As expected, and taking R2 in the meaning already given, [4.4] and [4.5] show that the linear fit of RS1 versus RS1(EQ) is “much” worse than the exponential fit, since there are just eight observations. This occurs because the distances between the first places are relevant. The behavior of RS2 is instead very close to that of RS2(EQ), because the modifications introduced in 2003 brought the scores nearer to an equally spaced scoring; in fact, the gain in terms of R2 obtained with the quadratic form is not relevant.

A further comparison of the above RSTs is given by the total scores (TS) obtained by the first five drivers after five GPs of the 2002 championship. The total scores TS1, TS2 and TS(EQ) are given by summing up the corresponding score set obtained at each race:

$$TS = \sum_{k=1}^{5} RS_k$$

and the distances (d) are:

d(r, s) = TS(r) – TS(s), with r > s (r ≠ s)

where r is the final position of the i-th driver (i = 1, …, 5) and s is the final position of the (i+1)-th driver at the end of the fifth race. Assigning the scoring sets RS1, RS2 and RS1(EQ), we get the line plots of the distances between first and second, second and third, and so forth.
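The linear relationships reported in [4.3], [4.4] and [4.5], together with the overall scores of Table 5.1, can be checked with a few lines of plain Python. This is a sketch under stated assumptions: the function and variable names are hypothetical, and the R2 of the fit through the origin is computed with the centered total sum of squares, a convention chosen here because it agrees with the reported 0.989.

```python
def fit_line(xs, ys):
    """Ordinary least squares y = slope * x + intercept, with R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


def fit_through_origin(xs, ys):
    """Least squares y = slope * x, with R^2 on the centered total SS."""
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    my = sum(ys) / len(ys)
    sse = sum((y - slope * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return slope, 1 - sse / sst


# Educational data: score sets and item frequencies from Table 5.1.
edu_rs1, edu_rs2 = [1, 2, 3, 4], [2, 5, 7, 10]
counts = {"I1": [2, 2, 26, 25], "I2": [1, 9, 29, 16], "I3": [26, 17, 8, 4]}

slope_43, r2_43 = fit_through_origin(edu_rs1, edu_rs2)
print(round(slope_43, 4), round(r2_43, 3))

overall = {}
for item, n in counts.items():
    os1 = sum(nk * s for nk, s in zip(n, edu_rs1))  # OS(1)
    os2 = sum(nk * s for nk, s in zip(n, edu_rs2))  # OS(2)
    overall[item] = (os1, os2)
print(overall)

# Formula One data: scored positions only, from Table 5.2.
f1_rs1_eq, f1_rs1 = [6, 5, 4, 3, 2, 1], [10, 6, 4, 3, 2, 1]
f1_rs2_eq, f1_rs2 = [8, 7, 6, 5, 4, 3, 2, 1], [10, 8, 6, 5, 4, 3, 2, 1]

print([round(v, 3) for v in fit_line(f1_rs1_eq, f1_rs1)])
print([round(v, 3) for v in fit_line(f1_rs2_eq, f1_rs2)])
```

The exponential and quadratic fits of [4.4] and [4.5] are not reproduced here; they would require a log-linear or polynomial least squares fit.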
The values of RS2 and RS1(EQ) are evidently close, while the RS1 line plot is different (Figure 5.1).

Figure 5.1. RST distance comparisons: line plots of the distances d(1, 2), d(2, 3), d(3, 4) and d(4, 5) for RS1, RS2 and RS1(EQ).

The two examples reported above suggest that it is convenient to use an equally spaced ranking score set rather than a non equally spaced one, unless the distances between adjacent categories are “very non equally spaced”.

6. Conclusions

Most of the time, applications of CIs do not take into account mathematical and statistical (and sometimes even common sense) reasoning, so it seems to us that some work is needed to establish rules for obtaining meaningful summary statistics such as composite indicators. In this respect, the main objective of this paper is to give practitioners a few instrumental tools for the construction of CIs by analyzing its mechanics, i.e. which tools are available to guarantee, or not, some results or some properties. Moreover, we focused our attention on the ranking score assignment in multiple choice questions, and we found that it is convenient to replace an equally spaced ranking score set with a non equally spaced one only when the distances between the categories are “very non equally spaced”. In other words, there has to be strong evidence of unequal distances among categories in order to assume varying distances between response categories. A possible development of this argument may be carried out with Rasch models. They may be useful for identifying a suitably spaced scoring set; in fact, such models assume a sequential order of the thresholds, while the distances between adjacent answer categories are free to vary, reflecting the structure of the data. The issues discussed here apply equally to the assignment of weights to individual indicators in the construction of CIs and to the selection of indicators, the latter being a special case in which the weights are zero.
Finally, we are aware that the choice between equally and non equally spaced score (or weighting) assignments has to be tackled either with statistical methods and procedures or with extra-statistical arguments.

References

Aczél J. (1987). A Short Course on Functional Equations. D. Reidel Publishing Company, Dordrecht.
Aiello F., Attanasio M. (2004). “How to transform a batch of simple indicators to make up a unique one?” Atti del Convegno SIS giugno 2004, Bari. Sessioni Specializzate, pp. 327 – 338.
Atkinson A.C., Cox D.R. (1982). Transformations, in: Encyclopaedia of Statistical Sciences. Kotz S. & Johnson N.L. (Eds.). Wiley, New York.
Attanasio M., Capursi V. (1997). Graduatorie sulla qualità della vita: prime analisi di sensibilità delle tecniche adottate. Atti XXXV Riunione Scientifica SIEDS, Alghero.
Bartholomew D.J. (1996). The Statistical Approach to Social Measurement. Academic Press, San Diego.
Cox D., Fitzpatrick R., Fletcher A., Gore S., Spiegelhalter D. and Jones D. (1992). Quality-of-life assessment: can we keep it simple? J.R.S.S. 155 (3), 353 – 393.
Curatolo R. (1972). Indicatori sociali, Atti della XXVII Riunione Scientifica SI, Vol. 1, pp. 19 – 151.
Fayers P.M., Hand D.J. (2002). Causal Variables, Indicator Variables and Measurement Scales: an example from quality of life. J.R.S.S., A, 165, 233 – 261.
Fletcher R.H., Fletcher S.W., Wagner E.H. (1982). Clinical Epidemiology – the essentials. Williams & Wilkins, Baltimore.
Hoaglin D.C., Mosteller F., Tukey J.W. (1983). Understanding Robust and Exploratory Data Analysis. Wiley, New York.
Jacobs R., Smith P., Goddard M. (2004). Measuring performance: an examination of composite performance indicators. Centre for Health Economics, Technical Paper Series 29.
Inskip H. (1998). Standardized Methods, in: Encyclopaedia of Biostatistics, Armitage P. & Colton T. (Eds.), Wiley, 6, 4237 – 4250.
Kendall M., Stuart A., Ord J.K. (1983). The Advanced Theory of Statistics. Charles Griffin and C. 3, 97.
Krantz D.H., Luce R.D., Suppes P., Tversky A. (1971). Foundations of Measurement, Vol. 1. Academic Press, New York.
Luce R.D., Krantz D.H., Suppes P., Tversky A. (1990). Foundations of Measurement (Vol. III). Academic Press, San Diego.
Nardo M., Saisana M., Saltelli A., Tarantola S. (2005). Tools for Composite Indicators Buildings. Report EUR 21682 EN. European Commission-Joint Research Centre, Ispra.
Prieto L. et al. (1996). Scaling the Spanish Version of the Nottingham Health Profile: Evidence of Limited Value of Item Weights. J. Clin. Epi., 49, 31 – 38. Elsevier Science.
Saisana M., Tarantola S. (2002). State-of-the-art report on current methodologies and practices for composite indicator development. Report EUR 20408 EN. European Commission-Joint Research Centre, Ispra.
Saisana M. (2004). Composite indicators – A review, Second Workshop on Composite Indicators of Country Performance, Feb. 26 – 27th 2004. OECD, Paris.
Streiner D.L., Norman G.R. (eds.) (1999). Health Measurement Scales. A practical guide to their development and use. 2nd Ed. Oxford University Press, New York.
Stevens S.S. (1974). Measurement, in: Scaling: a sourcebook for behavioural scientists, Maranell M. (ed.), Aldine Publishing Company, Chicago.
UNC Charlotte Dept. of Geography and Earth Sciences, UNC at Charlotte (2002). Charlotte Neighborhood Quality of Life Study. http://www.charmeck.org/NR/rdonlyres/ …/2002+Quality+of+Life+Study.pdf.