Statistica & Applicazioni, Vol. 2, n. 1, 2004

Some properties of the Arctangent Distribution

Angiola Pollastri§, Franco Tornaghi‡

Summary: In the present paper some characteristics of the Arctangent distribution are analysed. The Arctangent distribution has been used to obtain simultaneous confidence intervals for the probabilities of the multinomial distribution. In particular, the asymmetry and the relations with the Folded Standard Normal and the Folded Skew-Normal are studied. Then the distributions of the maximum and of the minimum of the absolute values of the components of a Bivariate Correlated Normal r.v. are studied and used to improve the confidence regions, and the corresponding tests of hypotheses, for the two means of a Bivariate Correlated Normal.

Keywords: Asymmetry, Folded Standard Normal, Skew-Normal distribution, Bivariate Correlated Normal.

§ Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali, Università degli Studi di Milano Bicocca, Piazza dell'Ateneo Nuovo, 20126 Milano, Italy.
‡ Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali, Università degli Studi di Milano Bicocca, Piazza dell'Ateneo Nuovo, 20126 Milano, Italy.

1. Introduction

The Arctangent distribution was proposed by Zenga (1979). The Arctangent density function (d.f.) with parameter $a > 0$, denoted by $g(x;a)$, is

\[
g(x;a) =
\begin{cases}
\dfrac{e^{-x^2/2}}{\arctan(a)} \displaystyle\int_0^{ax} e^{-y^2/2}\,dy, & x \ge 0,\\[2mm]
0, & \text{elsewhere.}
\end{cases}
\tag{1}
\]

[Fig. 1. Density functions of the Arctangent r.v. for a = 0.2, 0.8, 1, 5.]

In Fig. 1 we show the d.f. for a = 0.2, 1, 5, 10, 15.

This distribution has been used to obtain simultaneous confidence regions for the probabilities of a trinomial distribution (dn.) (Zenga, Fedrizzi, 1981), of a quadrinomial dn. (Brunazzo, 1979; Brunazzo, Fedrizzi, 1980), of the differences between the probabilities of a trinomial dn.
(Pollastri, 1980), of the marginal probabilities of a 2x2 table (Pollastri, 1979) and of the transition probabilities in a Markov chain (Pollastri, 1982).

The aim of this paper is to study the skewness of the Arctangent random variable (r.v.) and its relations with the Folded Standard Normal and the Skew-Normal r.v. Then we will show that the Arctangent distribution is useful in providing confidence regions, or the corresponding tests of hypotheses, for the two means of a Bivariate Correlated Normal.

2. Characteristics of the Arctangent distribution¹

¹ The present paper is financially supported by MURST. The title of the project, directed by Prof. Michele Zenga, is "Modelli Distributivi per fenomeni Socio-Economici". Section 2 is due to Tornaghi. Section 3 is due to Pollastri.

The cumulative distribution function (c.d.f.) of the Arctangent r.v. is given by

\[
F(h;a) = 1 - \frac{2\pi\,T(h,a)}{\arctan(a)}
\tag{2}
\]

where

\[
T(h,a) = \frac{1}{2\pi}\int_h^{+\infty} e^{-x^2/2} \int_0^{ax} e^{-y^2/2}\,dy\,dx
\tag{3}
\]

is a function tabulated by Owen (1956), helpful in finding probabilities over regions of a Bivariate Normal distribution. In Fig. 2 we show the c.d.f. for a = 0.2, 1, 5, 10, 15.

The Arctangent r.v. with parameter $a$ is stochastically larger than the Arctangent r.v. with parameter $a'$ whenever $a < a'$, that is

\[
F(h;a) < F(h;a') \qquad \forall h \in R^{+}
\tag{4}
\]

(see Zenga, 1979).

The $r$-th moment of the Arctangent r.v. (Zenga, 1979) is

\[
E(X^r) = \frac{1}{\theta}\int_0^{\infty}\!\!\int_0^{ax} x^r e^{-\frac{1}{2}(x^2+y^2)}\,dy\,dx
= \frac{2^{r/2}}{\theta}\,\Gamma\!\left(\frac{r}{2}+1\right)\int_0^{\theta}\cos^r(t)\,dt
\]

where $\theta = \arctan(a)$. In particular, we have

\[
E(X) = \sqrt{\frac{\pi}{2}}\,\frac{\sin(\theta)}{\theta}, \qquad
E(X^2) = 1 + \frac{\sin(\theta)\cos(\theta)}{\theta}, \qquad
E(X^3) = \sqrt{\frac{\pi}{2}}\left[\frac{\sin(\theta)\cos^2(\theta)}{\theta} + 2\,\frac{\sin(\theta)}{\theta}\right].
\]

The index of skewness $\gamma_1$ takes the form

\[
\gamma_1 = \frac{\mu_3}{\sigma^3}
= \frac{\sqrt{\dfrac{\pi}{2}}\;\dfrac{\sin^2(\theta)}{\theta^2}\left[\dfrac{\sin(\theta)}{\theta}\,(\pi-\theta^2) - 3\cos(\theta)\right]}
{\left[1 + \dfrac{\sin(\theta)\cos(\theta)}{\theta} - \dfrac{\pi}{2}\,\dfrac{\sin^2(\theta)}{\theta^2}\right]^{3/2}}
\tag{5}
\]

The index $\gamma_1$ as a function of $a$ is reported in Fig. 3.
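As a numerical cross-check of (5), the following minimal sketch compares the closed-form skewness with the value obtained by integrating $x^r g(x;a)$ directly. It is only an illustration, not the authors' computation: the function names (`density`, `skewness`, `simpson`, `skewness_numeric`), the Simpson rule and the truncation of the integration range at 12 are all assumptions introduced here. The inner integral in (1) is written through the error function, since $\int_0^{ax} e^{-y^2/2}dy = \sqrt{\pi/2}\,\mathrm{erf}(ax/\sqrt{2})$.

```python
import math

# Hypothetical helper names; none of them comes from the paper.

def density(x, a):
    """Arctangent density g(x; a) of equation (1), for a > 0."""
    if x < 0:
        return 0.0
    # Integral of exp(-y^2/2) over [0, a*x] equals sqrt(pi/2) * erf(a*x/sqrt(2)).
    inner = math.sqrt(math.pi / 2) * math.erf(a * x / math.sqrt(2))
    return math.exp(-0.5 * x * x) * inner / math.atan(a)

def skewness(a):
    """Closed-form skewness index gamma_1 of equation (5)."""
    th = math.atan(a)
    s, c = math.sin(th), math.cos(th)
    mu3 = math.sqrt(math.pi / 2) * (s / th) ** 2 * ((s / th) * (math.pi - th ** 2) - 3 * c)
    var = 1 + s * c / th - (math.pi / 2) * (s / th) ** 2
    return mu3 / var ** 1.5

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def skewness_numeric(a, upper=12.0):
    """Skewness from raw moments computed by numerical integration.

    Assumption: the tail beyond `upper` is negligible (it decays like exp(-x^2/2)).
    """
    m1, m2, m3 = (simpson(lambda x, r=r: x ** r * density(x, a), 0.0, upper)
                  for r in (1, 2, 3))
    var = m2 - m1 ** 2
    return (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5

for a in (0.2, 1.0, 5.0, 10.0, 15.0):
    print(f"a={a:5.1f}  closed form={skewness(a):.4f}  numeric={skewness_numeric(a):.4f}")
```

Evaluating `skewness` at a very small and at a very large value of $a$ gives a quick way to inspect the two limiting cases of the index.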
The index of skewness increases as the parameter $a$ increases until $a \approx 10.26$ and then decreases. It is possible to demonstrate that

\[
\lim_{a \to 0} \gamma_1 = 0.6311, \qquad \lim_{a \to \infty} \gamma_1 = 0.9953 .
\]

The asymmetry of the Arctangent r.v. is then analysed numerically. First of all, we consider the index proposed by Bowley in 1901 (Brentari, 1990)

\[
w(0.25) = \frac{Q_3 + Q_1 - 2Me}{Q_3 - Q_1}
\tag{6}
\]

where $Me$ is the median and $Q_1$ and $Q_3$ are respectively the first and the third quartile. The index $w(0.25)$ varies in the interval $[-1, 1]$ and so it is easily interpretable. Observing the diagram of $w(0.25)$ reported in Fig. 4, we note that the index increases until $a \approx 5.452$.

Then we use a more analytic index, based on the asymmetry of points, suggested by David F.N. and Johnson in 1956 (see, for instance, Brentari, 1990); for a continuous r.v. it is defined as

\[
w(p) = \frac{x(1-p) + x(p) - 2Me}{x(1-p) - x(p)}, \qquad 0 \le p < \frac{1}{2}
\tag{7}
\]

where $x(p) = F^{-1}(p)$ is the quantile of order $p$. The numerator of $w(p)$ is the difference between the distance of the quantile of order $(1-p)$ from the median and the distance of the median from the quantile of order $p$; the denominator is the distance between the quantile of order $(1-p)$ and the quantile of order $p$. So, having $-1 \le w(p) \le 1$, the index $w(p)$ is normalized.

The functions $w(p)$ of the Arctangent r.v. with parameter a = 0.2, 1, 5, 10, 15 are given in Fig. 5. For small values of $a$, if $a' < a''$ then

\[
w_{a'}(p) < w_{a''}(p), \qquad 0 \le p < \frac{1}{2} .
\]

For some pairs of values of $a$ the relation above does not always hold; for instance, if we take $a' = 5$ and $a'' = 10$ we observe that $w_{a'}(p) < w_{a''}(p)$ until $p \approx 0.172$ and then $w_{a'}(p) > w_{a''}(p)$. That is, the skewness of the Arctangent distribution with parameter $a'$ may be greater than that of the Arctangent with parameter $a''$ in some part of the distribution and smaller in another part. This fact is shown in Fig. 6.
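The quantile-based indices (6) and (7) can be explored with a short sketch such as the one below. It is a rough illustration under stated assumptions, not the procedure used for the figures: the c.d.f. is obtained by Simpson integration of the density (1) rather than through Owen's $T$ function, quantiles are found by bisection on an assumed bracket $[0, 10]$, and all function names are hypothetical.

```python
import math

# Hypothetical helper names; none of them comes from the paper.

def density(x, a):
    """Arctangent density g(x; a) of equation (1), inner integral written via erf."""
    if x < 0:
        return 0.0
    inner = math.sqrt(math.pi / 2) * math.erf(a * x / math.sqrt(2))
    return math.exp(-0.5 * x * x) * inner / math.atan(a)

def cdf(h, a, n=1000):
    """F(h; a) by composite Simpson integration of the density on [0, h]."""
    if h <= 0:
        return 0.0
    step = h / n
    total = density(0.0, a) + density(h, a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * density(i * step, a)
    return total * step / 3

def quantile(p, a, hi=10.0, tol=1e-9):
    """x(p) = F^{-1}(p) by bisection; assumes the quantile lies in [0, hi]."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def w(p, a):
    """Asymmetry index w(p) of equation (7); p = 0.25 gives Bowley's index (6)."""
    x_lo, med, x_hi = quantile(p, a), quantile(0.5, a), quantile(1 - p, a)
    return (x_hi + x_lo - 2 * med) / (x_hi - x_lo)

for a in (0.2, 1.0, 5.0, 10.0, 15.0):
    print(f"a={a:5.1f}  w(0.25)={w(0.25, a):.4f}")
```

Calling `w(p, 5.0)` and `w(p, 10.0)` on a grid of values of $p$ is one way to look for the crossing described in the text; the sketch makes no claim about where it falls.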
[Figure 1. Density functions of the Arctangent for a = 0.2, 1, 5, 10, 15.]
[Figure 2. Cumulative distribution functions for a = 0.2, 1, 5, 10, 15.]
[Figure 3. The index $\gamma_1$ as a function of a.]
[Figure 4. Diagram of w(0.25) as a function of a.]
[Figure 5. Diagram of w(p) for a = 0.2, 1, 5, 10, 15.]
[Figure 6. Diagram of w(p) (0.02 < p < 0.2) for a = 5, 10, 15.]

It is interesting to note the extreme case of the Arctangent distribution $g(x;a)$, that is the limit function as $a \to +\infty$.

THEOREM 1. The Arctangent distribution with parameter $a$ tends, as $a \to +\infty$, to the Folded Standard Normal distribution.

PROOF: We denote by $\varphi(\cdot)$ and $\Phi(\cdot)$ respectively the d.f. and the c.d.f. of the Standard Normal r.v. Given that

\[
\int_0^{+\infty} e^{-y^2/2}\,dy = \frac{\sqrt{2\pi}}{2},
\]

we can write

\[
\lim_{a \to +\infty} g(x;a)
= \lim_{a \to +\infty} \frac{e^{-x^2/2}\displaystyle\int_0^{ax} e^{-y^2/2}\,dy}{\arctan(a)}
= \frac{e^{-x^2/2}\displaystyle\int_0^{+\infty} e^{-y^2/2}\,dy}{\pi/2}
= 2\varphi(x), \qquad x > 0 .
\]

Observation. For a Skew-Normal r.v. of parameter $\lambda$, having d.f. $f(z;\lambda) = 2\varphi(z)\Phi(\lambda z)$, Azzalini (1985) obtained a similar result:

\[
\lim_{\lambda \to +\infty} f(z;\lambda) = 2\varphi(z) .
\]

3. Relations with other distributions

THEOREM 2. Let us suppose that the r.v. $(X,Y)$ has Standardized Bivariate Normal (SBN) distribution, that is

\[
\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}; \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).
\]

The d.f. of the r.v. $T = \max\{|X|, |Y|\}$ is a mixture of two Arctangent densities with parameters $a_1$ and $a_2$ and with proportions

\[
\pi_1 = \frac{2}{\pi}\arctan(a_1) \qquad \text{and} \qquad \pi_2 = \frac{2}{\pi}\arctan(a_2)
\]

respectively, where

\[
a_1 = \sqrt{\frac{1+\rho}{1-\rho}} \qquad \text{and} \qquad a_2 = \sqrt{\frac{1-\rho}{1+\rho}} .
\]

PROOF: The r.v. $T$ has c.d.f. given by

\[
F_T(t) = P\{T \le t\} = P\{|X| \le t, |Y| \le t\} = P\{-t \le X \le t, -t \le Y \le t\} .
\]

Note that $F_T(t)$ corresponds to the integration of the d.f. of the SBN over a square region. Using the function $B(h,k;\rho) = P\{X \le h, Y \le k\}$, introduced by Owen (1956) for finding volumes of the SBN over the lower left-hand quadrant with vertex at $x = h$ and $y = k$, we can write

\[
F_T(t) = B(t,t;\rho) - B(-t,t;\rho) - B(t,-t;\rho) + B(-t,-t;\rho)
\]

where

\[
B(h,k;\rho) =
\begin{cases}
\dfrac{1}{2}\Phi(h) - T\!\left(h, \dfrac{k-\rho h}{h\sqrt{1-\rho^2}}\right) + \dfrac{1}{2}\Phi(k) - T\!\left(k, \dfrac{h-\rho k}{k\sqrt{1-\rho^2}}\right)
& \text{if } hk > 0, \text{ or if } hk = 0 \text{ and } h + k \ge 0, \\[4mm]
\dfrac{1}{2}\Phi(h) - T\!\left(h, \dfrac{k-\rho h}{h\sqrt{1-\rho^2}}\right) + \dfrac{1}{2}\Phi(k) - T\!\left(k, \dfrac{h-\rho k}{k\sqrt{1-\rho^2}}\right) - \dfrac{1}{2}
& \text{if } hk < 0, \text{ or if } hk = 0 \text{ and } h + k < 0 .
\end{cases}
\]

By applying algebra and the identities $T(-h,a) = T(h,a)$ and $T(h,-a) = -T(h,a)$, we find

\[
F_T(t) = 1 - 4\{T(t,a_1) + T(t,a_2)\} .
\]

From (2) we can write

\[
T(h,a) = [1 - F(h;a)]\,\frac{\arctan(a)}{2\pi} .
\]

Since $\arctan(a_1) + \arctan(a_2) = \pi/2$, the c.d.f. of the r.v. $T$ is

\[
F_T(t) = \frac{2}{\pi}\{F(t;a_1)\arctan(a_1) + F(t;a_2)\arctan(a_2)\} .
\]

By differentiating $F_T(t)$, the d.f. of the r.v. $T$ is immediately obtained:

\[
f_T(t) = g(t;a_1)\,\frac{\arctan(a_1)}{\pi/2} + g(t;a_2)\,\frac{\arctan(a_2)}{\pi/2} .
\]

Bearing in mind the relation

\[
\arctan(a) + \arctan\!\left(\frac{1}{a}\right) = \frac{\pi}{2}, \qquad a > 0,
\]

and observing that $a_2 = 1/a_1$, the sum of the proportions is

\[
\frac{2}{\pi}\arctan(a_1) + \frac{2}{\pi}\arctan(a_2) = 1 .
\]

q.e.d.

COROLLARY. If $\rho = 0$, the r.v. $T = \max\{|X|, |Y|\}$ has Arctangent density with parameter $a = 1$.

REMARK 1. Let us select a simple random sample of $s = 2$ elements from the Folded Standardized Normal r.v. $T = |X|$, where $X \sim N(0,1)$. The Folded Standardized Normal d.f. is

\[
h(t) = \frac{\varphi(t)}{\Phi(0)} = \sqrt{\frac{2}{\pi}}\,e^{-t^2/2}, \qquad t \ge 0 .
\]

The ordered observations are denoted by $t_{(1)}$ and $t_{(2)}$, where $t_{(1)} \le t_{(2)}$.
If we consider $T_{(2)}$, indicated by $Y = \max\{T_i, i = 1, 2\}$, the relative d.f. is given by

\[
f(y) = 2\sqrt{\frac{2}{\pi}}\,e^{-y^2/2}\int_0^{y}\sqrt{\frac{2}{\pi}}\,e^{-t^2/2}\,dt
= \frac{e^{-y^2/2}}{c}\int_0^{y} e^{-t^2/2}\,dt, \qquad y \ge 0,
\]

where $c = \pi/4 = \arctan(1)$.

The meaning of this result is that the Arctangent r.v. with parameter $a = 1$ corresponds to the maximum of the absolute values of a sample of size $s = 2$ from the Standardized Normal r.v.

REMARK 2. Theorem 2 was proposed, with a different proof, by Pollastri (1979). It has been used for the asymptotic confidence regions for the marginal probabilities $p_{1\cdot}$ and $p_{\cdot 1}$ in a Bivariate Binomial. It can be useful in finding simultaneous confidence regions or simultaneous test procedures for the means of a Bivariate Correlated Normal.

Confidence region for the means of a Bivariate Correlated Normal

Tukey in 1953 (see Miller, 1981) proposed a confidence region based on the maximum modulus, that is

\[
P\{\max[|Y_1|, |Y_2|] \le c\} = 1 - \alpha .
\]

If $(1-\alpha) = 0.95$ and $Y_1$ and $Y_2$ are independent, $c = 2.236$ (see Miller (1981), page 14). This corresponds to the 95th percentile of the r.v. $T$ when $\rho = 0$ (Pollastri, 1979). When $Y_1$ and $Y_2$ are dependent, the Bonferroni inequality is used in order to find a region with probability at least equal to the fixed $(1-\alpha)$. The critical point is given by $\pm 2.241$. Using the exact distribution of $T$ we can find, for instance, that the value $c$ is $\pm 2.1799$ if $\rho = 0.7$ or $\pm 2.1081$ when $\rho = 0.9$. The improvement is really considerable.

REMARK 3. Another application is suggested by Jiang (1997) and Loperfido (2002). Let us consider the hypothesis $H_0: \mu = 0$ against $H_1: \mu \ne 0$. A researcher who applies two statistics to the data and then chooses the most significant one uses a level $\alpha^*$ greater than the fixed $\alpha$. Let us suppose the investigator uses the statistics $T_1$ and $T_2$.
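Both the maximum-modulus critical values of the preceding section and the inflated level considered in this remark depend only on the c.d.f. of $T = \max\{|X|,|Y|\}$. The sketch below evaluates that c.d.f. through the mixture representation of Theorem 2 and inverts it by bisection. It is an illustration under assumptions, not the authors' program: the component c.d.f. is computed by Simpson integration of the density (1), the bisection bracket $[0, 10]$ and the tolerance are arbitrary choices, $|\rho| < 1$ is required, and all function names are hypothetical.

```python
import math

# Hypothetical helper names; none of them comes from the paper.

def density(x, a):
    """Arctangent density g(x; a) of equation (1), inner integral written via erf."""
    if x < 0:
        return 0.0
    inner = math.sqrt(math.pi / 2) * math.erf(a * x / math.sqrt(2))
    return math.exp(-0.5 * x * x) * inner / math.atan(a)

def cdf(h, a, n=1000):
    """F(h; a) by composite Simpson integration of the density on [0, h]."""
    if h <= 0:
        return 0.0
    step = h / n
    total = density(0.0, a) + density(h, a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * density(i * step, a)
    return total * step / 3

def cdf_max_abs(t, rho):
    """F_T(t) for T = max(|X|, |Y|): mixture of two Arctangent c.d.f.s (Theorem 2)."""
    a1 = math.sqrt((1 + rho) / (1 - rho))  # requires |rho| < 1
    a2 = 1.0 / a1
    return (2 / math.pi) * (cdf(t, a1) * math.atan(a1) + cdf(t, a2) * math.atan(a2))

def critical_value(rho, level=0.95, hi=10.0, tol=1e-8):
    """Solve F_T(c) = level by bisection; assumes the root lies in [0, hi]."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_max_abs(mid, rho) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def real_level(t_alpha, rho):
    """P[max(|T1|, |T2|) > t_alpha] when (T1, T2) is SBN with correlation rho."""
    return 1.0 - cdf_max_abs(t_alpha, rho)

for rho in (0.0, 0.7, 0.9):
    print(f"rho={rho:.1f}  c={critical_value(rho):.4f}  "
          f"level at t=1.96: {real_level(1.959964, rho):.4f}")
```

For $\rho = 0$ the mixture collapses to a single Arctangent with $a = 1$, so the first critical value should agree with the independent-case value quoted from Miller (1981), and the reported level at the nominal two-sided 5% point is $1 - 0.95^2 = 0.0975$.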
If $T_1$ and $T_2$ tend to the Normal as the sample size increases, then the real level is given by

\[
\alpha^* = \lim_{n \to \infty} P[\max(|T_1|, |T_2|) > t_\alpha] .
\]

The level $\alpha^*$ may be computed through the Arctangent distribution if $T_1$ and $T_2$ are independent, or through a mixture of two Arctangent distributions if the correlation coefficient between $T_1$ and $T_2$ is equal to $\rho \ne 0$.

Loperfido (2002) shows that the distributions of the random variables $\max(X,Y)$ and $\min(X,Y)$ are Skew-Normal. He uses this interesting result in order to find the real level $\alpha$ when the researcher uses two statistics and reports only the most significant one in testing $H_0: \mu = 0$ against $H_1: \mu > 0$.

THEOREM 3. Let us suppose that the r.v. $(X,Y)$ has SBN distribution. The d.f. of the r.v. $V = \min\{|X|, |Y|\}$ is a function of Folded Standard Normal densities and of a mixture of two Arctangent densities.

PROOF: For $x \ge 0$,

\[
\begin{aligned}
P(V > x) &= P[(X > x, Y > x) \cup (X > x, Y < -x) \cup (X < -x, Y > x) \cup (X < -x, Y < -x)] \\
&= 1 - 2\Phi(x) + 2\Phi(-x) + B(x,x;\rho) + B(-x,-x;\rho) - B(x,-x;\rho) - B(-x,x;\rho) \\
&= 3 - 4\Phi(x) + F_T(x) .
\end{aligned}
\]

Then the c.d.f. of the r.v. $V$ is

\[
F_V(x) = 4\Phi(x) - F_T(x) - 2
\]

and the d.f. is

\[
f_V(x) = 2(2\varphi(x)) - f_T(x) .
\]

q.e.d.

REMARK 4. A researcher who applies two statistics to the data and then chooses the least significant one uses a level $\alpha'$ smaller than the fixed $\alpha$. Let us suppose the investigator uses the statistics $T_1$ and $T_2$. If $T_1$ and $T_2$ tend to the Normal as $n \to \infty$, then the real level is given by

\[
\alpha' = \lim_{n \to \infty} P[\min(|T_1|, |T_2|) > v_\alpha] .
\]

THEOREM 4. The Skew-Normal truncated at $c$ ($c \ge 0$) is a mixture of a Standardized Normal truncated at $c$ and of an Arctangent r.v. truncated at $c$, with weights respectively equal to

\[
\frac{1 - \Phi(c)}{1 - S(c;a)} \qquad \text{and} \qquad \frac{2T(c,a)}{1 - S(c;a)},
\]

where $S(c;a)$ is the c.d.f. of a Skew-Normal of parameter $a$.

PROOF: The c.d.f. of a Skew-Normal of parameter $a$ (Azzalini, 1985) is

\[
S(x;a) = \Phi(x) - 2T(x,a) .
\]

The d.f.
of a Skew-Normal truncated at $c$ is, for $x \ge c$,

\[
\begin{aligned}
l(x;a,c) &= \frac{2\varphi(x)\Phi(ax)}{1 - S(c;a)}
= \frac{\dfrac{2}{\sqrt{2\pi}}\,e^{-x^2/2}\left[\dfrac{1}{2} + \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_0^{ax} e^{-y^2/2}\,dy\right]}{1 - \Phi(c) + 2T(c,a)} \\[2mm]
&= \frac{1 - \Phi(c)}{1 - \Phi(c) + 2T(c,a)}\cdot\frac{\varphi(x)}{1 - \Phi(c)}
+ \frac{2T(c,a)}{1 - \Phi(c) + 2T(c,a)}\cdot\frac{\arctan(a)\,g(x;a)}{2\pi\,T(c,a)},
\end{aligned}
\]

where, by (2), $2\pi T(c,a)/\arctan(a) = 1 - F(c;a)$, so that the last factor is the Arctangent d.f. truncated at $c$. q.e.d.

The sum of the weights is equal to 1.

COROLLARY. The Folded Skew-Normal r.v. is a mixture of a Folded Standardized Normal and of an Arctangent r.v. with weights equal to

\[
\frac{1/2}{b} \qquad \text{and} \qquad \frac{\arctan(a)/\pi}{b}
\]

respectively, where $b = \dfrac{1}{2} + \dfrac{\arctan(a)}{\pi}$.

PROOF: The Skew-Normal r.v. truncated at 0 has d.f.

\[
p(x;a) = \frac{2\varphi(x)\Phi(ax)}{2\displaystyle\int_0^{+\infty}\varphi(t)\Phi(at)\,dt} .
\]

The denominator is equal to $b$. In fact

\[
2\int_0^{\infty}\varphi(t)\Phi(at)\,dt = 1 - S(0;a) = 1 - \Phi(0) + 2T(0,a) = \frac{1}{2} + \frac{\arctan(a)}{\pi} = b .
\]

So we can write

\[
p(x;a) = \frac{\dfrac{2}{\sqrt{2\pi}}\,e^{-x^2/2}\left[\dfrac{1}{2} + \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_0^{ax} e^{-y^2/2}\,dy\right]}{b}
= \frac{1/2}{b}\,h(x) + \frac{\arctan(a)/\pi}{b}\,g(x;a) .
\]

Note that the sum of the weights is equal to 1.

4. Conclusions

The Arctangent distribution has been used to give confidence regions for the probabilities of the multinomial distribution that are shorter than the intervals commonly used. In the present paper we have shown further characteristics of this distribution; in particular, we have studied its asymmetry. We have analysed the distributions of the maximum and of the minimum of the absolute values of the components of a correlated Standard Bivariate Normal, and we have shown that the results obtained may be very useful in multiple comparisons. Then we have analysed the relations with other distributions, in particular with the Folded Standard Normal and with the Folded Skew-Normal.

References

Azzalini A. (1985), A Class of Distributions which includes the Normal Ones, Scand. J. Statist., 12, pp. 171-178.
Azzalini A. (1986), Further results on a class of distributions which includes the normal ones, Statistica, anno XLVI, n° 2, pp. 199-208.
Brentari E. (1990), Asimmetria e misure di Asimmetria, Giappichelli Ed., Torino.
Brunazzo A. (1979), Distribuzione campionaria asintotica dello scarto standardizzato assoluto massimo di una quadrinomiale, Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali, Vol. 2, nn. 1-2, pp. 17-31.
Brunazzo A., Fedrizzi M. (1980), Ancora sulla distribuzione campionaria asintotica dello scarto standardizzato assoluto massimo di una quadrinomiale (caso non simmetrico), Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali, Vol. 3, pp. 316.
Jiang J. (1997), Sharp upper and lower bounds for asymptotic levels of some statistical tests, Statistics & Probability Letters, Vol. 35, pp. 395-400.
Liseo B. (1990), La classe delle densità Normali Sghembe: Aspetti Inferenziali da un punto di vista Bayesiano, Statistica, anno 50, n° 1, pp. 71-79.
Loperfido N. (2002), Statistical implications of selectively reported inferential results, Statistics & Probability Letters, Vol. 56, pp. 13-22.
Miller R.G., Jr. (1981), Simultaneous Statistical Inference, Springer-Verlag, Heidelberg and Berlin, second edition.
Owen D.B. (1956), Tables for computing bivariate normal probabilities, Annals of Mathematical Statistics, Vol. 27, pp. 1075-1090.
Pollastri A. (1979), Intervalli di confidenza simultanei asintotici per le probabilità marginali in una tabella 2x2, Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali.
Pollastri A. (1980), Intervalli di confidenza simultanei asintotici delle differenze fra probabilità in una trinomiale, Atti della XXX Riunione Scientifica della Società Italiana di Statistica.
Pollastri A. (1982), Intervalli di confidenza simultanei asintotici per le probabilità di transizione in una catena di Markov, Quaderni di Statistica e Matematica Applicata alle Scienze Economico-Sociali, Vol. II, nn. 1-2.
Zenga M.
(1979), L'impiego della funzione arcotangente incompleta nello studio della distribuzione asintotica dello scarto standardizzato assoluto massimo di una trinomiale, Statistica, XXXIX, n° 2, pp. 269-286.
Zenga M., Fedrizzi M. (1981), The Arctangent distribution and the simultaneous confidence intervals for trinomial proportions, Statistica, XLI, n° 3, pp. 411-419.