Coefficients of Association Analogous to Pearson's r for Nonparametric Statistics

The $r_z$ and $r_p$ coefficients of association are discussed. Both coefficients, like Pearson's r, are based on a $z/z_{\max}$ framework. Because they are interpreted as the ratio of the obtained to the maximum possible departure from independence in z units, they yield coefficients that are directly comparable across all levels of measurement. The $r_z$ coefficient can be applied to any nonparametric test statistic for which a normal approximation equation is appropriate. The $r_p$ coefficient is applicable to any nonparametric test statistic for which exact probabilities are known.

This paper presents two coefficients of association that can be applied to most nonparametric test statistics. The coefficients are analogous to Pearson's r, being based on the ratio of the observed to the maximum possible departure from independence. Most nonparametric significance tests do not have an accompanying coefficient of association. Once the existence of a relationship is supported by a significance test, a coefficient of association can be used to measure the strength of the relationship.

A Proportional Interpretation of Pearson's r
Pearson's r is a proportional measure that has a $z/z_{\max}$ interpretation for the bivariate normal case. Pearson's r can be defined as the ratio of the observed r to the maximum possible r, that is, $r/r_{\max}$, where $r_{\max} = 1.0$. For a bivariate normal population with $\rho = 0.0$, the z test of significance is the ratio of the observed sample r to the standard error of r, $z = r/\sigma_\rho$, and thus $r = z\sigma_\rho$. By substitution,

$$r = \frac{r}{r_{\max}} = \frac{z\sigma_\rho}{z_{\max}\sigma_\rho} = \frac{z}{z_{\max}}. \qquad (1)$$

Under the condition $\rho = 0.0$, $\sigma_\rho = 1/\sqrt{N}$. The maximum possible value of z is $z_{\max} = r_{\max}/\sigma_\rho = 1/(1/\sqrt{N}) = \sqrt{N}$. On Figure 1, N = 100, and hence $z_{\max} = 10$. With a sample size of 100, the maximum number of standard deviation units (z-scores) by which a correlation can depart from independence is 10 (C on Figure 1). If the observed z value is three standard deviation units from independence (B on Figure 1), then r = 3/10 = .30.
The observed dependence measured in standard deviation units is 30 percent of the maximum possible dependence. Similarly, $r^2 = z^2/z_{\max}^2 = 9/100 = .09$. The observed departure from independence is 9% of the maximum departure, measured in variance units (i.e., $z^2$ units).
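The arithmetic of this example can be checked with a few lines of Python. This is only a minimal sketch under the paper's stated condition ($\rho = 0.0$, so that $z_{\max} = \sqrt{N}$); the function name `pearson_r_from_z` is a hypothetical label introduced here for illustration.

```python
import math


def pearson_r_from_z(z: float, n: int) -> float:
    """Return r = z / z_max, assuming rho = 0 so that z_max = sqrt(N)."""
    z_max = math.sqrt(n)  # maximum departure from independence, in z units
    return z / z_max


# Figure 1 example: N = 100 and an observed z of 3.
r = pearson_r_from_z(z=3.0, n=100)
print(f"r = {r:.2f}, r^2 = {r ** 2:.2f}")
```

The first value is the departure from independence in standard deviation units, the second in variance units.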
Many statisticians prefer $r^2$ to r because variances are additive, whereas standard deviations are not. According to the conventional approach, $r^2$ equals the proportion of variance explained, an interpretation that does not apply to r. Using the interpretation presented in this paper, both r and $r^2$ represent the proportion of observed to maximum possible dependence: r in standard deviation units, $r^2$ in variance units. Figure 1 illustrates a common property of inferential statistics and correlation, namely, departure from independence. Inferential statistics, like the normal deviate z, compare the obtained departure from independence in standard deviation units (AB on Figure 1) to zero departure from independence (A on Figure 1). On the other hand, r is the ratio of the obtained departure, AB, to the maximum possible departure, AC, from independence in standard deviation units.

The $r_z$ Coefficient of Association
A nonparametric analog to Equation (1) is

$$r_z = \frac{z}{z_{\max}} = \frac{t - E(t)}{t_{\max} - E(t)}, \qquad (2)$$

where z and $z_{\max}$ can be determined using a normal approximation equation. The t is the random variable upon which the nonparametric statistic is based, i.e., U for the Mann-Whitney test, T for the Wilcoxon test, r for the Wald-Wolfowitz runs test and one-sample runs test, x for the sign test, and $\sum d_i$ for the randomization test for matched pairs (Siegel, 1956; Hays, 1973). Like Pearson's r, $E(r_z) = 0$ and the maximum $r_z = 1.00$.
To give an example, the normal approximation can be applied to the Mann-Whitney U test when $N \geq 8$ (Hays, 1973). Siegel (1956) described a cross-cultural study of 39 nonliterate societies, undertaken by Whiting and Child (1953). The cases were dichotomized into $n_1 = 16$ societies where oral explanation of illness was absent and $n_2 = 23$ societies where it was present. The societies were ranked from 1 (lowest) to 39 (highest) in terms of the degree to which the socialization of oral drives produced anxiety. Judgments of oral socialization anxiety were based on the rapidity, severity, and frequency of punishment typical in oral socialization, and the severity of emotional conflict evidenced by children during the period of oral socialization. The sum of ranks for the 16 societies where oral explanation was absent was $R_1 = 200$, and the sum of ranks where oral explanation was present was $R_2 = 580$. The Mann-Whitney U = 304, z = 3.429, and the one-tailed p = .0003 (Siegel, 1956, pp. 122-123), indicating that oral socialization anxiety tends to be higher in societies where oral explanation of illness is present.
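According to Equation (2), with $E(U) = n_1 n_2/2 = 184$ and $U_{\max} = n_1 n_2 = 368$,

$$r_z = \frac{U - E(U)}{U_{\max} - E(U)} = \frac{304 - 184}{368 - 184} = .652.$$

Thus 65.2% of the maximum possible departure from independence is obtained in z standard deviation units and 42.5% is obtained in $z^2$ variance units. The obtained U exceeds the expected value of U by 65.2% of the maximum possible difference.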
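The Whiting and Child example can be reproduced from the group sizes and one rank sum. The sketch below assumes the usual large-sample formulas for the Mann-Whitney U test without a correction for ties, so its z may differ in the last decimal place from the published value; the function name `mann_whitney_rz` and the dictionary keys are hypothetical.

```python
import math


def mann_whitney_rz(n1: int, n2: int, r1: float) -> dict:
    """r_z for the Mann-Whitney U test, given the rank sum r1 of group 1.

    Assumes the normal approximation with no tie correction.
    """
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1               # observed U
    e_u = n1 * n2 / 2                                  # E(U) under independence
    u_max = n1 * n2                                    # largest possible U
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard error of U
    z = (u - e_u) / sigma_u
    z_max = (u_max - e_u) / sigma_u
    return {"U": u, "z": z, "z_max": z_max, "r_z": z / z_max}


# n1 = 16 societies with R1 = 200; n2 = 23 societies.
result = mann_whitney_rz(n1=16, n2=23, r1=200)
print(f"U = {result['U']:.0f}, r_z = {result['r_z']:.3f}, "
      f"r_z^2 = {result['r_z'] ** 2:.3f}")
```

Because the same standard error appears in z and in $z_{\max}$, it cancels in the ratio, which is why $r_z$ can be computed from U, E(U), and $U_{\max}$ alone.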
The normal deviate z-scores and the corresponding probability levels are a function of both the strength of the relationship and the size of the sample. Statistical significance is almost guaranteed if a large enough sample is used. The $r_z$ coefficient, on the other hand, norms for sample size. For instance, suppose N were increased tenfold so that $n_1 = 160$ and $n_2 = 230$, and the $R_1/R_2$ ratio of 200/580 = .3448 remained constant. In this case, $r_z = (30,130 - 18,400)/(36,800 - 18,400) = .638$, which is near the original $r_z = .652$. Equivalently, $z/z_{\max} = 10.71/16.80 = .638$.
If $n_1$ and $n_2$ were increased a hundredfold, $r_z = .636$. Thus, the $r_z$ coefficient allows one to measure the strength of association by norming for sample size. This makes it possible to compare test results based on samples of different sizes.
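The effect of enlarging the sample while holding the rank-sum ratio fixed can be sketched as follows. The helper `scaled_rz` is hypothetical; it assumes, as in the text, that both group sizes are multiplied by the same factor and that $R_1/R_2$ stays at 200/580.

```python
def scaled_rz(n1: int, n2: int, r1: float, r2: float, factor: int) -> float:
    """r_z for the Mann-Whitney U test after multiplying both group sizes
    by `factor` while holding the ratio r1/r2 of the rank sums constant."""
    m1, m2 = n1 * factor, n2 * factor
    n = m1 + m2
    total = n * (n + 1) / 2              # sum of the ranks 1..N
    s1 = total * r1 / (r1 + r2)          # group 1 rank sum at the same ratio
    u = m1 * m2 + m1 * (m1 + 1) / 2 - s1
    e_u = m1 * m2 / 2
    u_max = m1 * m2
    return (u - e_u) / (u_max - e_u)


for factor in (1, 10, 100):
    print(factor, f"{scaled_rz(16, 23, 200, 580, factor):.4f}")
```

The z statistic grows roughly with the square root of the scaling factor, while $r_z$ changes only in the second decimal place.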
The power efficiency of the Mann-Whitney U test relative to Student's t test is 95.5% for large samples and near 95% for moderate-sized samples (Mood, 1954). This power efficiency is nearly the same for z (the numerator) and $z_{\max}$ (the denominator) of Equation (2), since the numerator and denominator use the same N. Thus, Equation (2) norms for the different power levels of each nonparametric statistic.
The $r_z$ coefficient can also be applied to situations where only a single variable is involved. This includes the sign test, single-sample proportions tests, and the one-sample runs test. For these statistics, the $r_z$ coefficient is not, strictly speaking, a measure of association, since association requires two or more variables. However, the $z/z_{\max}$ format and interpretation can be applied to these statistics.

$\chi^2$-Based Tests
A number of nonparametric test statistics use $\chi^2$, including contingency tables, the Cochran Q test, the Friedman two-way analysis of variance by ranks test, the Kolmogorov-Smirnov two-sample test, the Kruskal-Wallis one-way analysis of variance by ranks test, and the McNemar test for significance of changes. With the exception of contingency tables, these nonparametric tests do not have an accompanying coefficient of association for measuring the strength of a relationship.
In order to apply $r_z = z/z_{\max}$ to $\chi^2$-based statistics, it is necessary to use a $\chi^2$-to-normal-deviate z transformation. It has been shown by Acock and Stavig (1976) that Wilson and Hilferty's (1931) cube root transformation gives accurate estimates of the normal deviate z for $\nu \geq 2$ (where $\nu$ is the degrees of freedom). For $\nu = 1$, one should use $z = \sqrt{\chi^2}$, in which case $r_z$ is equal to the $\phi$ coefficient for contingency tables.

For example, suppose the Friedman two-way analysis of variance test is applied to rank data on a 4 x 10 table with k - 1 = 3 degrees of freedom. If the sum of the squared column rank sums is $\sum R_j^2 = 2750$, where $R_j$ is the sum of ranks in the jth column, the Friedman $\chi^2 = 15$. The maximum possible value of $\sum R_j^2 = (40)^2 + (30)^2 + (20)^2 + (10)^2 = 3000$. The $\chi^2_{\max}$ = (the number of rows) x (the number of columns minus one) = (10)(3) = 30 (Acock and Stavig, 1979). The z = 2.882, $z_{\max} = 4.515$, and $r_z = .638$. Thus, 63.8% of the maximum possible departure from independence is obtained in z standard deviation units and 40.7% in $z^2$ variance units.
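The Friedman example can be retraced numerically. The sketch below assumes the standard textbook form of the Wilson-Hilferty cube root transformation and of the Friedman statistic; neither formula is printed in the surviving text, and the function names are hypothetical. Intermediate z values computed this way may differ from the reported ones in the third decimal place.

```python
import math


def wilson_hilferty_z(chi2: float, df: int) -> float:
    """Cube root approximation of the normal deviate for a chi-square value
    (standard Wilson-Hilferty form, intended for df >= 2)."""
    a = 2.0 / (9.0 * df)
    return ((chi2 / df) ** (1.0 / 3.0) - (1.0 - a)) / math.sqrt(a)


def friedman_chi2(sum_sq_rank_sums: float, n_rows: int, k_cols: int) -> float:
    """Friedman chi-square from the sum of squared column rank sums."""
    return (12.0 * sum_sq_rank_sums / (n_rows * k_cols * (k_cols + 1))
            - 3.0 * n_rows * (k_cols + 1))


n_rows, k_cols = 10, 4
df = k_cols - 1
chi2 = friedman_chi2(2750, n_rows, k_cols)   # observed chi-square
chi2_max = n_rows * (k_cols - 1)             # rows x (columns - 1)

r_z = wilson_hilferty_z(chi2, df) / wilson_hilferty_z(chi2_max, df)
print(f"chi2 = {chi2:.1f}, chi2_max = {chi2_max}, r_z = {r_z:.3f}")
```

Feeding the maximum $\sum R_j^2 = 3000$ into `friedman_chi2` returns the same value as the rows x (columns - 1) rule, which offers a quick consistency check on $\chi^2_{\max}$.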
In all cases, the $r_z$ coefficient equals one when the obtained $\chi^2$ attains its maximum possible value and zero when the obtained $\chi^2 = E(\chi^2)$. It is negative only when the observed $\chi^2$ is less than what is expected by chance. In such a case, there is no reason to compute $r_z$. This property is shared by the intraclass correlation (Haggard, 1958); omega square, $\omega^2$ (Hays, 1973); the shrunken multiple $R^2$ (Olkin and Pratt, 1958); and the $\kappa$ coefficient of agreement (Cohen, 1960). Each of these statistics equals zero when the observed value equals the expected value and is negative when the observed value is less than what is expected by chance.

The $r_p$ Coefficient of Association for Small Samples
The normal approximation transformations used by the nonparametric test statistics are generally not considered to be appropriate for small samples. An alternative approach with small samples is to apply a direct probability-to-normal-deviate transformation, $p \rightarrow z$. This procedure requires that the significance level probabilities of the observed result and of the maximum possible result be determined. The two significance level probabilities are converted to the appropriate z-scores, based on the area under the normal curve. The resulting nonparametric coefficient of association is

$$r_p = \frac{z[P(t)]}{z[P(t_{\max})]}, \qquad (6)$$

where $P(t)$ is the exact probability of the observed statistic t, and $P(t_{\max})$ is the probability of the maximum possible t. For $\chi^2$-based statistics, t represents $\chi^2$. The $r_p$ coefficient has an upper limit of one.

Applying Equation (6) to a hypothetical example for Wilcoxon's T, where N = 8, $T_{\max} = 36$, and T = 32,

$$r_p = \frac{z(.027)}{z(.004)} = .727.$$

The .027 and .004 probabilities are obtained from Kraft and Van Eeden (1968). The $r_z = (32 - 18)/(36 - 18) = .778$ obtained using Equation (2) only approximates the $r_p = .727$ result, because Wilcoxon's T is not exactly normally distributed when N = 8. That is, the coefficients differ because the same probability levels, p = .027 and p = .004, respectively, correspond to a different magnitude of departure in standard deviation units under the normal approximation to Wilcoxon's T than under the normal distribution itself. Identical results are obtained with a one-tailed or a two-tailed test, provided the same choice is used for both the numerator and the denominator of Equation (6).

The direct $p \rightarrow z$ transformation assumes that a particular level of probability, say, p = .010, is equivalent for a distribution of any shape. Therefore, if on a normal distribution the obtained p = .027 and the maximum possible departure is p = .004, then Pearson's r = .727 (given $H_0$: $\rho = 0.0$), the exact value of $r_p$. However, the $r_p$ measure is not identical to Pearson's r, since it does not utilize interval level information.
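The Wilcoxon example can be sketched in Python using only the standard library. The exact tail probabilities are obtained here by brute-force enumeration of rank subsets rather than from published tables, which is an assumption of this sketch; the function names `wilcoxon_upper_tail` and `rp_coefficient` are hypothetical. One-tailed probabilities are used for both numerator and denominator.

```python
from itertools import combinations
from statistics import NormalDist


def wilcoxon_upper_tail(n: int, t: int) -> float:
    """Exact P(T >= t) for the Wilcoxon signed-rank statistic, assuming all
    2**n assignments of signs to the ranks 1..n are equally likely."""
    ranks = range(1, n + 1)
    hits = sum(
        1
        for size in range(n + 1)
        for subset in combinations(ranks, size)
        if sum(subset) >= t
    )
    return hits / 2 ** n


def rp_coefficient(p_obs: float, p_max: float) -> float:
    """r_p = z[P(t)] / z[P(t_max)], from one-tailed probabilities."""
    nd = NormalDist()  # standard normal distribution
    return nd.inv_cdf(1.0 - p_obs) / nd.inv_cdf(1.0 - p_max)


p_obs = wilcoxon_upper_tail(8, 32)   # 7/256, about .027
p_max = wilcoxon_upper_tail(8, 36)   # 1/256, about .004
print(f"P(T >= 32) = {p_obs:.4f}, P(T >= 36) = {p_max:.4f}")

# Using the probabilities rounded to three decimals, as in the text.
print(f"r_p = {rp_coefficient(0.027, 0.004):.3f}")
```

Passing the unrounded probabilities 7/256 and 1/256 to `rp_coefficient` gives a slightly smaller value, so the third decimal of $r_p$ depends on how the tabled probabilities are rounded.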
In summary, the $r_z$ and $r_p$ coefficients of association can be used with most nonparametric test statistics. Both measures are based on a $z/z_{\max}$ framework, which yields a coefficient comparable for all levels of measurement. The coefficients are interpreted in terms of departure from independence, a common denominator of probability statistics and correlational measures.