

UDC 57.087.2:519.233.3

On the power of some binomial modifications of the Bonferroni multiple test

© 2007 A. T. TERIOKHIN, T. DE MEEÛS, J.-F. GUÉGAN

Génétique et Évolution des Maladies Infectieuses, UMR 2724 IRD-CNRS, Centre IRD

911 Avenue Agropolis, BP 64501, 34394 Montpellier cedex 5, France

Faculty of Biology, Moscow Lomonosov State University

Leninskie Gory 1, Moscow 119992, Russia

e-mail: terekhin_a@mail.ru



Widely used in testing statistical hypotheses, the Bonferroni multiple
test has rather low power, which entails a high risk of falsely accepting
the overall null hypothesis and, therefore, of failing to detect effects
that really exist. We suggest that, when the partial test statistics are
statistically independent, this risk can be reduced by using binomial
modifications of the Bonferroni test. Instead of rejecting the null
hypothesis when at least one of n partial null hypotheses is rejected at a
very stringent significance level (say, 0.005 in the case of n=10), as
prescribed by the Bonferroni test, the binomial tests reject the null
hypothesis when at least k partial null hypotheses (say, k=[n/2]) are
rejected at a much less stringent level (up to 30-50%). We show that the
power of such binomial tests is substantially higher than that of the
original Bonferroni test and of some modified Bonferroni tests. In
addition, this approach allows one to combine tests whose results are known
only for a fixed significance level. The paper contains tables and a
computer program that allow one to determine (retrieve from a table or
compute) the necessary binomial test parameters, i.e., either the partial
significance level (when k is fixed) or the value of k (when the partial
significance level is fixed).







An environmental factor, $F$, can often independently influence $n$
variables, $X_1, \dots, X_n$, that describe the state of a population. For
example, the presence of a pollutant may increase the frequencies of several
diseases. Suppose that we know, in the form of $p$-values[1], $p_1, \dots,
p_n$, the results of testing $n$ partial null hypotheses, $H_{01}, \dots,
H_{0n}$, postulating that the factor $F$ has no effect on the variables
$X_1, \dots, X_n$, respectively. The problem then arises of how to combine
these results to test the overall null hypothesis, $H_0$, that the factor
has no effect. The simplest way is to reject $H_0$ if at least one partial
hypothesis is rejected at some given significance level, $\alpha_1$, i.e.,
when at least one $p_i \le \alpha_1$. But this test procedure is misleading
because its overall significance level, which for statistically independent
partial tests equals $\alpha_n = 1 - (1 - \alpha_1)^n$, may be much greater
than $\alpha_1$. For example, for [pic] and [pic] we would obtain an
unacceptably large value [pic].
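To make the inflation concrete, the following Python sketch evaluates $\alpha_n = 1 - (1 - \alpha_1)^n$ for a few values of $n$ with $\alpha_1 = 0.05$. These values are illustrative choices of ours, not necessarily the example intended in the text, and independence of the partial tests is assumed.

```python
# Overall significance of the naive rule "reject H0 if any p_i <= alpha1",
# assuming the n partial tests are statistically independent.
def naive_overall_significance(alpha1: float, n: int) -> float:
    # P(at least one of n independent uniform p-values is <= alpha1)
    return 1.0 - (1.0 - alpha1) ** n


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(naive_overall_significance(0.05, n), 3))
```

For $\alpha_1 = 0.05$ this prints 0.05, 0.226, 0.401 and 0.642 for $n$ = 1, 5, 10 and 20, so already with ten tests the nominal 5% level is exceeded eightfold.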

To avoid such a high risk of falsely rejecting the overall null
hypothesis $H_0$, a number of procedures (multiple tests) have been proposed
for combining the results of partial tests in such a way that the overall
significance $\alpha_n$ is not greater than a given significance level, say,
$\alpha = 0.05$. The best-known multiple test is based on the Bonferroni
inequality

$$\alpha_n \le n\,\alpha_1,$$

where $\alpha_1$ is the significance level of each partial test (Morrison,
2004; Cupples et al., 1984; Meinert, 1986; Hochberg, Tamhane, 1987;
Westfall, Young, 1993; Bland, Altman, 1995). The inequality expresses a
simple tenet of probability theory: the probability that at least one of
several events occurs cannot exceed the sum of the probabilities of all
those events. It follows from this inequality that if we use the
significance level $\alpha_1 = \alpha/n$ for the partial tests (the
Bonferroni correction for multiplicity[2]), then the overall significance
$\alpha_n$ will not be greater than the required significance level
$\alpha$.
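The two usual ways of choosing $\alpha_1$ can be compared numerically. The Python sketch below computes the classical Bonferroni level $\alpha/n$, which is valid under any dependence, and the level $1 - (1 - \alpha)^{1/n}$ that makes the overall significance exactly $\alpha$ when the partial tests are independent. The "Bonferroni" column of Table 1 (e.g., 0.0253 for $n = 2$) appears to correspond to the second, slightly larger value; the function names are our own.

```python
def bonferroni_alpha1(alpha: float, n: int) -> float:
    """Classical Bonferroni partial level alpha/n (valid under any dependence)."""
    return alpha / n


def exact_independent_alpha1(alpha: float, n: int) -> float:
    """Partial level solving 1 - (1 - alpha1)**n = alpha (independent tests only)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


if __name__ == "__main__":
    for n in (2, 10, 30):
        print(n, round(bonferroni_alpha1(0.05, n), 4), round(exact_independent_alpha1(0.05, n), 4))
```

The exact level is never smaller than $\alpha/n$ (a consequence of Bernoulli's inequality), but the difference is tiny for $\alpha = 0.05$.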

However, the power (i.e., the probability of detecting a really
existing effect by rejecting the false null hypothesis) of such a
Bonferroni multiple test, as well as that of some of its modifications
(Holm, 1979; Simes, 1986; Hochberg, 1988; Rom, 1990; Zhang et al., 1997;
Roth, 1999), is rather low (Ryman, Jorde, 2001; Legendre P., Legendre L.,
1998; Morikawa et al., 1997; Blair et al., 1996). This is why some
researchers (Rothman, 1990; Perneger, 1998; Bender, Lange, 1999) suggest
combining the results of partial tests at an informal level rather than
applying a multiplicity correction. Another way to improve the situation,
and to rescue some empirically discovered effects that the Bonferroni test
falsely dismisses, is to use more powerful multiple tests.

The intuitive idea underlying our approach is that, when a really
existing effect is expressed rather weakly but in all partial tests, the
power of the Bonferroni test, i.e., the probability of obtaining at least
one partial test with a $p$-value less than $\alpha/n$, may be very low; by
contrast, the probability that at least some number $k$ ($1 < k \le n$) of
the $n$ tests are significant at a level $\alpha_1$ (greater than
$\alpha/n$, or even greater than $\alpha$) may be much higher. Taking $k$
and $\alpha_1$ such that the overall probability $\alpha_n$ of this event
under $H_0$ is not greater than the desired overall significance, $\alpha$,
and rejecting the null hypothesis whenever at least $k$ tests are
significant at the level $\alpha_1$, one can obtain a multiple test with
overall significance not greater than $\alpha$ and with power greater than
that of the Bonferroni multiple test. It is natural to regard multiple tests
of this type as binomial modifications of the Bonferroni test, because
values of $k$ and $\alpha_1$ ensuring the desired overall significance
$\alpha$ can easily be found (under the assumption of independence of
partial tests) by means of the well-known formula for binomial
probabilities. We will see that, indeed, binomial multiple tests may have
power notably exceeding that of the Bonferroni test and its earlier
modifications.
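The binomial formula mentioned above can be written down directly. Under $H_0$ each $p_i$ is uniform on $[0, 1]$, so, for independent tests, the number of $p$-values not exceeding $\alpha_1$ is binomially distributed with parameters $n$ and $\alpha_1$, and the overall significance of the rule "at least $k$ of $n$" is the upper tail of that distribution. A minimal Python sketch (the function name is ours):

```python
from math import comb


def binomial_overall_significance(n: int, k: int, alpha1: float) -> float:
    """P(at least k of n independent p-values are <= alpha1 | H0 true).

    Under H0 each p_i is uniform on [0, 1], so the number of p-values
    not exceeding alpha1 follows Binomial(n, alpha1); this is its upper tail.
    """
    return sum(comb(n, i) * alpha1**i * (1.0 - alpha1) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # k = 1 reproduces the naive rule 1 - (1 - alpha1)**n
    print(binomial_overall_significance(10, 1, 0.05))
    # the rule "at least 5 of 10 below 0.222" stays just under 0.05
    print(binomial_overall_significance(10, 5, 0.222))
```

With $k = 1$ the tail reduces to $1 - (1 - \alpha_1)^n$; larger $k$ permits a much larger $\alpha_1$ for the same overall level.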

It is essential that we assume independence of partial tests when
constructing the binomial tests. In practice, however, the partial tests may
be either independent (when they are based on different data sets) or more
or less dependent (when they are based on the same data set, e.g., when
performing multiple comparisons). We therefore illustrate the consequences
of relaxing the independence assumption.

There are at least two principles for determining the values of
$k$ and $\alpha_1$ for binomial multiple tests. First, we may fix the value
of $k$ arbitrarily and search for the largest value of $\alpha_1$ for which
the overall significance $\alpha_n$ is not greater than the required value
$\alpha$. We may set $k$ equal, say, to 2 and then calculate the
corresponding value of $\alpha_1$. Alternatively, for any given $n$, we may
set $k$ equal to some fraction of $n$, for example, to $[n/2]$ (the integer
part of $n/2$). Second, we may fix the value of $\alpha_1$ arbitrarily and
search for the smallest value of $k$ for which the attained overall
significance $\alpha_n$ is not greater than the given level $\alpha$, say,
$\alpha = 0.05$. In particular, we can set $\alpha_1 = \alpha$ (Prugnolle et
al., 2002) and then calculate the corresponding value of $k$, which
evidently depends on the required overall significance $\alpha$, on the
chosen significance of partial tests $\alpha_1$, and on the total number of
partial tests $n$. But we may also fix $\alpha_1$ at any other level, say,
0.10, 0.25 or even 0.50, and calculate the corresponding value of $k$ for
the chosen level of significance.

There are other modifications of the standard Bonferroni multiple
test, mainly based on ranking the partial $p$-values. Holm (1979) proposed a
sequential multiple testing procedure (see also Rice, 1989). The procedure
consists of a stepwise comparison of the successively increasing ordered
partial $p$-values, $p_{(1)} \le p_{(2)} \le \dots \le p_{(n)}$, with the
successively greater partial significance levels $\alpha/n, \alpha/(n-1),
\dots, \alpha$. If $p_{(1)} > \alpha/n$, the overall null hypothesis $H_0$
is not rejected and the procedure stops; otherwise, $H_0$ is rejected. The
inequality $p_{(1)} \le \alpha/n$ also means that the corresponding partial
null hypothesis should be rejected, and we may pass to the next comparison.
This stepwise process continues until the first step $j$ at which the
inequality $p_{(j)} > \alpha/(n-j+1)$ holds.
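Holm's step-down rule can be sketched as follows. This is a minimal Python illustration written for this text, not code from Holm (1979) or Rice (1989); it returns the indices of the rejected partial null hypotheses, so the overall $H_0$ is rejected exactly when the returned list is non-empty.

```python
from typing import List, Sequence


def holm(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of partial null hypotheses rejected by Holm's step-down procedure.

    The overall null hypothesis H0 is rejected iff the returned list is non-empty.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by increasing p-value
    rejected: List[int] = []
    for step, idx in enumerate(order):  # step j-1 compares p_(j) with alpha/(n-j+1)
        if pvalues[idx] > alpha / (n - step):
            break  # first non-significant comparison: stop
        rejected.append(idx)
    return rejected
```

For instance, with $p$-values 0.001, 0.03 and 0.04 only the first hypothesis is rejected, because $0.03 > 0.05/2$ stops the procedure at the second step.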

In fact, only the first step of this procedure, which concerns the
overall null hypothesis, is of interest to us. The binomial multiple tests
we consider do not test partial hypotheses and, in this sense, they give
less information than sequential tests. In principle, it is not even
necessary that the overall alternative hypothesis be formulated as the
falsity of all partial null hypotheses. But even when the alternative
hypothesis is formulated as the falsity of only some of the partial null
hypotheses, it is very important to have a powerful test for the overall
null hypothesis, because falsely accepting the overall null hypothesis
automatically prevents any further testing of partial hypotheses.

Note also that, for the overall null hypothesis, Holm's procedure has
the same power as the simple Bonferroni test, because its first step is the
same as the Bonferroni test. We will therefore use for comparison another
sequential modification of the Bonferroni test, developed by Simes (1986),
in which the overall null hypothesis is rejected if at least one of the
inequalities $p_{(j)} \le j\alpha/n$, $j = 1, \dots, n$, holds. Though the
Simes procedure does not universally guarantee that its actually attained
significance level $\alpha_n$ is no greater than the required significance
level $\alpha$, it does so for a wide class of multivariate distributions,
in particular, for independent partial test statistics (Simes, 1986;
Hochberg, Rom, 1995; Samuel-Cahn, 1996).
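A minimal Python sketch of the Simes rule for the overall null hypothesis (our own illustration; the function name is hypothetical):

```python
from typing import Sequence


def simes_rejects(pvalues: Sequence[float], alpha: float = 0.05) -> bool:
    """True if the Simes test rejects the overall H0: p_(j) <= j * alpha / n for some j."""
    n = len(pvalues)
    # enumerate from 1 so that j is the rank of the ordered p-value p_(j)
    return any(p <= j * alpha / n for j, p in enumerate(sorted(pvalues), start=1))
```

For example, with $p$-values 0.03 and 0.04 and $\alpha = 0.05$, the Simes test rejects $H_0$ ($0.04 \le 2 \cdot 0.05/2$), whereas the Bonferroni test does not ($0.03 > 0.05/2$).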

Another approach to combining independent test results was proposed by
Fisher (Fisher, 1970; see also Manly, 1985). It is based on the fact that,
if $H_0$ is true, then $p_1, \dots, p_n$ are uniformly distributed on the
interval $[0, 1]$ and, consequently, the statistic

$$X^2 = -2\sum_{i=1}^{n} \ln p_i$$

is distributed as a chi-square random variable with $2n$ degrees of
freedom. This holds exactly for independent tests with continuous test
statistics (each $-2\ln p_i$ then has a chi-square distribution with 2
degrees of freedom) and only approximately when the partial statistics are
discrete.
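Because the number of degrees of freedom, $2n$, is even, the chi-square tail probability has the closed form $P(\chi^2_{2n} > x) = e^{-x/2}\sum_{j=0}^{n-1} (x/2)^j / j!$, so Fisher's combined $p$-value can be sketched in plain Python without a statistics library. This is our illustration; it assumes independent $p$-values in $(0, 1]$.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Fisher's method: P(chi-square with 2n df > -2 * sum(ln p_i)).

    Uses the closed form for even degrees of freedom,
    P(chi2_{2n} > x) = exp(-x/2) * sum_{j=0}^{n-1} (x/2)**j / j!.
    Assumes independent p-values in (0, 1]; math.log(0) raises ValueError.
    """
    n = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # x/2, where x = -2 * sum(ln p_i)
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= half / j  # (x/2)**j / j!
        total += term
    return math.exp(-half) * total
```

With a single $p$-value the function returns that $p$-value unchanged, and a $p$-value of exactly 0 makes the logarithm undefined, which is the practical obstacle to this test discussed later in the paper.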

Here we compare the power of different binomial multiple tests with
the power of the Bonferroni, Simes and Fisher tests for three types of
alternative hypothesis: (1) all partial null hypotheses are false; (2)
about half of the partial null hypotheses are false; (3) only one of the
partial null hypotheses is false. It will be shown that binomial multiple
tests, especially the one with $k = [n/2]$, are very suitable for testing
the null hypothesis against alternatives (1) and (2), but not against (3).

Another problem with multiple tests is possible correlation between
partial tests. The parameters $k$ and $\alpha_1$ of binomial tests are
calculated under the assumption that the partial tests are independent, and
only in this case is their actually attained significance $\alpha_n$
guaranteed not to exceed the required level $\alpha$. Unfortunately, this
property does not remain valid when partial tests are dependent, and we
will see that, in this case, the overall significance $\alpha_n$ can become
notably greater than the desired level $\alpha$, especially if
intercorrelations between partial tests are high. Some corrections for
non-independence are therefore needed in such cases. For the Bonferroni
test, for example, when partial tests are highly correlated, it has been
proposed to calculate the partial significance by the formula
$\alpha_1 = 1 - (1 - \alpha)^{1/\sqrt{n}}$ (Tukey et al., 1985; see also
Curtin, Schultz, 1998). We will also consider how the lack of independence
changes the properties of binomial tests.
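For a sense of scale, the sketch below evaluates the correction for correlated tests in the form commonly attributed to Tukey, Ciminera and Heyse, $\alpha_1 = 1 - (1 - \alpha)^{1/\sqrt{n}}$ (we assume this is the formula intended here), next to the Bonferroni level $\alpha/n$. The function name is ours.

```python
import math


def tukey_ciminera_heyse_alpha1(alpha: float, n: int) -> float:
    """Partial level 1 - (1 - alpha)**(1/sqrt(n)), a correction for correlated tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / math.sqrt(n))


if __name__ == "__main__":
    for n in (4, 10, 25):
        print(n, 0.05 / n, round(tukey_ciminera_heyse_alpha1(0.05, n), 4))
```

Replacing $n$ by $\sqrt{n}$ makes the partial level several times larger than $\alpha/n$ already for $n = 10$ (about 0.016 versus 0.005).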

METHODS

We consider the situation where the results of $n$ independent tests
of $n$ partial null hypotheses, $H_{01}, \dots, H_{0n}$, are given in the
form of their sample tail probabilities ($p$-values), $p_1, \dots, p_n$,
and we wish to combine these results into a single test procedure (multiple
test) with a given significance level $\alpha$ for testing the overall null
hypothesis, $H_0$, which states that all partial null hypotheses $H_{0i}$
are true. One way to do this is to reject $H_0$ whenever at least $k$ of
the $n$ $p$-values $p_i$ are less than some level $\alpha_1$, where $k$ and
$\alpha_1$ are chosen in such a way that the significance level of this
procedure is not greater than a given value $\alpha$, say, $\alpha = 0.05$.
As already noted, we can fix $k$ and search for the greatest $\alpha_1$
providing the desired significance level $\alpha$, or fix $\alpha_1$ and
search for the least $k$ providing the significance level $\alpha$.

In the case of fixed $\alpha_1$, the necessary value of $k$ depends on
$n$ and $\alpha$ and can be calculated by means of the Bernoulli formula
for binomial probabilities. More precisely, to find the value of $k$ that
provides the significance level closest to $\alpha$, it is sufficient to
find the least value of $k$ for which the inequality

$$\sum_{i=k}^{n} \binom{n}{i}\,\alpha_1^{\,i}\,(1-\alpha_1)^{n-i} \le \alpha$$

is still satisfied. Note that the left-hand side of this inequality, which
is the attained overall significance $\alpha_n$, decreases as $k$ increases
and increases as $\alpha_1$ increases.

In the case of fixed $k$, we use the same inequality but vary $\alpha_1$
instead of $k$. To find the value of $\alpha_1$ providing the significance
level closest to $\alpha$, it is sufficient to find the greatest value of
$\alpha_1$ for which this inequality still holds.
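The two searches described above can be sketched in Python as follows. This is our illustration, not a port of the Appendix program: it scans $k$ directly and finds $\alpha_1$ by bisection (valid because the left-hand side of the inequality increases with $\alpha_1$) instead of using a grid with step 0.0001. The function names are hypothetical.

```python
from math import comb
from typing import Optional


def tail(n: int, k: int, alpha1: float) -> float:
    """Left-hand side of the inequality: P(Binomial(n, alpha1) >= k)."""
    return sum(comb(n, i) * alpha1**i * (1.0 - alpha1) ** (n - i) for i in range(k, n + 1))


def least_k(n: int, alpha: float, alpha1: float) -> Optional[int]:
    """Fixed alpha1: smallest k with tail <= alpha, or None if even k = n fails."""
    for k in range(1, n + 1):
        if tail(n, k, alpha1) <= alpha:
            return k
    return None


def largest_alpha1(n: int, k: int, alpha: float, tol: float = 1e-10) -> float:
    """Fixed k: largest alpha1 with tail <= alpha, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0  # tail(n, k, 0) = 0 <= alpha and tail(n, k, 1) = 1 > alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tail(n, k, mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `largest_alpha1(10, 5, 0.05)` lies between 0.222 and 0.23, consistent with the entry 0.222 in Table 1, and `least_k(8, 0.05, 0.25)` returns 5.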

In the Appendix we give a computer program for calculating $k$ or
$\alpha_1$ for any given values of $n$ and $\alpha$.

To find the parameters $k$ or $\alpha_1$ of the binomial multiple
tests for any $n$ and $\alpha$, we use only the assumption of independence
among partial tests and need no assumption about the probability
distribution of the partial test statistics. However, to estimate the power
of these tests we need to know these distributions. Hence, to be able to
compare the powers of different tests, we make additional assumptions
concerning the distributions of test statistics. Certainly, the conclusions
drawn from these particular comparisons cannot be general, but they can
give sound guidelines for choosing a suitable multiple test in real
situations.

To compare the powers of different multiple tests, we use the
following partial tests, which will be referred to below as "standard
partial tests". Their test statistics $Z_i$ are assumed to have the
standard normal distribution, $N(0, 1)$, under the partial null hypotheses
$H_{0i}$, and the same distribution shifted to the right by a value
$\delta$, i.e., $N(\delta, 1)$, under the partial alternative hypotheses
$H_{1i}$, $i = 1, \dots, n$. Fig. 1 illustrates this testing situation
graphically.

In each partial test we reject the null hypothesis $H_{0i}$ (say,
"absence of effect") and accept the alternative hypothesis $H_{1i}$ (say,
"presence of effect") if the sample test statistic falls into the critical
region $Z_i > z_{1-\alpha_1}$, where $z_{1-\alpha_1}$ is the value of $z$
that satisfies the equation $\Phi(z) = 1 - \alpha_1$, $\Phi$ being the
standard normal distribution function. The power of this partial test is
equal to $P_1 = 1 - \Phi(z_{1-\alpha_1} - \delta)$. Fig. 1 illustrates the
case when [pic], [pic], and [pic].
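The partial power $P_1$ can be evaluated with the Python standard library's `statistics.NormalDist`. The sketch below is ours; the value $\delta = 1$ used in the check is an assumption, chosen because it reproduces the power 0.26 reported in Table 2 for a single test at $\alpha_1 = 0.05$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def partial_power(alpha1: float, delta: float) -> float:
    """Power of a one-sided z-test: Z ~ N(0, 1) under H0i and N(delta, 1) under H1i."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha1)  # z_{1-alpha1}: P(Z > z_crit | H0i) = alpha1
    return 1.0 - STD_NORMAL.cdf(z_crit - delta)  # P(Z > z_crit) when Z ~ N(delta, 1)
```

With `alpha1=0.05` and `delta=1.0` the result is about 0.26; with `delta=0.0` it equals `alpha1`, as it should.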

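If the overall alternative hypothesis is formulated as the falsity of
all partial null hypotheses, the power of the multiple binomial test can be
calculated by the Bernoulli formula

$$P_n = \sum_{i=k}^{n} \binom{n}{i}\,P_1^{\,i}\,(1-P_1)^{n-i},$$

where $P_1 = 1 - \Phi(z_{1-\alpha_1} - \delta)$ is the power of a single
standard partial test.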
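A Python sketch of this power calculation follows. It is our illustration; the shift $\delta = 1$ in the checks is an assumption that reproduces the entries of Table 2 we tested (e.g., about 0.31 for the Bonferroni test with $n = 2$ and about 0.49 for $n = 4$, $k = 2$, $\alpha_1 = 0.097$).

```python
from math import comb
from statistics import NormalDist


def binomial_test_power(n: int, k: int, alpha1: float, delta: float) -> float:
    """Power of 'reject H0 if at least k of n p-values < alpha1' when all n partial nulls are false."""
    nd = NormalDist()
    p1 = 1.0 - nd.cdf(nd.inv_cdf(1.0 - alpha1) - delta)  # power P_1 of one standard partial test
    # number of significant partial tests ~ Binomial(n, P_1); power is its upper tail at k
    return sum(comb(n, i) * p1**i * (1.0 - p1) ** (n - i) for i in range(k, n + 1))
```

The same function with $k = 1$ and $\alpha_1$ taken from the Bonferroni column of Table 1 gives the Bonferroni power $1 - (1 - P_1)^n$.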

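If the alternative consists in the falsity of only $m$ of the $n$
partial null hypotheses, with $m$ equal, for example, to 1 or $[n/2]$, the
power can be calculated as

$$P_n = \sum_{i=0}^{m}\ \sum_{\substack{j=0\\ i+j \ge k}}^{n-m} \binom{m}{i} P_1^{\,i}(1-P_1)^{m-i}\binom{n-m}{j}\alpha_1^{\,j}(1-\alpha_1)^{n-m-j},$$

where $i$ is the number of significant results among the $m$ tests whose
null hypotheses are false (each significant with probability $P_1$), $j$ is
the number of significant results among the $n-m$ tests whose null
hypotheses are true (each significant with probability $\alpha_1$), and
$P_1 = 1 - \Phi(z_{1-\alpha_1} - \delta)$.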
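The power formula above amounts to convolving two binomial distributions: one for the $m$ tests with false nulls (success probability $P_1$) and one for the $n - m$ tests with true nulls (success probability $\alpha_1$). A minimal Python sketch under the same standard-partial-test assumptions (function names are ours):

```python
from math import comb
from statistics import NormalDist


def binom_pmf(r: int, size: int, p: float) -> float:
    """P(Binomial(size, p) = r)."""
    return comb(size, r) * p**r * (1.0 - p) ** (size - r)


def power_m_false(n: int, m: int, k: int, alpha1: float, delta: float) -> float:
    """Power of the 'at least k of n' rule when only m of the n partial nulls are false."""
    nd = NormalDist()
    p1 = 1.0 - nd.cdf(nd.inv_cdf(1.0 - alpha1) - delta)  # P_1 for a false partial null
    total = 0.0
    for i in range(m + 1):          # significant tests among the m false nulls
        for j in range(n - m + 1):  # significant tests among the n - m true nulls
            if i + j >= k:
                total += binom_pmf(i, m, p1) * binom_pmf(j, n - m, alpha1)
    return total
```

With $m = n$ this reduces to the all-false power, and with $m = 0$ it returns the attained significance $\alpha_n$ of the rule.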

Another method we used to evaluate the power of a test consists in
generating a large number, $N$ (say, [pic]), of random samples in
accordance with the probability distribution of the test statistics under
the alternative hypothesis and applying the multiple test to these data.
The fraction of samples in which the alternative hypothesis is accepted
estimates the power of the multiple test.
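A Monte Carlo estimate of this kind can be sketched in Python with the standard `random` module. The seed, the number of replicates and the function name are our assumptions; the first `n_false` statistics are drawn under the alternative $N(\delta, 1)$ and the rest under $N(0, 1)$.

```python
import random
from statistics import NormalDist
from typing import Optional


def mc_power(n: int, k: int, alpha1: float, delta: float,
             n_false: Optional[int] = None, reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo power of 'reject H0 if at least k of n statistics exceed z_{1-alpha1}'.

    n_false=None means that all n partial null hypotheses are false.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha1)
    if n_false is None:
        n_false = n
    hits = 0
    for _ in range(reps):
        # count significant partial tests in one simulated data set
        n_sig = sum(rng.gauss(delta if i < n_false else 0.0, 1.0) > z_crit for i in range(n))
        hits += n_sig >= k
    return hits / reps
```

For instance, `mc_power(4, 2, 0.097, 1.0)` should land close to the exact value of about 0.49 given by the binomial formula, up to Monte Carlo error; setting `n_false=0` estimates the significance instead of the power.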




RESULTS

We have computed the parameters of some binomial tests for $n$ from
1 to 30 and $\alpha = 0.05$ using the program given in the Appendix (Table 1).

To compare those tests, we have also computed their powers for the
case of standard partial tests with the alternative hypothesis that all the
partial null hypotheses are false (Table 2).

Figs. 2 and 3 illustrate how the power of these binomial multiple
tests varies with the number of standard partial tests, $n$: Fig. 2 does so
for binomial tests with fixed $k$, and Fig. 3 for binomial tests with fixed
$\alpha_1$.

Example. Ryman and Jorde (2001) tested the difference in allele
frequencies at 12 loci between two consecutive year classes of brown trout
(Salmo trutta), using for each locus the chi-square statistic computed from
a [pic] contingency table. The following twelve $p$-values were obtained:
0.007; 0.611; 0.009; 0.228; 0.110; 0.097; 0.651; 0.053; 0.851; 0.651;
0.058; 0.743. The Bonferroni test fails to reveal any significant
difference between the two classes at the level $\alpha = 0.05$ because
none of the 12 partial $p$-values is less than $\alpha_1 = 0.0043$ (Table 1,
$n = 12$). The authors argue for using the sum of the partial chi-squares
to test the overall null hypothesis of no difference and, indeed, they
succeed in detecting a significant difference in allele frequencies by this
method. Alternatively, we could use the [n/2]-binomial multiple test, which
is more universally applicable than the sum of chi-squares. According to
Table 1, the null hypothesis should be rejected if at least 12/2 = 6
partial tests are significant at the level $\alpha_1 = 0.245$. We find seven
$p$-values significant at this level (0.007; 0.009; 0.228; 0.110; 0.097;
0.053; 0.058), so the null hypothesis should be rejected.
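The counting in this example is easy to check mechanically. The Python snippet below simply re-uses the twelve $p$-values quoted above and the Table 1 entries for $n = 12$; the variable names are ours.

```python
# p-values reported by Ryman and Jorde (2001) for 12 loci, as quoted in the text
pvals = [0.007, 0.611, 0.009, 0.228, 0.110, 0.097, 0.651, 0.053, 0.851, 0.651, 0.058, 0.743]
n = len(pvals)
k = n // 2             # [n/2] = 6
alpha1_bonf = 0.0043   # Table 1, n = 12, Bonferroni column
alpha1_half = 0.245    # Table 1, n = 12, k = [n/2] column

bonferroni_rejects = min(pvals) < alpha1_bonf      # any single p-value below 0.0043?
n_sig = sum(p < alpha1_half for p in pvals)        # p-values below 0.245
binomial_rejects = n_sig >= k                      # at least [n/2] of them?

print(bonferroni_rejects, n_sig, binomial_rejects)  # False 7 True
```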

In Table 3, we compare, for some values of $n$, the [n/2]-binomial
test not only with the Bonferroni test but also with the Simes (1986) and
Fisher (1970) tests mentioned in the Introduction. We see that the
[n/2]-binomial test is more powerful than the Simes test but less powerful
than the Fisher test, especially for small $n$.

So far we have compared the powers of different tests under the
alternative hypothesis that all partial null hypotheses are false. In
practice, however, this is not always so. Sometimes the falsity of the
overall null hypothesis may mean that only a few, or even only one, of the
partial null hypotheses are false. In Table 4 we compare the power of the
Bonferroni, Simes, Fisher and [n/2]-binomial multiple tests for [pic] under
two alternative hypotheses in which not all partial null hypotheses are
false: (1) half of the $n$ partial null hypotheses are false, and (2) only
one partial null hypothesis is false. We see that, under the alternative in
which half of the partial null hypotheses are false, the conclusions
concerning the comparative properties of the four tests are nearly the same
as under the alternative that falsifies all the partial null hypotheses.
However, when only one partial null hypothesis is false, the [n/2]-binomial
test has no advantage and even has slightly lower power than the
Bonferroni, Simes and Fisher tests.

In the previous considerations we assumed that the partial tests are
independent. In practice, however, this is not always so. To see what
follows from the failure of this assumption, we compared, for several
values of $n$, the significance and power of different tests, under the
alternative hypothesis that all partial null hypotheses are false, for
correlated partial tests. For each $n$, we simulated 10,000 sets of $n$
values of the test statistic under the alternative hypothesis, correlated at
a level of about 0.5. We then applied the Bonferroni, Simes, Fisher and
[n/2]-binomial multiple tests to these data. The results are presented in
Table 5: though the power of the [n/2]-binomial test always remains much
higher than that of the Bonferroni and Simes tests, its overall
significance, $\alpha_n$, is greater than the required level, $\alpha$,
especially for large $n$. In this situation, the [n/2]-binomial test behaves
similarly to the Fisher test: the significance levels of both tests become
considerably greater than $\alpha$. Note, however, that correlation of
partial tests affects the [n/2]-binomial test less: the increase in its
significance level and the decrease in its power are smaller than those for
the Fisher test.
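The qualitative effect of correlation on the significance level can be reproduced by simulating equicorrelated partial statistics under $H_0$ as $Z_i = \sqrt{\rho}\,W + \sqrt{1-\rho}\,E_i$ with independent standard normal $W, E_1, \dots, E_n$, which gives unit variances and pairwise correlation $\rho$. This construction, the seed and the number of replicates are our assumptions; the text specifies only a correlation of about 0.5. The Python sketch estimates the actual significance of the rule "at least $k$ of $n$ statistics exceed $z_{1-\alpha_1}$".

```python
import random
from statistics import NormalDist


def mc_size_correlated(n: int, k: int, alpha1: float, rho: float = 0.5,
                       reps: int = 20_000, seed: int = 2) -> float:
    """Estimated significance of the 'at least k of n' rule for equicorrelated N(0, 1) statistics."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha1)
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5  # a**2 + b**2 = 1; corr(Z_i, Z_j) = a**2 = rho
    hits = 0
    for _ in range(reps):
        common = rng.gauss(0.0, 1.0)  # shared component W inducing the correlation
        count = sum(a * common + b * rng.gauss(0.0, 1.0) > z_crit for _ in range(n))
        hits += count >= k
    return hits / reps
```

With $n = 10$, $k = 5$ and $\alpha_1 = 0.222$ (the Table 1 entry for $k = [n/2]$) the estimate comes out well above 0.05, whereas the Bonferroni rule ($k = 1$, $\alpha_1 = 0.0051$) does not exceed its nominal level up to Monte Carlo error, in line with the pattern described above.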

By contrast, as can be seen from Table 5, the significance level of
the Bonferroni and Simes tests is below the required level $\alpha$. We
therefore corrected all four tests so that their overall significance
equals $\alpha$ (simply by adjusting them directly). In particular, as can
be seen from Table 6, it was sufficient for this to increase $\alpha_1$ for
the Bonferroni test by 20-30% (these corrections are considerably smaller
than those we would obtain using the formula
$\alpha_1 = 1 - (1 - \alpha)^{1/\sqrt{n}}$ proposed by Tukey et al. (1985)
for correlated tests), to increase it for the Simes test by 10-20%, and to
decrease $\alpha_1$ for the [n/2]-binomial test nearly twofold. We see that,
after these corrections, the power of the [n/2]-binomial test still remains
superior to the power of the Bonferroni and Simes tests, especially for
large $n$ (for $n$ greater than 15, the power is about one half for the
[n/2]-binomial test versus one third for the Bonferroni and Simes tests),
and is close to (and, for large $n$, even greater than) the power of the
Fisher test.


DISCUSSION AND CONCLUSION

The findings show that multiple binomial tests have considerably
higher power than the Bonferroni test and some of its sequential
modifications when the partial tests are independent and the alternative
hypothesis consists in the falsity of all partial null hypotheses. Though
the powers were computed for a particular test situation, we believe that
this conclusion may be fairly general. Hence, if the assumptions of
independence of partial tests and of simultaneous violation of many partial
null hypotheses hold, it follows from our comparisons that the binomial
tests, and especially the [n/2]-binomial test, are more powerful than the
Bonferroni and Simes tests but less powerful than the Fisher test. We do
not, however, recommend using the Fisher test unreservedly. One obstacle to
its use is the need to calculate logarithms of partial $p$-values, which
may often be equal to 0, especially when partial tests are based on
discrete statistics. We also saw that the Fisher test is more sensitive to
correlations among the partial tests.

Of the two kinds of multiple binomial test, those with fixed $k$ and
those with fixed $\alpha_1$, the tests with fixed $k$ are preferable
because, owing to the continuity of $\alpha_1$ (as opposed to the
discreteness of $k$), the dependence of the power on the number of tests,
$n$, is more regular and the power values are mostly higher. In particular,
the test with $k = [n/2]$ behaves quite well, and for our standard partial
tests its power reaches almost twice the power of the Bonferroni test.

Though binomial tests with fixed $\alpha_1$ are, in general, less
preferable, they may be useful in some cases. For example, if the only
information we have about each partial test is whether or not it is
significant at some level $\alpha_1$, then it is natural to combine these
partial tests using the multiple binomial test with the partial
significance level $\alpha_1$.

In practice, however, we are not always sure that our standard
assumptions about a test situation (independence of partial tests and
simultaneous violation of all partial null hypotheses) really hold. In
particular, test statistics may be correlated. We saw that, in this case,
the Bonferroni test ([pic]) and the [n/2]-binomial test behave quite
differently. The significance and power of the Bonferroni test decrease
compared with their values computed under the assumption of independence,
whereas the significance and power of the [n/2]-binomial and Fisher tests
increase (especially those of the latter). This means that, to provide the
required level of significance with no loss of power, we should increase
the partial significance levels for the Bonferroni and Simes tests and
decrease them for the [n/2]-binomial and Fisher tests. In our example, with
partial tests correlated at the level of 0.5, we had to increase the
partial significance level of the Bonferroni and Simes tests by about 25%
and to decrease the partial significance level of the [n/2]-binomial test
by about 50%. After these adjustments, the [n/2]-binomial test still
remained more powerful than the Bonferroni and Simes tests, though not as
dramatically as in the case of independence.

Another deviation from our basic assumptions may be that only a few of
the partial null hypotheses are actually false when the overall null
hypothesis is false. In the extreme case, only one of the $n$ partial
hypotheses is false, and we saw that, in this case, the Bonferroni or,
better, the Simes test is preferable. But if many partial null hypotheses
are false, say about half of them, the [n/2]-binomial and Fisher tests
remain considerably more powerful.

The authors are grateful to the anonymous referees for many helpful
suggestions. The work was supported by a CNRS senior research fellowship
and a grant from RFBR (07-04-00521) to A.T.



References



Bender R., Lange S., 1999. What's wrong with arguments against multiplicity
adjustments // BMJ. V. 318(7183). P. 600.

Blair R.C., Troendle J.F., Beck R.W., 1996. Control of familywise error in
multiple endpoint assessments via stepwise permutation tests // Stat.
Med. V. 15(11). P. 1107-1121.

Bland J.M., Altman D.G., 1995. Multiple significance tests: the Bonferroni
method // BMJ. V. 310(6973). P. 170.

Bonferroni C. E., 1935. Il calcolo delle assicurazioni su gruppi di teste
// Studi in Onore del Professore Salvatore Ortu Carboni. Rome: Italy.
P. 13-60.

Cupples L.A., Heeren T., Schatzkin A., Colton T., 1984. Multiple testing of
hypotheses in comparing two groups // Ann. Intern. Med. V. 100(1). P.
122-129.

Curtin F., Schultz P., 1998. Multiple Correlations and Bonferroni's
Correction // Biol. Psychiatry. V. 44(8). P. 775-777.

Fisher R.A., 1970. Statistical Methods for Research Workers. 14th ed.
London: Oliver and Boyd. 362 p.

Hochberg Y., 1988. A sharper Bonferroni procedure for multiple tests of
significance // Biometrika. V. 75(4). P. 800-802.

Hochberg Y., Rom D., 1995. Extensions of multiple testing procedures based
on Simes' test // J. Statist. Plan. Infer. V. 48(1). P. 141-152.

Hochberg Y., Tamhane A.C., 1987. Multiple Comparison Procedures. N. Y.:
Wiley. 950 p.

Holm S., 1979. A simple sequentially rejective multiple test procedure //
Scand. J. Stat. V. 6(1). P. 65-70.

Legendre P., Legendre L., 1998. Numerical Ecology, 2nd ed. Amsterdam:
Elsevier. 853 p.

Manly B.F.J., 1985. The Statistics of Natural Selection on Animal
Populations. London: Chapman and Hall. 484 p.

Meinert C.L., 1986. Clinical Trials. Design, Conduct and Analysis. N. Y.:
Oxford Univ. Press. 512 p.

Morikawa T., Terao A., Iwasaki M., 1997. Power evaluation of various
modified Bonferroni procedures by Monte Carlo study // J. Biopharm.
Stat. V. 7(3). P. 473-477.

Morrison D.F., 2004. Multivariate Statistical Methods, 4th ed. N. Y.:
McGraw-Hill. 480 p.

Perneger T.V., 1998. What's wrong with Bonferroni adjustments // BMJ. V.
316(7139). P. 1236-1238.

Prugnolle F., de Meeûs T., Durand P., Sire C., Théron A., 2002. Sex-specific
genetic structure in Schistosoma mansoni: evolutionary and
epidemiological implications // Mol. Ecol. V. 11(7). P. 1231-1238.

Rice W.R., 1989. Analysing tables of statistical tests // Evolution. V.
43(1). P. 223-225.

Rom D.M., 1990. A sequentially rejective test procedure based on a modified
Bonferroni inequality // Biometrika. V. 77(3). P. 663-665.

Roth A.J., 1999. Multiple comparison procedures for discrete test
statistics // J. Stat. Plan. Infer. V. 82(1-2). P. 101-117.

Rothman K.J., 1990. No adjustments are needed for multiple comparisons //
Epidemiology. V. 1(1). P. 43-46.

Ryman N., Jorde P.E., 2001. Statistical power when testing for genetic
differentiation // Mol. Ecol. V. 10(10). P. 2361-2373.

Samuel-Cahn E., 1996. Is the Simes improved Bonferroni procedure
conservative? // Biometrika. V. 83(4). P. 928-933.

Simes R.J., 1986. An improved Bonferroni procedure for multiple tests of
significance // Biometrika. V. 73(3). P. 751-754.

Tukey J.W., Ciminera J.L., Heyse J.F., 1985. Testing the statistical
certainty of a response to increasing doses of a drug // Biometrics.
V. 41(1). P. 295-301.

Westfall P.H., Young S.S., 1993. Resampling-based Multiple Testing. N. Y.:
Wiley. 360 p.

Zhang J., Quan H., Ng J., Stepanavage M.E., 1997. Some statistical methods
for multiple endpoints in clinical trials // Contr. Clin. Trials. V.
18(3). P. 204-221.




Appendix

Below we give a program that can be used to compute the parameters of
binomial tests. It requires four input values: (1) the number of partial
tests, n; (2) the desired significance of the combined test, alpha; (3) the
desired significance of the partial tests, alpha1fix (enter 0 if the
partial significance is not fixed); (4) the number of significant partial
tests necessary for rejecting the overall null hypothesis, kfix (enter 0 if
k is not fixed). The output includes two values: the optimal significance
of partial tests, alpha1opt, and the optimal number of significant partial
tests, kopt.

If kfix=0 and alpha1fix>0, the program searches for the number of
significant partial tests, kopt, that provides the minimal difference
between the desired significance, alpha, and the real significance, an,
always keeping an not greater than alpha. If kfix>0 and alpha1fix=0, the
program searches (on a grid with step 0.0001 up to 0.5) for the value of
partial significance, alpha1opt, that provides the minimal difference
between the desired and real significances, always keeping an not greater
than alpha.

We recommend setting kfix = [n/2] and alpha1fix = 0 (see details in the
text).

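The program text in QuickBASIC

' Parameters of binomial multiple tests.
' The overall H0 is rejected when at least k of n independent partial
' p-values are below alpha1; k and alpha1 are chosen so that
' an = SUM(i = k..n) C(n, i) * alpha1^i * (1 - alpha1)^(n - i) <= alpha.
DEFINT I-N                        ' variables starting with I..N are integers
CLS
INPUT "Enter n: ", n
INPUT "Enter alpha: ", alpha
INPUT "Enter kfix (0 = not fixed): ", kfix
INPUT "Enter alpha1fix (0 = not fixed): ", alpha1fix
da = .0001                        ' grid step for alpha1

' Range of k: from n down to 1, or kfix only
IF kfix = 0 THEN
    k1 = 1: k2 = n
ELSE
    k1 = kfix: k2 = kfix
END IF

' Range of alpha1: grid over [da, 0.5], or alpha1fix only
IF alpha1fix = 0 THEN
    a11 = da: a12 = .5
ELSE
    a11 = alpha1fix: a12 = alpha1fix
END IF

dmin = 1                          ' smallest nonnegative alpha - an found so far
kopt = 0: alpha1opt = 0           ' kopt = 0 signals that no solution was found
FOR k = k2 TO k1 STEP -1
    FOR a1 = a11 TO a12 STEP da
        an = 0                    ' an = P(at least k of n p-values < a1 | H0)
        FOR i = k TO n
            cni = 1               ' binomial coefficient C(n, i)
            FOR j = 1 TO i
                cni = cni * (n - j + 1) / j
            NEXT j
            an = an + cni * a1 ^ i * (1 - a1) ^ (n - i)
        NEXT i
        dn = alpha - an
        ' keep the admissible pair (an <= alpha) whose an is closest to alpha
        IF dn >= 0 AND dn <= dmin THEN
            kopt = k: alpha1opt = a1: dmin = dn
        END IF
    NEXT a1
NEXT k

PRINT
IF kopt = 0 THEN
    PRINT "No admissible k and alpha1 found (an > alpha for all candidates)."
ELSE
    PRINT "kopt="; kopt
    PRINT "alpha1opt=";
    PRINT USING "##.####"; alpha1opt
END IF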


Captions for figures

Fig. 1. Distribution of the test statistic in the "standard partial test"
(see text) under the null and alternative partial hypotheses.



Fig. 2. The power of some binomial multiple tests with fixed k as a
function of the number of standard partial tests n for [pic] under the
alternative hypothesis that all partial null hypotheses are false. The
power of the Bonferroni test is also given for comparison.

Fig. 3. The power of some binomial multiple tests with fixed $\alpha_1$ as a
function of the number of standard partial tests $n$ for [pic] under the
alternative hypothesis that all partial null hypotheses are false. The
power of the Bonferroni test is also given for comparison.

Fig. 1. [pic]

Fig. 2. [pic]

Fig. 3. [pic]

Table 1. Parameters of some binomial multiple tests as a function of the
number of standard partial tests $n$ for $\alpha = 0.05$ (columns 2-5: the
required partial significance level $\alpha_1$ for fixed $k$; columns 6-8:
the required number of significant partial tests $k$ for fixed $\alpha_1$)

| $n$ | $\alpha_1$, $k = 1$ (Bonferroni) | $\alpha_1$, $k = 2$ | $\alpha_1$, $k = 3$ | $\alpha_1$, $k = [n/2]$ | $k$, $\alpha_1 = 0.05$ | $k$, $\alpha_1 = 0.10$ | $k$, $\alpha_1 = 0.25$ |
|---|---|---|---|---|---|---|---|
| 1 | 0.0500 | | | | 1 | | |
| 2 | 0.0253 | 0.223 | | 0.025 | 2 | 2 | |
| 3 | 0.0170 | 0.135 | 0.368 | 0.016 | 2 | 2 | 3 |
| 4 | 0.0127 | 0.097 | 0.248 | 0.097 | 2 | 3 | 4 |
| 5 | 0.0102 | 0.076 | 0.189 | 0.076 | 2 | 3 | 4 |
| 6 | 0.0085 | 0.062 | 0.153 | 0.153 | 2 | 3 | 4 |
| 7 | 0.0073 | 0.053 | 0.128 | 0.128 | 2 | 3 | 5 |
| 8 | 0.0064 | 0.046 | 0.111 | 0.192 | 3 | 3 | 5 |
| 9 | 0.0057 | 0.041 | 0.097 | 0.168 | 3 | 4 | 5 |
| 10 | 0.0051 | 0.036 | 0.087 | 0.222 | 3 | 4 | 6 |
| 11 | 0.0047 | 0.033 | 0.078 | 0.199 | 3 | 4 | 6 |
| 12 | 0.0043 | 0.030 | 0.071 | 0.245 | 3 | 4 | 7 |
| 13 | 0.0039 | 0.028 | 0.066 | 0.223 | 3 | 4 | 7 |
| 14 | 0.0037 | 0.025 | 0.061 | 0.263 | 3 | 4 | 7 |
| 15 | 0.0034 | 0.024 | 0.056 | 0.243 | 3 | 5 | 8 |
| 16 | 0.0032 | 0.022 | 0.053 | 0.278 | 3 | 5 | 8 |
| 17 | 0.0030 | 0.021 | 0.049 | 0.260 | 4 | 5 | 8 |
| 18 | 0.0028 | 0.020 | 0.047 | 0.291 | 4 | 5 | 9 |
| 19 | 0.0027 | 0.019 | 0.044 | 0.273 | 4 | 5 | 9 |
| 20 | 0.0026 | 0.018 | 0.042 | 0.301 | 4 | 5 | 9 |
| 21 | 0.0024 | 0.017 | 0.040 | 0.285 | 4 | 6 | 10 |
| 22 | 0.0023 | 0.016 | 0.038 | 0.311 | 4 | 6 | 10 |
| 23 | 0.0022 | 0.015 | 0.036 | 0.296 | 4 | 6 | 10 |
| 24 | 0.0021 | 0.015 | 0.034 | 0.319 | 4 | 6 | 11 |
| 25 | 0.0020 | 0.014 | 0.033 | 0.305 | 4 | 6 | 11 |
| 26 | 0.0020 | 0.013 | 0.032 | 0.326 | 4 | 6 | 11 |
| 27 | 0.0019 | 0.013 | 0.030 | 0.313 | 4 | 6 | 12 |
| 28 | 0.0018 | 0.012 | 0.029 | 0.333 | 4 | 7 | 12 |
| 29 | 0.0018 | 0.012 | 0.028 | 0.320 | 5 | 7 | 12 |
| 30 | 0.0017 | 0.011 | 0.027 | 0.338 | 5 | 7 | 13 |
Table 2. The power of some binomial multiple tests as a function of the
number of standard partial tests n for [pic] under the alternative
hypothesis that all partial null hypotheses are false

|n |Power |
|  |[pic] (Bonferroni) |[pic] |[pic] |[pic] |[pic] |[pic] |[pic] |
|1 |0.26 | | | |0.26 | | |
|2 |0.31 |0.35 | |0.31 |0.07 |0.15 | |
|3 |0.34 |0.44 |0.42 |0.33 |0.17 |0.34 |0.25 |
|4 |0.36 |0.49 |0.52 |0.49 |0.28 |0.17 |0.16 |
|5 |0.38 |0.54 |0.59 |0.54 |0.39 |0.30 |0.39 |
|6 |0.40 |0.57 |0.64 |0.64 |0.49 |0.43 |0.60 |
|7 |0.41 |0.60 |0.68 |0.68 |0.58 |0.56 |0.48 |
|8 |0.43 |0.62 |0.71 |0.74 |0.35 |0.66 |0.66 |
|9 |0.44 |0.65 |0.73 |0.77 |0.43 |0.49 |0.79 |
|10 |0.45 |0.66 |0.76 |0.82 |0.50 |0.59 |0.70 |
|11 |0.46 |0.68 |0.78 |0.85 |0.57 |0.68 |0.81 |
|12 |0.46 |0.69 |0.79 |0.88 |0.64 |0.75 |0.74 |
|13 |0.47 |0.71 |0.81 |0.89 |0.70 |0.81 |0.83 |
|14 |0.48 |0.71 |0.82 |0.92 |0.75 |0.86 |0.90 |
|15 |0.49 |0.73 |0.83 |0.93 |0.79 |0.76 |0.85 |
|16 |0.49 |0.74 |0.85 |0.94 |0.83 |0.81 |0.90 |
|17 |0.50 |0.75 |0.85 |0.95 |0.68 |0.85 |0.94 |
|18 |0.50 |0.76 |0.86 |0.96 |0.73 |0.89 |0.91 |
|19 |0.51 |0.77 |0.87 |0.97 |0.77 |0.92 |0.95 |
|20 |0.51 |0.78 |0.88 |0.97 |0.80 |0.94 |0.97 |
|21 |0.52 |0.78 |0.89 |0.98 |0.83 |0.89 |0.95 |
|22 |0.52 |0.79 |0.89 |0.98 |0.86 |0.91 |0.97 |
|23 |0.53 |0.79 |0.89 |0.99 |0.88 |0.93 |0.98 |
|24 |0.53 |0.81 |0.90 |0.99 |0.90 |0.95 |0.97 |
|25 |0.54 |0.80 |0.90 |0.99 |0.92 |0.96 |0.98 |
|26 |0.54 |0.80 |0.91 |0.99 |0.93 |0.97 |0.99 |
|27 |0.54 |0.81 |0.91 |0.99 |0.95 |0.98 |0.98 |
|28 |0.55 |0.81 |0.91 |1.00 |0.96 |0.96 |0.99 |
|29 |0.55 |0.82 |0.92 |1.00 |0.91 |0.97 |0.99 |
|30 |0.55 |0.81 |0.92 |1.00 |0.92 |0.98 |0.99 |
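This section does not state the model of the standard partial test; the text defines it through Fig. 1. As a hedged illustration only, suppose each partial test is a one-sided z-test whose statistic is N(delta, 1) under the alternative, with delta = 1. That model is our assumption. Under it, the power of a binomial test with parameters (k, alpha1) is a binomial tail in the partial power. With these assumptions the sketch gives about 0.26 for n = 1 with alpha1 = 0.05, and about 0.82 for n = 10, k = 5, alpha1 = 0.222. Both values appear close to the corresponding Table 2 entries. The function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def partial_power(a1: float, delta: float = 1.0) -> float:
    """Power of one one-sided z-test at level a1 when the statistic is
    N(delta, 1) under the alternative (ASSUMED model, delta = 1 is a guess)."""
    return 1 - Z.cdf(Z.inv_cdf(1 - a1) - delta)


def binomial_power(n: int, k: int, a1: float, delta: float = 1.0) -> float:
    """P(at least k of n independent partial tests reject at level a1)
    when all partial null hypotheses are false."""
    b = partial_power(a1, delta)
    return sum(comb(n, i) * b**i * (1 - b) ** (n - i) for i in range(k, n + 1))
```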



Table 3. The power of tests for [pic] under the alternative hypothesis that
all partial null hypotheses are false

|n |Bonferroni|Simes |Fisher |Binomial, [pic] |
|4 |0.36 |0.39 |0.60 |0.49 |
|10|0.45 |0.48 |0.90 |0.82 |
|20|0.51 |0.57 |0.99 |0.97 |
|30|0.55 |0.61 |1.00 |1.00 |
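To make the comparison in Table 3 concrete, the four global tests can be written as decision rules on a vector of independent partial p-values. These are the standard textbook forms. The variants used in the paper may differ in detail, and the function names are ours. The Bonferroni rule uses the per-test level 1-(1-alpha)^(1/n), which matches the Bonferroni column of Table 1. Fisher's rule uses the closed-form chi-square tail for even degrees of freedom.

```python
from math import exp, factorial, log


def bonferroni(p, alpha=0.05):
    # per-test level 1 - (1 - alpha)^(1/n), as in the Bonferroni column of Table 1
    a1 = 1 - (1 - alpha) ** (1 / len(p))
    return min(p) <= a1


def simes(p, alpha=0.05):
    # reject if the i-th smallest p-value is <= i * alpha / n for some i
    n = len(p)
    return any(pi <= (i + 1) * alpha / n for i, pi in enumerate(sorted(p)))


def fisher_pvalue(p):
    # X = -2 * sum(ln p_i) is chi-square with 2n df under H0; for even df
    # the upper tail is exp(-X/2) * sum_{j<n} (X/2)^j / j!  (all p_i > 0)
    half = -sum(log(pi) for pi in p)  # X / 2
    return exp(-half) * sum(half**j / factorial(j) for j in range(len(p)))


def fisher(p, alpha=0.05):
    return fisher_pvalue(p) <= alpha


def binomial(p, k, a1):
    # reject if at least k partial tests are significant at level a1
    return sum(pi <= a1 for pi in p) >= k
```

Take the hypothetical p-values [0.08, 0.09, 0.4, 0.6] with k = 2 and alpha1 = 0.097, the value Table 1 lists for n = 4. For Bin(4, 0.097) we compute P(X >= 2) of about 0.049. On these p-values the Bonferroni, Simes and Fisher rules (combined p about 0.12) do not reject, while the binomial rule does. This is an illustration of the mechanism, not a result from the paper.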



Table 4. The power of tests for [pic] under two alternative hypotheses in
which not all partial null hypotheses are false

|n |Half of the partial null hypotheses are false |Only one partial null hypothesis is false |
|  |Bonferroni |Simes |Fisher |Binomial, [pic] |Bonferroni |Simes |Fisher |Binomial, [pic] |
|4 |0.23 |0.23 |0.29 |0.23 |0.14 |0.14 |0.15 |0.12 |
|10|0.28 |0.30 |0.51 |0.37 |0.10 |0.10 |0.11 |0.08 |
|20|0.33 |0.34 |0.75 |0.57 |0.08 |0.09 |0.08 |0.07 |
|30|0.34 |0.37 |0.88 |0.67 |0.07 |0.08 |0.08 |0.06 |
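When only m of the n partial null hypotheses are false, as in Table 4, the number of significant partial tests is the sum of two independent binomial counts. One count covers rejections among the m false nulls and the other covers the n - m true nulls, so the power of the binomial test is a convolution. The minimal sketch below assumes that each partial test is a one-sided z-test whose statistic is shifted by delta = 1 under the alternative. That model is our assumption, not stated in this section. The function names are hypothetical, and the sketch is not claimed to reproduce the Table 4 values exactly.

```python
from math import comb
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def binom_pmf(n, p):
    """Probabilities P(X = i), i = 0..n, for X ~ Binomial(n, p)."""
    return [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]


def binomial_power_mixed(n, m, k, a1, delta=1.0):
    """P(at least k of n partial tests reject at level a1) when m partial
    nulls are false (ASSUMED one-sided z-tests with shift delta) and the
    remaining n - m partial nulls are true."""
    b = 1 - Z.cdf(Z.inv_cdf(1 - a1) - delta)  # assumed partial power
    false_part = binom_pmf(m, b)       # rejections among the false nulls
    true_part = binom_pmf(n - m, a1)   # rejections among the true nulls
    total = 0.0
    for i, pf in enumerate(false_part):  # convolution of the two counts
        for j, pt in enumerate(true_part):
            if i + j >= k:
                total += pf * pt
    return total
```

With m = 0 the function returns the overall significance of the test, and with m = n it returns the power under the alternative that all partial null hypotheses are false.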



Table 5. The significance and power for [pic] under the alternative
hypothesis that all partial null hypotheses are false, for partial tests
correlated at the level of about 0.5

|n |Bonferroni             |Simes                  |Fisher       |Binomial, [pic]        |
|  |[pic] |[pic] |[pic] |[pic] |[pic] |[pic] |[pic] |[pic] |[pic] |[pic] |[pic] |
|4 |0.0148 |0.05 |0.30 |0.0136 |0.05 |0.31 |0.05 |0.34 |0.058 |0.05 |0.33 |
|10|0.0068 |0.05 |0.32 |0.0058 |0.05 |0.33 |0.05 |0.39 |0.091 |0.05 |0.37 |
|20|0.0032 |0.05 |0.30 |0.0028 |0.05 |0.32 |0.05 |0.39 |0.164 |0.05 |0.54 |
|30|0.0023 |0.05 |0.31 |0.0020 |0.05 |0.33 |0.05 |0.39 |0.158 |0.05 |0.52 |


On the power of some binomial modifications of the Bonferroni multiple
test

A. T. Teriokhin, T. de Meeûs, J.-F. Guégan

Centre for Genetics and Evolution of Infectious Diseases, IRD, 911 Avenue Agropolis

Montpellier 34394, France

Lomonosov Moscow State University

Faculty of Biology, Leninskie Gory 1, Moscow 119992

E-mail: terekhin_a@mail.ru

The Bonferroni multiple test is widely used in testing statistical
hypotheses, but it has rather low power. This makes the risk high of
falsely accepting the overall null hypothesis, i.e., of failing to detect a
really existing effect. We show that if the partial test statistics are
independent, this risk can be reduced by using some binomial modifications
of the Bonferroni test. The Bonferroni test rejects the null hypothesis when
at least one of n partial null hypotheses is rejected at a rather stringent
significance level (for example, 0.005 in the case of n=10). Instead, we
propose binomial tests in which the null hypothesis is rejected when at
least k partial null hypotheses (for example, k=n/2) are rejected at a much
less stringent significance level (up to 30-50%). We show that the power of
such binomial tests is substantially higher than that of the Bonferroni
test and some of its variants. In addition, this approach makes it possible
to combine tests whose results are known only for a fixed significance
level. The paper provides tables and a computer program that allow one to
find (from the tables or with the program) the necessary parameters of the
binomial test, i.e., either the partial significance level (when k is
fixed) or the value of k (when the partial significance level is fixed).

-----------------------
[1] A p-value is the probability that a test statistic will be equal to or
greater than the currently observed value of the statistic under the
assumption that the null hypothesis, i.e., the hypothesis being tested, is
true. The smaller the p-value, the greater the confidence with which the
test rejects the null hypothesis.
[2] The Bonferroni correction was proposed by Carlo Bonferroni
(Bonferroni, 1935) for the case when several dependent or independent
statistical tests are performed simultaneously: a significance level that
holds for each individual comparison does not necessarily hold for the set
of all comparisons.
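The point of footnote [2] can be checked with a line of arithmetic. For n independent tests each at level a1, the probability of at least one false rejection when all nulls are true is 1-(1-a1)^n. Ten tests at 0.05 therefore give about 0.40, while the Bonferroni level 0.05/10 keeps it near 0.049. For dependent tests only the bound n*a1 is guaranteed. The function names below are ours.

```python
def familywise_error(n: int, a1: float) -> float:
    """Probability of at least one false rejection among n independent
    tests, each at level a1, when all null hypotheses are true."""
    return 1 - (1 - a1) ** n


def sidak_level(n: int, alpha: float) -> float:
    """Per-test level giving a family-wise error of exactly alpha
    for n independent tests."""
    return 1 - (1 - alpha) ** (1 / n)
```

Here sidak_level(10, 0.05) is about 0.0051, slightly above the Bonferroni level 0.05/10 = 0.005.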