
Selecting the best group using the Indifferent-Zone approach for normal outcomes
Introduction
The indifferent-zone approach for normal outcomes is a statistical method designed to select the group with the highest mean while guaranteeing that the correct group is selected with at least a specified probability. The guarantee holds whenever the difference in means between the best group and the next-best group is at least a specified threshold, called the “indifferent zone”. This zone defines a margin of indifference: differences smaller than the threshold are considered negligible, so the design focuses only on differences that clearly exceed it.
The procedures presented are based on single-stage selection for an outcome with a known standard deviation. If the standard deviation is not known, a multistage approach is recommended; alternatively, assume that the true standard deviation is no larger than the specified standard deviation. The power will be higher if the true standard deviation is lower.
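For reference, the single-stage calculation can be written down explicitly. This sketch assumes the functions implement the classical Bechhofer single-stage procedure (see Bechhofer et al. 1995), evaluated at the least favourable configuration in which the best mean exceeds every other mean by exactly δ. That assumption is consistent with the numbers reported in the examples but has not been checked against the package source. With k = ngroups, n = npergroup, σ = sd, δ = dif, and Φ and φ the standard normal distribution and density functions:

$$
\mathrm{PCS}(n) = \int_{-\infty}^{\infty} \Phi\left(z + \frac{\delta\sqrt{n}}{\sigma}\right)^{k-1} \phi(z)\,\mathrm{d}z,
\qquad
n = \left(\frac{\tau\,\sigma}{\delta}\right)^{2}
\quad\text{where}\quad
\int_{-\infty}^{\infty} \Phi(z + \tau)^{k-1} \phi(z)\,\mathrm{d}z = P^{*}.
$$

Because Φ is increasing, PCS(n) decreases as σ grows, which is why planning with an upper bound for the standard deviation gives conservative power. The second expression gives the (generally non-integer) sample size per group that reaches the target probability P*; in practice it is rounded up.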
This package offers several functions to help with this design:
- power_best_normal() calculates, for an outcome with known standard deviation (sd), the probability of correctly selecting the best group in a single stage, given the pre-specified indifferent-zone threshold (dif), the number of groups (ngroups), and the sample size per group (npergroup).
- ss_best_normal() estimates the sample size per group required to achieve a specified power for correctly selecting the best group in a single stage, given the known standard deviation (sd), the indifferent-zone threshold (dif), and the number of groups (ngroups). This function is based on the procedure from Bechhofer et al. (1995).
- sim_power_best_normal() estimates the empirical power (i.e., the proportion of simulated trials in which the best group is correctly identified) via Monte Carlo simulation. It supports multiple outcomes and can estimate the empirical power to select the group that is truly best on all outcomes.
- sim_power_best_norm_rank() is similar to sim_power_best_normal(), but it defines the best group based on overall ranking across multiple outcomes rather than requiring top performance on every outcome.
Examples with a single outcome
- What is the probability of correctly selecting the best group in a single stage in a trial with three groups of 30 participants each? Assume the outcome has a standard deviation of 0.5 and the indifferent-zone threshold is 0.10.
power_best_normal(sd = 0.5, dif = 0.10, ngroups = 3, npergroup = 30)
#> [1] 0.660936
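As a cross-check, the value above can be reproduced with numerical integration in base R. This is a minimal sketch, not the package's implementation: it assumes the classical formula in which one group is better than all others by exactly dif, and pcs_normal() is a hypothetical helper name.

```r
# Hypothetical helper: probability of correct selection (PCS) for a
# single-stage design with known sd, assuming the best group's mean exceeds
# every other group's mean by exactly `dif` (least favourable configuration).
pcs_normal <- function(sd, dif, ngroups, npergroup) {
  d <- dif * sqrt(npergroup) / sd  # standardised separation of the means
  integrand <- function(z) pnorm(z + d)^(ngroups - 1) * dnorm(z)
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

pcs_normal(sd = 0.5, dif = 0.10, ngroups = 3, npergroup = 30)  # about 0.661
pcs_normal(sd = 0.4, dif = 0.10, ngroups = 3, npergroup = 30)  # higher: smaller sd
```

The second call illustrates the point made in the introduction: a smaller true standard deviation gives a higher probability of correct selection.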
- What sample size per group is required to achieve 80% power for correctly selecting the best group among three groups in a single stage, assuming the outcome has a known standard deviation of 0.5 and the indifferent-zone threshold is 0.10?
ss_best_normal(power = 0.8, sd = 0.5, dif = 0.1, ngroups = 3)
#> [1] 68.27888
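The sample size can be sketched in the same way by solving PCS(n) = power for n with uniroot(). This again assumes the classical least-favourable-configuration formula; pcs_normal() and ss_normal() are hypothetical helpers, not package functions. Because a trial needs whole participants, the result is rounded up.

```r
# Hypothetical helper: PCS under the assumed classical single-stage formula.
pcs_normal <- function(sd, dif, ngroups, npergroup) {
  d <- dif * sqrt(npergroup) / sd
  integrand <- function(z) pnorm(z + d)^(ngroups - 1) * dnorm(z)
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# Hypothetical helper: solve PCS(n) = power for a (non-integer) n per group.
# PCS increases with n, so a single root exists in [1, 1e5].
ss_normal <- function(power, sd, dif, ngroups) {
  f <- function(n) pcs_normal(sd, dif, ngroups, n) - power
  uniroot(f, lower = 1, upper = 1e5, tol = 1e-8)$root
}

n <- ss_normal(power = 0.8, sd = 0.5, dif = 0.10, ngroups = 3)
n           # about 68.28
ceiling(n)  # 69 participants per group
```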
- Using simulation, what is the probability of correctly selecting the best group in a trial with three groups of 30 participants each? Assume the standard deviation is 0.5 and the indifferent-zone threshold is 0.10.
set.seed(1234)
sim_power_best_normal(
noutcomes = 1,
sd = 0.5,
dif = 0.1,
ngroups = 3,
npergroup = 30,
nsim = 1000
)
#> Empirical Power Result
#> -----------------------
#> Power: 0.6640
#> 95% CI: [0.6338, 0.6933]
#> Simulations: 1000
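The simulation idea can be sketched in a few lines of base R. This is an illustrative reimplementation under stated assumptions, not the package's code: group 1 is made truly best by exactly dif, a trial counts as a correct selection when group 1 has the largest sample mean, and the 95% interval is the exact binomial (Clopper-Pearson) interval from binom.test(), which may not be the method the package uses. sim_pcs_normal() is a hypothetical name. Because the random-number streams differ, the estimate will only approximate the 0.6640 shown above.

```r
# Hypothetical Monte Carlo sketch for a single outcome (not the package code).
sim_pcs_normal <- function(sd, dif, ngroups, npergroup, nsim) {
  means <- c(dif, rep(0, ngroups - 1))  # group 1 is truly best by exactly dif
  correct <- replicate(nsim, {
    # sample mean of each group in one simulated trial
    ybar <- vapply(means, function(m) mean(rnorm(npergroup, m, sd)), numeric(1))
    which.max(ybar) == 1  # correct selection if group 1 has the largest mean
  })
  hits <- sum(correct)
  list(
    power = hits / nsim,
    ci = as.numeric(binom.test(hits, nsim)$conf.int)  # exact 95% CI
  )
}

set.seed(1234)
res <- sim_pcs_normal(sd = 0.5, dif = 0.10, ngroups = 3, npergroup = 30, nsim = 1000)
res$power  # close to, but not necessarily equal to, 0.6640
res$ci
```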
Examples using multiple outcomes
The functions sim_power_best_normal() and sim_power_best_norm_rank() can simulate multiple outcomes. They differ in how they define the ‘best’ group. sim_power_best_normal() requires the best group to be the top performer on every outcome, whereas sim_power_best_norm_rank() defines the best group by its overall ranking across outcomes. For example, a group might rank first on the first two outcomes but second on the third, yet still achieve the best overall rank among all groups. Both procedures assume that the outcomes are mutually independent.
This ranking approach supports weighting of outcomes, allowing you to assign greater importance to some outcomes than to others. For instance, if performance on each of the first two outcomes is twice as important as performance on the third, you could specify weights of c(0.4, 0.4, 0.2). Weights are scaled internally to sum to 1.
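Since weights are rescaled to sum to 1, only their ratios matter. A quick base R illustration of the rescaling (plain arithmetic, not a call into the package):

```r
w <- c(2, 2, 1)  # outcomes 1 and 2 twice as important as outcome 3
w / sum(w)       # rescaled to sum to 1
#> [1] 0.4 0.4 0.2
```

So c(2, 2, 1) and c(0.4, 0.4, 0.2) express the same weighting.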
The functions are flexible and allow you to specify, for each outcome, the standard deviations, indifferent-zone thresholds, and group sample sizes.
- What is the probability of correctly identifying the best group, defined as the group with the highest mean antibody titre for each of five antigens, in a trial with three groups of 30 participants each? The standard deviation is not known, but assume it is not greater than 0.5; the indifferent-zone threshold is 0.10 for all outcomes.
set.seed(12345)
sim_power_best_normal(
noutcomes = 5,
sd = 0.5,
dif = 0.10,
ngroups = 3,
npergroup = 30,
nsim = 1000
)
#> Empirical Power Result
#> -----------------------
#> Power: 0.1150
#> 95% CI: [0.0959, 0.1364]
#> Simulations: 1000
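The low power is expected. Under the stated independence assumption, the probability that the best group comes out on top for all five outcomes is exactly the single-outcome probability raised to the fifth power. A short sketch, assuming the classical single-outcome formula (pcs_one is a hypothetical name):

```r
# Single-outcome PCS under the assumed classical formula
# (sd = 0.5, dif = 0.10, npergroup = 30, three groups -> exponent 2)
d <- 0.10 * sqrt(30) / 0.5
pcs_one <- integrate(function(z) pnorm(z + d)^2 * dnorm(z), -Inf, Inf)$value

# Independent outcomes: the best group must win on all five
pcs_one^5  # about 0.126, inside the simulated 95% CI [0.0959, 0.1364]
```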
- Same setup, but define the best group based on overall ranking across the five outcomes with equal weights.
set.seed(12345)
sim_power_best_norm_rank(
noutcomes = 5,
sd = 0.5,
dif = 0.10,
weights = 1,
ngroups = 3,
npergroup = 30,
nsim = 1000
)
#> Empirical Power Result
#> -----------------------
#> Power: 0.8700
#> 95% CI: [0.8476, 0.8902]
#> Simulations: 1000
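To see why the rank-based criterion is so much more forgiving, both criteria can be evaluated on the same simulated data. This is a hypothetical sketch: the exact ranking rule of sim_power_best_norm_rank() is not documented here, so the code assumes that groups are ranked within each outcome (rank 1 = highest mean), ranks are averaged with the rescaled weights, a scalar weight is recycled to equal weights, and a tie for the best score counts as a failure. sim_best_two_ways() is a hypothetical name. Whenever group 1 is top on every outcome it also has the strictly best weighted rank, so on the same data the rank-based power can never be lower than the all-outcomes power.

```r
# Hypothetical sketch comparing the two definitions of "best" on the same data.
sim_best_two_ways <- function(noutcomes, sd, dif, ngroups, npergroup, nsim,
                              weights = 1) {
  w <- rep_len(weights, noutcomes)      # assumption: scalar weight = equal weights
  w <- w / sum(w)                       # rescale to sum to 1
  means <- c(dif, rep(0, ngroups - 1))  # group 1 truly best on every outcome
  res <- replicate(nsim, {
    # ngroups x noutcomes matrix of sample means, drawn directly from their
    # sampling distribution N(mean, sd^2 / npergroup); outcomes independent
    ybar <- matrix(
      rnorm(ngroups * noutcomes, mean = means, sd = sd / sqrt(npergroup)),
      nrow = ngroups
    )
    ranks <- apply(-ybar, 2, rank)  # rank 1 = highest mean within an outcome
    score <- drop(ranks %*% w)      # weighted mean rank of each group
    c(all = all(ranks[1, ] == 1),          # top on every outcome
      rank = score[1] < min(score[-1]))    # strictly best weighted rank
  })
  rowMeans(res)  # proportion of successes under each criterion
}

set.seed(12345)
p <- sim_best_two_ways(noutcomes = 5, sd = 0.5, dif = 0.10, ngroups = 3,
                       npergroup = 30, nsim = 1000)
p  # "all" should be low (about 0.13 in expectation); "rank" much higher
```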