Unfortunately, the history of science has its dark places where people were exposed to treatments against their will, subjected to abuse and otherwise had their rights infringed upon. Today, the most egregious of science's sins have been addressed through such documents as the Declaration of Helsinki and through the establishment of Institutional Review Boards (IRBs) whose purpose is to oversee research involving human subjects and ensure their integrity [1].

The important topics regarding a subject's consent, their rights regarding their participation in a clinical trial and the protection of certain populations, such as children and prisoners, are better covered elsewhere [2]. Instead I want to focus on how we can use statistics to create a more ethical study.

An ethical study design does two things: it minimizes the risks that the subjects, as a group, are exposed to while maximizing our ability to make inferences about the topic at hand. In other words, we should use only as many subjects as the planned statistical analyses require and no more. If too few are included, those who are included will have been exposed to the risks of the trial without a good chance of furthering knowledge about the topic being studied. On the other hand, if too many are included, the extra subjects will be exposed to the risks of the trial without meaningfully improving the researcher's ability to make inferences about the topic being studied.

For this reason and others [3], the planning stage is the most important part of a clinical trial.

## Sample Size Determination

So, how many subjects do you need? The answer depends on a variety of factors. Let's assume that the trial will be analyzed using Frequentist methods (you could also use Bayesian methods).

Assume that we are interested in testing a null hypothesis, $$H_0$$, against some alternative hypothesis, $$H_1$$. We first need to set our $$\alpha$$, defined as the probability of rejecting $$H_0$$ when $$H_0$$ is true (a false positive), and power, defined as the probability of rejecting $$H_0$$ when $$H_1$$ is true (a true positive). The values used for $$\alpha$$ and power are determined by what is acceptable for the particular trial. For example, if a false positive would be very costly, then $$\alpha$$ is set lower. Similarly, if the researcher wants to be confident of detecting an effect that truly exists (that is, of avoiding a false negative), they can set power higher. Commonly used values for $$\alpha$$ are 0.01 and 0.05 and for power are 0.8 and 0.9.

Suppose that our null hypothesis is of the form $$H_0: \theta = \theta_0$$ and our alternative hypothesis is $$H_1: \theta \ne \theta_0$$ where $$\theta$$ is some parameter of interest and $$\theta_0$$ is the value it has under the null hypothesis. For example, if we are testing a hypothesis about the effect of a drug, $$\theta$$ would be the difference in the effect of the new drug compared to the current standard of care and $$\theta_0$$ would be 0 (i.e. there is no effect).

In order to determine the sample size we need for the trial, we need an estimate of $$\theta$$. Oftentimes the value of $$\theta$$ is taken from previous studies or pilot studies. This can be problematic because the estimate carries the additional variance of the study it is based on. A better approach is to use a value of $$\theta$$ that is clinically meaningful. For example, if we are designing a study to test a new cholesterol drug, we would set $$\theta$$ to a clinically meaningful reduction in cholesterol.
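To make the roles of $$\alpha$$, power and $$\theta$$ concrete, here is a minimal sketch of a sample size calculation. It assumes a specific setting that the discussion above does not commit to: two equally sized arms, a normally distributed outcome with a known common standard deviation, and a two-sided z-test of $$\theta_0 = 0$$. The function name `n_per_arm` and the numbers in the example (a 10 mg/dL reduction, a standard deviation of 30 mg/dL) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Approximate subjects per arm for a two-sided, two-sample z-test.

    delta: clinically meaningful difference to detect (theta - theta_0)
    sigma: assumed common standard deviation of the outcome
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile matching the target power
    return math.ceil(2 * (sigma * (z_alpha + z_power) / delta) ** 2)


# Hypothetical cholesterol example: detect a 10 mg/dL reduction, SD of 30 mg/dL.
print(n_per_arm(delta=10, sigma=30))             # alpha = 0.05, power = 0.8
print(n_per_arm(delta=10, sigma=30, power=0.9))  # higher power needs more subjects
print(n_per_arm(delta=10, sigma=30, alpha=0.01)) # stricter alpha needs more subjects
```

Note how the required sample size grows as $$\alpha$$ is lowered, as power is raised, or as the difference to detect shrinks. For other endpoints and tests the formula changes; see Chow, Shao and Wang in the references.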

Of course, for most large trials or trials that have a long duration, we cannot expect that all of the subjects will complete the trial. Therefore, we must account for non-completion. This can be done simply by inflating the sample size so that the expected number of subjects remaining, after those lost to follow-up are removed, equals the sample size needed for an appropriately powered study.
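The inflation step is simple arithmetic. As a sketch, assuming a single anticipated dropout proportion that applies to all subjects (the function name `inflate_for_dropout` and the 15% figure are hypothetical):

```python
import math


def inflate_for_dropout(n_required, dropout_rate):
    """Subjects to enroll so that about n_required are expected to complete."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))


# If 142 completers are needed and 15% of subjects are expected to be lost:
n_enroll = inflate_for_dropout(142, 0.15)
print(n_enroll, "enrolled ->", n_enroll * (1 - 0.15), "expected completers")
```

Dividing by the completion proportion, rather than multiplying the required size by one plus the dropout rate, is what makes the expected number of completers reach the target.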

## Early Stopping Rules

During a traditionally planned clinical trial, the trial runs until all subjects have received the treatment for the arm they are in. If the trial shows success (or failure) before it ends, continuing it wastes resources and exposes subjects to unnecessary risk [4].

If a researcher wants the option of stopping a clinical trial early, it should be planned for in the protocol. From a statistical point of view, this includes laying out the exact method used to determine the stopping criteria, as well as adjusting $$\alpha$$ and the power calculation to account for the fact that the test statistic is evaluated more than once.
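To see why $$\alpha$$ needs to be controlled, the sketch below simulates trials in which $$H_0$$ is true and the accumulating data are tested at several equally spaced looks. It is an illustration under simplifying assumptions (normally distributed data with known variance, a two-sided z-test at every look), and `type_i_error` is a hypothetical helper. The constant boundary of 2.413 is the Pocock critical value for five looks at an overall two-sided $$\alpha$$ of 0.05. It is one of several possible corrections and is used here only as an example.

```python
import math
import random


def type_i_error(looks, critical_value, n_sims=20000, seed=1):
    """Estimate the chance of rejecting a true H0 at any of `looks` analyses."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for k in range(1, looks + 1):
            # Each new block of data contributes a standardized sum that is
            # N(0, 1) under H0; the cumulative z statistic is total / sqrt(k).
            total += rng.gauss(0.0, 1.0)
            z = total / math.sqrt(k)
            if abs(z) > critical_value:
                rejections += 1  # trial stops at the first look that rejects
                break
    return rejections / n_sims


print("1 look,  z > 1.96 :", type_i_error(1, 1.96))
print("5 looks, z > 1.96 :", type_i_error(5, 1.96))
print("5 looks, z > 2.413:", type_i_error(5, 2.413))
```

With a single look the false positive rate sits near the nominal 0.05. Testing five times at the unadjusted 1.96 threshold pushes it well above that, while the stricter boundary brings the overall rate back to about 0.05.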

Methods to create early stopping rules require the trial to have fixed time points at which the results are checked against a stopping rule. A superset of these types of designs, called adaptive designs, can create other types of rules to change the design of the study based on the data collected at a given time. Adaptive designs are a broad class of designs that can use either Frequentist or Bayesian methods.

To take a simple example, consider the following method, called play the winner, for assigning subjects to the arms of a clinical trial comparing two treatments, $$A$$ and $$B$$: the first subject is randomly assigned to receive one treatment. If that subject has a positive outcome, then the next subject is assigned to the same treatment. If that subject does not have a positive outcome, then the next subject is assigned to the other treatment. The same rule is then applied to each subsequent subject. This has the effect of pushing more subjects to the more successful treatment.
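The play the winner rule described above can be simulated in a few lines. This is a minimal sketch: the success probabilities (0.7 for $$A$$, 0.3 for $$B$$) are made up, each outcome is assumed to be known before the next subject is assigned, and the function name `play_the_winner` is hypothetical.

```python
import random


def play_the_winner(p_success, n_subjects, seed=0):
    """Simulate play-the-winner allocation between two arms.

    p_success: dict mapping each of two arm names to its true success probability
    Returns a dict with the number of subjects assigned to each arm.
    """
    rng = random.Random(seed)
    arms = list(p_success)
    if len(arms) != 2:
        raise ValueError("exactly two arms are required")
    arm = rng.choice(arms)  # the first subject is assigned at random
    counts = {a: 0 for a in arms}
    for _ in range(n_subjects):
        counts[arm] += 1
        success = rng.random() < p_success[arm]
        if not success:
            # A failure sends the next subject to the other arm.
            arm = arms[1] if arm == arms[0] else arms[0]
    return counts


print(play_the_winner({"A": 0.7, "B": 0.3}, n_subjects=1000))
```

Because a run on an arm ends only when a failure occurs, the arm with the higher success probability keeps subjects for longer stretches and ends up treating the larger share of them.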

In 2010 the United States Food and Drug Administration (FDA) published a draft guidance that contains a detailed overview of the statistical and regulatory implications of the use of this class of clinical trial designs. In particular, the guidance goes into some detail on the drawbacks of adaptive designs, such as the potential to increase false positive conclusions and to introduce operational bias. However, thoughtful and careful use of adaptive clinical trial designs can be beneficial both to science and to the subjects.

From an ethical point of view, the use of adaptive clinical trial designs can reduce the exposure of subjects to the risk of treatments that may not benefit them and to delays in receiving therapies that could save their lives.

Perhaps the best examples of adaptive clinical trial designs are the I-SPY trials. The I-SPY 2 trial studies the use of neoadjuvant therapy for high-risk breast cancer. Breast cancer can best be thought of as a set of diseases [5]. What works for one type of the disease may not work for others. In the I-SPY 2 trial, the investigators are trying to determine which types of subjects respond best to the novel treatment. The subjects of the trial were randomized to a therapy based on a biomarker profile taken at enrollment (see Park et al. for details of the randomization part of the design). Within each profile type, the drug was evaluated, and if the Bayesian predictive probability of success in a phase 3 trial was greater than 0.85 (i.e. success) or less than 0.1 (i.e. futility), subjects were no longer enrolled in that particular arm.
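To give a feel for what a Bayesian predictive probability is, here is a deliberately simplified sketch. It is not the model used in I-SPY 2 (see Park et al. for that). It assumes a single arm with a binary response, a Beta(1, 1) prior, and a made-up definition of "success" in a future trial: at least `s` responders among `m` future patients. Only the 0.85 and 0.1 thresholds come from the description above; every function name and number in the example is hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive_prob(x, n, m, s, a=1.0, b=1.0):
    """P(at least s responders among m future patients | x of n responded).

    Uses the beta-binomial predictive distribution under a Beta(a, b) prior.
    """
    post_a, post_b = a + x, b + (n - x)  # posterior is Beta(post_a, post_b)
    total = 0.0
    for y in range(s, m + 1):
        log_choose = lgamma(m + 1) - lgamma(y + 1) - lgamma(m - y + 1)
        total += exp(log_choose
                     + log_beta(post_a + y, post_b + m - y)
                     - log_beta(post_a, post_b))
    return total


def decision(prob, success=0.85, futility=0.10):
    """Apply the stopping thresholds to a predictive probability."""
    if prob > success:
        return "stop enrollment: success"
    if prob < futility:
        return "stop enrollment: futility"
    return "continue enrollment"


# Hypothetical interim data: responders out of 20 subjects in one profile type,
# with "success" defined as at least 40 responders among 100 future patients.
for responders in (3, 8, 14):
    p = predictive_prob(responders, n=20, m=100, s=40)
    print(responders, "of 20 responded ->", decision(p))
```

The predictive probability averages over the remaining uncertainty in the response rate, which is why a small interim sample rarely pushes it past either threshold unless the observed results are quite extreme.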

As arms are removed for futility, the I-SPY 2 trial stops exposing high-risk breast cancer patients, who have only a limited amount of time to receive an intervention before it is too late for them, to a treatment that does not work for them. This is important; if we know that a treatment does not work, or if the evidence against a treatment is strong enough, it is not ethical to waste the opportunity that a patient has to get an effective treatment. At the same time, the arms that are effective can be advanced, allowing patients who have the same biomarker profile to receive an efficacious treatment sooner, thus potentially saving their lives.

## Conclusion

It bears repeating that exposing human and animal subjects to novel treatments in scientific trials can be very beneficial to humanity, but that benefit must not come at the expense of the rights of the humans and animals that participate in the trials. The use of appropriate trial design is required to ensure the protection of the rights of trial participants. This can be achieved by properly sizing trials and by using designs, such as those that include early stopping rules or adaptive designs, that reduce the number of subjects exposed to risk as the trial's outcome becomes clear.

## References

• Chow, S., Shao, J., and Wang, H. (2003), Sample size calculations in clinical research, New York: Marcel Dekker.
• Center for Drug Evaluation and Research (CDER); Center for Biologics Evaluation and Research (CBER) (2010). "Adaptive Design Clinical Trials for Drugs and Biologics". Food and Drug Administration.
• Park, J. W., Liu, M. C., Yee, D., Yau, C., van ’t Veer, L. J., Symmans, W. F., … on behalf of the I-SPY2 Investigators, D. A. (2016). Adaptive Randomization of Neratinib in Early Breast Cancer. The New England Journal of Medicine, 375(1), 11–22.
 [1] Similar bodies exist to ensure the rights of animal subjects as well.
 [2] For example, in the United States, we receive training that covers this in depth before being allowed to participate in human (and animal) clinical trials.
 [3] Such as how to obtain consent, how to handle adverse events, etc.
 [4] I make this claim several times, but the magnitude of risk should also be weighed in these discussions. For example, there is a lot more risk to a patient in a trial of a leukemia drug than in a study of a drug for dermatitis. Therefore, ending the trial early may be worth it in the first case but may not make much of a difference in the second.
 [5] Well, that is what works best for me. The reality is complicated and we are continuing to learn more and more about it as we continue to study it.