Univ Paris Diderot, Sorbonne Paris Cité, Unité de Biostatistique et Epidémiologie Clinique, UMR-S717, Paris, F-75010, France

Département de Biostatistique et Informatique Médicale, Hôpital Saint-Louis, AP-HP, Paris, F-75010, France

INSERM, U717, Paris, F-75010, France

Abstract

Background

Simon’s two-stage designs are widely used for cancer phase II trials. These methods rely on statistical testing and thus allow controlling the type I and II error rates, while accounting for the interim analysis. Estimation after such trials is however not straightforward, and several different approaches have been proposed.

Methods

Different approaches for point and confidence intervals estimation, as well as computation of

Results

For point estimation, the uniformly minimum variance unbiased estimator (UMVUE) and the bias corrected estimator had better performance than the others when the actual sample size was as planned. For confidence intervals, using a mid-

Conclusions

The use of the UMVUE may be recommended as it exhibited good properties both when the actual number of patients recruited was equal to or differed from the preplanned value. Restricting the analysis to cases where the trial did not stop early for futility may be valuable, and the UMVCUE may be recommended in that case.

Background

Phase II trials primarily aim at evaluating the activity of a new therapeutic regimen to decide if it warrants further evaluation in a larger-scale phase III trial, where it is usually compared to a standard treatment. The screening purpose of phase II trials implies that they are designed to reject a new therapeutic regimen showing low therapeutic activity. In cancer phase II trials, therapeutic activity is typically defined in terms of tumor shrinkage

Cancer phase II trials are often designed as multistage trials (two stages being most common) allowing early trial termination in case of a low response rate, in order to avoid giving patients an ineffective treatment and wasting resources. The original idea of such a strategy with early termination was suggested by Gehan

As phase II trials primarily lead to the decision whether or not to proceed to the next step in the evaluation of the therapeutic regimen, their design essentially relies on statistical testing. Cancer phase II trials are therefore designed to control the probabilities of continuing with an ineffective regimen or of abandoning an effective one (type I and II error rates, respectively). Further analysis, and in particular estimation, is nevertheless useful and usually conducted, especially if the new regimen is selected for further consideration.

One important point concerning inference in two-stage phase II trials has been somewhat overlooked in the literature. As estimation is most important when the therapeutic regimen has been considered as effective, inference may be more common when the phase II trial proceeded to the second stage as compared to cases where it was stopped for futility at the first stage. Inference may thus be conditional on proceeding to the second stage (as e.g. in

Another issue is the actual total sample size of the trial. Cancer phase II trials are generally of limited sample size, and methods are derived from the ’exact’ binomial distribution of data. However, the actual number of patients recruited in the trial may be different from the planned sample size

In this paper, we compare the performance of the different approaches proposed in the literature for inference in a two-stage Simon’s phase II trial. In the next section, we present the different point estimators, confidence intervals and

Methods

Simon’s design and notations

Let us denote by $\pi$ the true response rate. The hypotheses tested are $H_0: \pi \le \pi_0$ versus $H_1: \pi \ge \pi_1 = \pi_0 + \delta$, where $\pi_0$ is the highest probability of response which would indicate that the agent is of no further interest, and $\pi_1$ the smallest probability of response indicating that the agent may be promising. In Simon’s design, $n_1$ subjects are accrued during the first stage. If the number of responses observed in the first stage $X_1$ is lower than or equal to a critical value $r_1$, the trial is stopped for futility. If $X_1 > r_1$, the trial proceeds to a second stage where $n_2$ additional patients are accrued. Let us denote by $X_2$ the number of responses observed in the $n_2$ second-stage patients, $X_t = X_1 + X_2$ and $r_t$ the final critical value. Then if $X_t \le r_t$ futility is concluded at the end of the trial, whereas efficacy is concluded if $X_t > r_t$. Given $(\pi_0, \pi_1)$ many such two-stage designs may satisfy the prespecified type I and II error rates $(\alpha, \beta)$. Among them, the design minimizing the expected sample size under $H_0$ is referred to as the ’optimal’ design, and the design minimizing the maximum sample size $n_t = n_1 + n_2$ is referred to as the ’minimax’ design. Jung

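We suppose here that the sample size of the trial corresponds to the planned $n_1$ and $n_2$, and that the stopping rules have been respected at the end of the first stage. Then, as $X_1$ and $X_2$ are both sums of independent Bernoulli trials, they follow Binomial distributions of parameters $(n_1, \pi)$ and $(n_2, \pi)$, respectively. Let $M$ denote the stage at which the trial stops, $S$ the total number of responses ($S = X_1$ if $M = 1$, $S = X_t$ if $M = 2$) and $N$ the total sample size ($N = n_1$ if $M = 1$, $N = n_t$ if $M = 2$). Writing $b(x; n, \pi)$ for the binomial probability mass function, the probability mass function of $(M, S)$ is

$$
f(m, s; \pi) =
\begin{cases}
b(s; n_1, \pi) & \text{if } m = 1, \\
\sum_{x_1 = \max(r_1 + 1,\, s - n_2)}^{\min(n_1,\, s)} b(x_1; n_1, \pi)\, b(s - x_1; n_2, \pi) & \text{if } m = 2,
\end{cases}
$$

for $s = 0, \dots, r_1$ if $m = 1$ and $s = r_1 + 1, \dots, n_t$ if $m = 2$.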
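As an illustration of the design described in this subsection, the following minimal Python sketch enumerates every outcome of a two-stage design and derives the probability of early termination, the expected sample size and the probability of concluding efficacy. The function names (`outcome_pmf`, `operating_characteristics`) are hypothetical and are not part of the original analysis, which was carried out in R. The design used in the example call (24 then 39 patients, critical values 8 and 24, response rates 0.30 and 0.50) is the optimal design quoted later in the text.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def outcome_pmf(n1, r1, n2, p):
    """Return {(m, s): probability} over all outcomes of the design.

    m is the stopping stage (1 or 2), s the total number of responses.
    """
    pmf = {}
    for x1 in range(n1 + 1):
        p1 = binom_pmf(x1, n1, p)
        if x1 <= r1:
            # Stopped for futility at the end of the first stage.
            pmf[(1, x1)] = p1
        else:
            # Trial continues: add the second-stage responses.
            for x2 in range(n2 + 1):
                key = (2, x1 + x2)
                pmf[key] = pmf.get(key, 0.0) + p1 * binom_pmf(x2, n2, p)
    return pmf


def operating_characteristics(n1, r1, n2, rt, p):
    """Early-termination probability, rejection probability, expected N."""
    pmf = outcome_pmf(n1, r1, n2, p)
    pet = sum(q for (m, _), q in pmf.items() if m == 1)
    reject = sum(q for (m, s), q in pmf.items() if m == 2 and s > rt)
    return {"pet": pet, "reject": reject, "expected_n": n1 + (1 - pet) * n2}


# Illustrative call: behaviour under the null and alternative response rates.
under_null = operating_characteristics(24, 8, 39, 24, 0.30)
under_alternative = operating_characteristics(24, 8, 39, 24, 0.50)
print(under_null)
print(under_alternative)
```

The `reject` entry is the type I error rate when `p` is the null response rate and the power when `p` is the alternative response rate. Exhaustive enumeration is affordable here because phase II sample sizes are small.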

Inference following a two-stage design

Point estimate

Although the primary goal of phase II trials is decision making rather than inference, obtaining an estimate of the true response rate is often of interest, particularly when the trial was deemed successful and the new drug accepted for further evaluation in phase III trials

The maximum likelihood estimator (MLE) is simply the sample proportion

Due to the sequential nature of the trial, the MLE is biased. In Simon’s design, when extremely small numbers of responses are observed at the first stage, the trial is terminated without a chance to correct the downward bias, leading to a negatively biased MLE. More precisely, the bias is given by
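The downward bias of the MLE can be checked numerically. The sketch below is an illustration derived from the definitions of the design rather than a formula quoted from the source: with $n_1$ and $n_2$ the stage sample sizes, $r_1$ the first-stage critical value and $b(x; n, \pi)$ the binomial probability mass function, writing the MLE as $X_1/n_1$ after early stopping and $(X_1 + X_2)/(n_1 + n_2)$ otherwise gives

$$
E(\hat\pi_{\mathrm{MLE}}) - \pi = -\frac{n_2}{n_1 (n_1 + n_2)} \sum_{x_1 = r_1 + 1}^{n_1} (x_1 - n_1 \pi)\, b(x_1; n_1, \pi).
$$

The code compares this expression with a brute-force expectation over all outcomes. Function names are hypothetical.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def mle_bias_enumeration(n1, r1, n2, p):
    """E[MLE] - p, summing over every (x1, x2) path of the design."""
    nt = n1 + n2
    expectation = 0.0
    for x1 in range(n1 + 1):
        p1 = binom_pmf(x1, n1, p)
        if x1 <= r1:
            # Early stopping: the MLE is the first-stage proportion.
            expectation += p1 * x1 / n1
        else:
            for x2 in range(n2 + 1):
                expectation += p1 * binom_pmf(x2, n2, p) * (x1 + x2) / nt
    return expectation - p


def mle_bias_closed_form(n1, r1, n2, p):
    """Same bias from the one-dimensional sum over continuing x1 values."""
    nt = n1 + n2
    tail = sum(
        (x1 - n1 * p) * binom_pmf(x1, n1, p) for x1 in range(r1 + 1, n1 + 1)
    )
    return -n2 / (n1 * nt) * tail


for rate in (0.2, 0.3, 0.4, 0.5):
    print(rate, mle_bias_enumeration(24, 8, 39, rate))
```

The sum in the closed form is the mean of a centred variable restricted to its upper values, which is positive, so the bias is negative for any response rate strictly between 0 and 1.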

Building on prior work of Whitehead

Guo and Liu

Noting that _{1}/_{1} is unbiased for _{1}/_{1} given (

A median unbiased estimator may be considered as the value of _{2} is different from its prespecified value, and will thus be denoted by _{2} was as planned.

Another approach was used by Tsai _{1} (which must be at least _{1} + 1). This conditional estimator will be denoted by _{1}≤_{1} may also be derived, but it makes little sense in cases where _{1} is small, in particular when _{1} is 0 or 1, which is the case for optimal and minimax designs for _{0} = 0.05 and _{1} = 0.2 or _{1} = 0.25 with

Relating to the work of Tsai

For inference conditional on proceeding to the second stage, the uniformly minimum variance conditionally unbiased estimator (UMVCUE) can also be obtained, as proposed by Pepe _{2}/_{2}is unaffected by the early stopping option and thus conditionally unbiased for _{2}/_{2} given (_{1}/_{1}, which is equal to the UMVUE in this case. For Simon’s design, the UMVCUE can thus be obtained by
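The two unbiased estimators discussed in this subsection can be sketched as conditional expectations given the total number of responses: of the first-stage proportion for the UMVUE, and of the second-stage proportion for the UMVCUE. The Python code below is a minimal illustration under that assumption, not the authors' implementation. The function names are hypothetical, and `n1`, `r1`, `n2` stand for the first-stage sample size, the first-stage critical value and the second-stage sample size.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def compatible_x1(s, n1, r1, n2):
    """First-stage counts that continue to stage 2 and can total s responses."""
    return range(max(r1 + 1, s - n2), min(n1, s) + 1)


def umvue(m, s, n1, r1, n2):
    """E[X1 / n1 | stopping stage m, total responses s]."""
    if m == 1:
        return s / n1
    xs = compatible_x1(s, n1, r1, n2)
    den = sum(comb(n1, x1) * comb(n2, s - x1) for x1 in xs)
    # (x1 / n1) * C(n1, x1) simplifies to C(n1 - 1, x1 - 1).
    num = sum(comb(n1 - 1, x1 - 1) * comb(n2, s - x1) for x1 in xs)
    return num / den


def umvcue(s, n1, r1, n2):
    """E[X2 / n2 | second stage reached, total responses s]."""
    xs = compatible_x1(s, n1, r1, n2)
    den = sum(comb(n1, x1) * comb(n2, s - x1) for x1 in xs)
    # (x2 / n2) * C(n2, x2) simplifies to C(n2 - 1, x2 - 1), with x2 = s - x1.
    num = sum(comb(n1, x1) * comb(n2 - 1, s - x1 - 1) for x1 in xs if s > x1)
    return num / den


def expectations(n1, r1, n2, p):
    """Return (E[UMVUE], E[UMVCUE | second stage reached]) by enumeration."""
    e_umvue, e_cond, p_continue = 0.0, 0.0, 0.0
    for x1 in range(n1 + 1):
        p1 = binom_pmf(x1, n1, p)
        if x1 <= r1:
            e_umvue += p1 * umvue(1, x1, n1, r1, n2)
            continue
        for x2 in range(n2 + 1):
            q = p1 * binom_pmf(x2, n2, p)
            e_umvue += q * umvue(2, x1 + x2, n1, r1, n2)
            e_cond += q * umvcue(x1 + x2, n1, r1, n2)
            p_continue += q
    return e_umvue, e_cond / p_continue
```

Calling `expectations(24, 8, 39, 0.30)` should return two values equal to 0.30 up to floating-point error, which is a direct check of unconditional and conditional unbiasedness respectively.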

Numerical studies in various settings showed that the bias-corrected estimators

Once (_{1}and _{t} are sufficient to conclude at the rejection of the null hypothesis or not. It remains however common practice to compute a _{0}). This yields the naive _{n}

The assumption on the distribution of _{1}<_{1} and _{2} = _{1}.

It is therefore necessary to use the proper distribution of observed data to compute a _{1}=24, _{2}=39, _{1}=8 and _{t}=24 (optimal design for _{0}=0.30, _{1}=0.50,

The

The bias-corrected estimators have the same ordering as the MLE

Jung

It can be rewritten as

which is equivalent to the _{2} is as planned

When estimation is performed conditional on proceeding to the second stage, a conditional _{Π}(

where _{Π}(_{c} is computed by

If the trial is stopped at the first stage, $p_c$ can simply be computed by $p_s$.
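To make the contrast between a naive and a design-aware p-value concrete, the sketch below computes both for a given outcome. It assumes the stage-wise ordering in which any trial reaching the second stage counts as more favourable than any trial stopped for futility, and then orders outcomes by the total number of responses. Function and argument names are hypothetical: `m` is the stopping stage, `s` the total number of responses and `p0` the null response rate.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); 1 if k <= 0 and 0 if k > n."""
    return sum(binom_pmf(x, n, p) for x in range(max(k, 0), n + 1))


def naive_p_value(m, s, n1, n2, p0):
    """Ignores the interim look: s is treated as a single binomial sample."""
    n = n1 if m == 1 else n1 + n2
    return binom_sf(s, n, p0)


def stagewise_p_value(m, s, n1, r1, n2, p0):
    """P(outcome at least as favourable as (m, s)) under the assumed ordering."""
    if m == 1:
        # Every continued trial, and every stopped trial with >= s responses.
        return binom_sf(s, n1, p0)
    # Trial must continue (x1 > r1) and reach at least s responses in total.
    return sum(
        binom_pmf(x1, n1, p0) * binom_sf(s - x1, n2, p0)
        for x1 in range(r1 + 1, n1 + 1)
    )


# Illustrative call: 25 responses out of 63 after proceeding to stage 2.
print(naive_p_value(2, 25, 24, 39, 0.30))
print(stagewise_p_value(2, 25, 24, 8, 39, 0.30))
```

For a trial that reached the second stage, the stage-wise value can never exceed the naive one, because it additionally requires the first-stage count to have exceeded the critical value. This is one way to see why the naive test tends to be conservative.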

Confidence interval

Besides point estimates, confidence intervals are often reported in phase II trials. Despite the one-sided nature of Simon’s design, it is not uncommon to report two-sided (1−2

The first basic idea is to use Clopper–Pearson _{1}_{2}), where _{1} and _{2} are the numerical solutions of

and

The existence of this interval relies on the stochastic ordering of the distribution of (

In the simple setting of a single binomial proportion, the Clopper–Pearson confidence interval is known to be conservative

and

Tsai _{2}=0 or _{t}. They were thus not considered here.
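The Clopper–Pearson-type construction introduced at the start of this subsection can be sketched numerically: each bound is the response rate at which one tail probability of the observed outcome equals the chosen level. The code below is a minimal illustration that assumes the stage-wise ordering of outcomes and solves the two equations by bisection. All function names are hypothetical, and `alpha=0.05` in each tail corresponds to a two-sided 90% interval.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); 1 if k <= 0 and 0 if k > n."""
    return sum(binom_pmf(x, n, p) for x in range(max(k, 0), n + 1))


def tail_at_least(m, s, n1, r1, n2, p):
    """P(outcome at least as favourable as (m, s)) at response rate p."""
    if m == 1:
        return binom_sf(s, n1, p)
    return sum(
        binom_pmf(x1, n1, p) * binom_sf(s - x1, n2, p)
        for x1 in range(r1 + 1, n1 + 1)
    )


def tail_at_most(m, s, n1, r1, n2, p):
    """P(outcome at most as favourable as (m, s)) at response rate p."""
    return 1.0 - tail_at_least(m, s + 1, n1, r1, n2, p)


def solve(f, target, increasing, tol=1e-10):
    """Bisection on (0, 1) for a monotone function f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def stagewise_interval(m, s, n1, r1, n2, alpha=0.05):
    """Two-sided (1 - 2 * alpha) interval for the response rate."""
    lower = solve(lambda p: tail_at_least(m, s, n1, r1, n2, p), alpha, True)
    upper = solve(lambda p: tail_at_most(m, s, n1, r1, n2, p), alpha, False)
    return lower, upper


# Illustrative call: 30 responses out of 63 after proceeding to stage 2.
print(stagewise_interval(2, 30, 24, 8, 39))
```

Bisection relies on both tail probabilities being monotone in the response rate, which is the stochastic ordering property the text refers to. At the extreme outcomes the bounds converge to 0 or 1 without special handling.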

Extended or shortened trial

It is not uncommon that the actual sample size of a phase II trial would be different from the planned sample size _{1}subjects, and the difference in sample size only concerns the second stage sample size. They also proposed a method for inference at the end of the trial, thus providing a point estimate, a confidence interval and a

Assume _{2} + _{2} patients are accrued at the second stage instead of the preplanned _{2}, and that _{1} = _{1} in the original design with _{2} patients at the second stage. The new conditional type I error rate is thus lower or equal to the original conditional type I error rate, allowing to control the unconditional type I error rate.
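The recalibration idea described above can be sketched as follows: given the observed first-stage count, the planned design implies a conditional type I error, and the critical value for the actual second-stage sample size is chosen as the smallest one whose conditional type I error does not exceed it. This is an illustrative reading of the approach rather than the published algorithm. The names `conditional_type1`, `adjusted_cutoff` and `n2_actual` are hypothetical.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); 1 if k <= 0 and 0 if k > n."""
    return sum(binom_pmf(x, n, p) for x in range(max(k, 0), n + 1))


def conditional_type1(x1, rt, n2, p0):
    """P(X2 > rt - x1) under the null rate: planned conditional type I error."""
    return binom_sf(rt - x1 + 1, n2, p0)


def adjusted_cutoff(x1, rt, n2, n2_actual, p0):
    """Smallest c such that rejecting when X2 > c, with n2_actual patients,
    keeps the conditional type I error at or below its planned value."""
    budget = conditional_type1(x1, rt, n2, p0)
    for c in range(-1, n2_actual + 1):
        if binom_sf(c + 1, n2_actual, p0) <= budget:
            return c
    return n2_actual  # not reached: c = n2_actual always satisfies the bound


# Illustrative call: 12 first-stage responses, 5 extra second-stage patients.
print(adjusted_cutoff(12, 24, 39, 44, 0.30))
```

When `n2_actual` equals the planned `n2`, the function returns the original second-stage requirement `rt - x1`. Because the bound holds for every first-stage count, averaging over first-stage outcomes keeps the unconditional type I error at or below its planned level.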

They also proposed to compute the unconditional

where ^{Π∗}is the solution of ^{Π∗} allows to extend the conditional power to all potential values of _{1}, whereas only one particular value (_{1}) was observed. The use of the conditional power function _{1} and the actual sample size for stage 2 ^{Π∗}, smaller ^{Π∗} indicating stronger evidence against the null hypothesis. This ordering is coherent with the hypothesis testing strategy they proposed, based on a new critical value to control the conditional type I error. In that respect, the _{pk} is lower than

Koyama and Chen proposed the estimator _{0} yielding a _{k}=0.5, and a two-sided Clopper–Pearson-like confidence interval based on _{k}. The definition of _{k} by equation 11 should allow to control the overall type I error rate, but the properties of the test, estimator and confidence interval have not been thoroughly studied.

Although Koyama and Chen used a biased-corrected estimator when the second stage sample size was as planned, we denoted _{2} patients are accrued at the second stage.

Numerical study

To examine the properties of the different methods, numerical studies were conducted. Several design scenarios were considered, covering a range of possible phase II trials in oncology. To help determine these scenarios, a limited literature search of phase II cancer trials using Simon’s design over recent years was performed. As this search was informal and arbitrarily limited to some journals, no results are reported. Twelve design scenarios were thus considered, with response rates under the null hypothesis of 0.05, 0.1, 0.2, 0.3, 0.4 and 0.5. Trials with higher values of the null response rate $\pi_0$ were judged to be rare, and were therefore not considered. For each value of $\pi_0$, two differences in response rate between the null and alternative hypotheses were considered, namely 0.15 and 0.2. In all cases, the type I error rate _{0} and H_{1}.

For each design scenario considered, the probability of all possible outcomes (_{0} to _{0} + 0.20 (thus _{1} when _{1}when

To investigate the impact of accruing more or fewer patients at the second stage than the planned number $n_2$, trials where the second stage sample size was decreased by 1 or 2 or increased by 1, 2 or 5 were considered. These settings were not symmetrical because overaccrual was expected to be more frequent, owing both to the time needed to close a trial and to investigators’ tendency to accrue additional patients to protect the trial against patient exclusions. The main analysis was unconditional, i.e. the performance of the different methods was averaged over all possible outcomes. As some methods were more specifically developed to correct the analysis of the second stage results only, an analysis restricted to cases where the trial proceeded to a second stage was also performed, and is referred to as the conditional analysis.

To keep results simple and because the main findings were close from one scenario to another, only the results of six of the twelve scenarios are presented in detail. Additionally, these detailed results are only presented for situations where the second stage sample size was as planned. For situations where the second stage sample size was different from planned, the tables present results averaged over the different scenarios and the different second-stage sample sizes (simple arithmetic average without any weighting). However, the description of results encompasses the whole range of data obtained and not only the results presented in the tables. Particular cases where results were representative or different from the overall message were then isolated.

All computations were performed using R 2.13.2 statistical software

Results

Trial accrual as planned

Results displayed in Figure _{0} than to _{1}, while the RMSE of both estimators becomes similar when _{1}. As already noted in the illustrative examples of Guo and Liu _{0}. The median estimator also performs well in terms of RMSE, and even exhibits the smallest one for values of _{0}. The conditional estimators have similar properties to each other, with much higher negative bias than the MLE, especially for values of _{0}. Their RMSE was also higher than or equal to that of the MLE.

Performance of the estimators: bias and root mean squared error (RMSE)


In terms of statistical testing, the test sizes represented on Figure _{0} show that the naive binomial test and the test based on the conditional distribution are not adequate, these tests being too conservative in several settings. The test based on stage-wise ordering leads to the correct decision, with the same probability of rejection as given by design. In our numerical settings, the test based on MLE ordering had similar characteristics as the test based on stage-wise ordering. Actually, both only differ for a limited range of possible (

Performance of the tests based on p-values

Coverage probabilities of the 90% confidence intervals are presented in the right sub-panel of Figure _{0} such as 0.05 for instance. The mid-_{0} for smaller values of _{0}, but the coverage probability fluctuated around 90% when _{0} was 0.20 or more, within a margin of −1_{0}, their coverage probabilities were lower than the nominal level in this unconditional setting. This occurred less frequently and less dramatically for the conditional exact confidence interval, which however had a coverage probability clearly above its nominal level for _{0}, especially for small values of _{0}.

Extended or shortened trial

Results obtained when the second stage sample size was modified are presented in Tables _{0}, but their bias under _{1} was similar to the one of Koyama–Chen estimator, with even lower RMSE for the UMVCUE.

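| Property | Method | Under $H_0$ | Under $H_1$ |
| --- | --- | --- | --- |
| Bias | | −0.015 | −0.005 |
| | | −0.004 | 0.001 |
| | | 0.000 | 0.000 |
| | | −0.029 | −0.012 |
| | | −0.028 | −0.009 |
| | | −0.009 | −0.012 |
| RMSE | | 0.060 | 0.071 |
| | | 0.063 | 0.067 |
| | | 0.071 | 0.067 |
| | | 0.061 | 0.076 |
| | | 0.062 | 0.064 |
| | | 0.062 | 0.070 |
| Rejection probability | _{n} | 0.033 | 0.882 |
| | _{pm} | 0.036 | 0.887 |
| | _{pu} | 0.036 | 0.887 |
| | _{c} | 0.012 | 0.800 |
| | _{pk} | 0.035 | 0.885 |
| Coverage probability | Naive exact | 0.940 | 0.916 |
| | Stage-wise | 0.937 | 0.933 |
| | Mid-p | 0.916 | 0.895 |
| | Conditional exact | 0.952 | 0.906 |
| | Conditional score | 0.935 | 0.851 |
| | Conditional mid-p | 0.936 | 0.860 |
| | Koyama–Chen | 0.937 | 0.931 |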

| | | | _{2} | | _{2} | | _{2} | | _{2} | | _{2} | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Settings** | | **Estimator** | **Bias** | **RMSE** | **Bias** | **RMSE** | **Bias** | **RMSE** | **Bias** | **RMSE** | **Bias** | **RMSE** |
| Optimal design with $\pi_0 = 0.05$, $\pi_1 = 0.2$: $n_1=21$, $n_2=20$, $r_1=1$, $r_t=4$ | $\pi = \pi_0$ | | -0.008 | 0.038 | -0.009 | 0.037 | -0.009 | 0.037 | -0.009 | 0.037 | -0.010 | 0.036 |
| | | | -0.002 | 0.041 | -0.003 | 0.041 | -0.003 | 0.040 | -0.003 | 0.040 | -0.003 | 0.040 |
| | | | 0.000 | 0.046 | 0.000 | 0.046 | 0.000 | 0.046 | 0.000 | 0.045 | 0.000 | 0.045 |
| | | | -0.018 | 0.036 | -0.018 | 0.036 | -0.018 | 0.036 | -0.018 | 0.036 | -0.018 | 0.035 |
| | | | -0.018 | 0.037 | -0.018 | 0.037 | -0.018 | 0.036 | -0.018 | 0.036 | -0.018 | 0.035 |
| | | | -0.006 | 0.039 | -0.006 | 0.039 | -0.006 | 0.038 | -0.006 | 0.038 | -0.006 | 0.038 |
| | $\pi = \pi_1$ | | -0.004 | 0.071 | -0.004 | 0.071 | -0.005 | 0.069 | -0.005 | 0.069 | -0.005 | 0.067 |
| | | | 0.001 | 0.068 | 0.001 | 0.068 | 0.001 | 0.066 | 0.001 | 0.066 | 0.001 | 0.064 |
| | | | 0.000 | 0.068 | 0.000 | 0.067 | 0.000 | 0.066 | 0.000 | 0.065 | 0.000 | 0.064 |
| | | | -0.012 | 0.077 | -0.012 | 0.076 | -0.011 | 0.074 | -0.011 | 0.073 | -0.011 | 0.071 |
| | | | -0.009 | 0.076 | -0.009 | 0.075 | -0.009 | 0.074 | -0.009 | 0.073 | -0.009 | 0.071 |
| | | | -0.012 | 0.071 | -0.013 | 0.070 | -0.013 | 0.069 | -0.013 | 0.068 | -0.013 | 0.067 |
| Minimax design with $\pi_0=0.4$, $\pi_1=0.6$: $n_1=29$, $n_2=25$, $r_1=12$, $r_t=27$ | $\pi = \pi_0$ | | -0.015 | 0.078 | -0.016 | 0.078 | -0.016 | 0.077 | -0.017 | 0.077 | -0.018 | 0.076 |
| | | | -0.004 | 0.080 | -0.004 | 0.080 | -0.004 | 0.080 | -0.004 | 0.079 | -0.004 | 0.079 |
| | | | 0.000 | 0.087 | 0.000 | 0.087 | 0.000 | 0.087 | 0.000 | 0.087 | 0.000 | 0.087 |
| | | | -0.037 | 0.082 | -0.037 | 0.082 | -0.036 | 0.081 | -0.036 | 0.080 | -0.036 | 0.079 |
| | | | -0.035 | 0.083 | -0.035 | 0.082 | -0.035 | 0.081 | -0.035 | 0.081 | -0.035 | 0.080 |
| | | | -0.010 | 0.079 | -0.010 | 0.078 | -0.010 | 0.078 | -0.010 | 0.078 | -0.010 | 0.078 |
| | $\pi = \pi_1$ | | -0.003 | 0.074 | -0.003 | 0.074 | -0.003 | 0.073 | -0.003 | 0.073 | -0.003 | 0.071 |
| | | | 0.001 | 0.070 | 0.001 | 0.070 | 0.002 | 0.069 | 0.002 | 0.068 | 0.002 | 0.067 |
| | | | 0.000 | 0.071 | 0.000 | 0.070 | 0.000 | 0.069 | 0.000 | 0.069 | 0.000 | 0.068 |
| | | | -0.011 | 0.082 | -0.011 | 0.081 | -0.010 | 0.080 | -0.010 | 0.079 | -0.010 | 0.077 |
| | | | -0.007 | 0.080 | -0.007 | 0.079 | -0.007 | 0.078 | -0.007 | 0.077 | -0.007 | 0.076 |
| | | | -0.012 | 0.073 | -0.012 | 0.072 | -0.011 | 0.071 | -0.011 | 0.071 | -0.011 | 0.070 |

In terms of hypothesis testing and

The mid-_{1} than under H_{0} and for higher values of the probability of response

Analysis conditional on proceeding to stage 2

When analysis was restricted to the trials proceeding to the second stage, the performance of the estimators was different from previously (Figure _{1} or above, with a bias of the same magnitude than the bias of the conditional estimator _{1}.

Performance of the estimators for conditional inference: bias and root mean squared error (RMSE)


In terms of RMSE, the conditional estimators _{0} and of _{1}. Despite their bias, all unconditional estimators except the UMVUE had generally lower RMSE than the conditional estimators. With biases as high as 4% for response rate of 5% or as 8% for a response rate of 20%, these estimators cannot be recommended for conditional inference, however.

Conditional inference was also the only one preserving the conditional type I error, but the test could be rather conservative in some situations (Figure

Performance of the tests based on p-values

When the sample size at the second stage _{2} was different from its planned value, the conditional estimators achieved similar bias reduction as when _{2} was as planned (Table _{c}also allowed to control the conditional type I error. The coverage probabilities of conditional score and conditional mid-_{0} than under H_{1}, and closer to their nominal value under _{1}, whereas the reverse was observed for other methods. As compared to the conditional estimator, Koyama–Chen estimator had similar bias and lower RMSE under H_{1}, but much higher bias under H_{0}. It should however be noted that this estimator is constructed as a median and not a mean estimator, so that some degree of bias can be expected when estimating the response rate. In terms of hypothesis testing, this method however failed to adequately control the conditional type I error rate and confidence intervals had too high coverage probability in most cases.

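| **Property** | **Method** | Under $H_0$ | Under $H_1$ |
| --- | --- | --- | --- |
| Bias | | 0.038 | 0.004 |
| | | 0.053 | 0.010 |
| | | 0.084 | 0.010 |
| | | −0.003 | −0.002 |
| | | 0.000 | 0.000 |
| | | 0.057 | −0.003 |
| RMSE | | 0.057 | 0.059 |
| | | 0.068 | 0.056 |
| | | 0.086 | 0.054 |
| | | 0.060 | 0.065 |
| | | 0.061 | 0.064 |
| | | 0.062 | 0.057 |
| Rejection probability | _{n} | 0.100 | 0.931 |
| | _{pm} | 0.110 | 0.936 |
| | _{pu} | 0.110 | 0.936 |
| | _{c} | 0.035 | 0.844 |
| | _{pk} | 0.107 | 0.933 |
| Coverage probability | Naive exact | 0.899 | 0.939 |
| | Stage-wise | 0.890 | 0.957 |
| | Mid-p | 0.852 | 0.941 |
| | Conditional exact | 0.939 | 0.929 |
| | Conditional score | 0.910 | 0.894 |
| | Conditional mid-p | 0.913 | 0.903 |
| | Koyama–Chen | 0.889 | 0.956 |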

Discussion

In terms of estimation, _{0}, i.e. in cases when estimation is the most important. Although our simulations did not encompass all possible ranges of response rates and treatment effects, they cover a wide range of plausible situations, in which no clear advantage of the bias corrected estimator

The choice of a conditional or unconditional inference is clearly overlooked in practical applications. Conditional inference — and conditional bias in particular — has attracted some interest in the setting of group sequential phase III trials, with concerns rather directed at the conditional bias of the estimator of the treatment effect when trials were stopped early for efficacy _{1}≤_{1}. In such a case, the estimator would be conditionally unbiased whether the trial was stopped at the first or the second stage, and thus would be unconditionally unbiased. Using a distribution of outcomes conditional on early stopping makes however little sense — if any — when _{1} is small. For instance, if _{1}=0, then the only potential outcome in case of early stopping is _{1}=0, thus leading to a single possible value for the estimator of

In this study, we have concentrated on Simon’s design for phase II cancer trials. Other designs or adaptations however exist. In particular, Jovic and Whitehead have recently proposed point estimates, confidence intervals and

In practical applications, the actual number of patients recruited may differ slightly from the preplanned value. For instance, some patients may be unevaluable for response or may withdraw their consent during the study. Conversely, some patients may be included in the study before recruitment is formally closed. For these cases, where the decrease or increase of second stage sample size may be considered as non-informative, Koyama and Chen proposed inference procedures based on conditional power

Another interesting field of further research concerns inference in adaptive phase II trials, where the second stage sample size can be adapted according to the first stage results

Conclusions

For point estimation, the UMVUE

When one is more particularly interested in inference conditional on having proceeded to the second stage, the UMVCUE

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

RP and KD designed the study, performed all statistical analyses and participated in writing the article. Both authors read and approved the final manuscript.
