Assistance Publique-Hôpitaux de Paris, Hôpital Paul Brousse

Faculty of Medicine, University Paris-Sud, Paris, France

INSERM, UMR-669, Villejuif, France

INSERM, U 1018, Biostatistics Team, 94807 Villejuif, France

Université Paris-Sud, UMR-S 1018, 94807 Villejuif, France

Abstract

Background

In early-stage cancer, primary treatment can be considered effective in eliminating the tumor for a non-negligible proportion of patients, whereas for the others it leads to a lower tumor burden and thereby potentially prolonged survival. In this mixed population of patients, it is of great interest to detect complex differences in survival distributions associated with molecular markers that potentially activate latent downstream pathways implicated in tumor progression.

Method

We propose a novel model-based score test designed to identify molecular markers with complex effects on survival in early-stage cancer. From a biological point of view, the proposed score test makes it possible to detect complex changes in the survival distributions linked to either the tumor burden or its dynamic growth.

Results

Simulation results show that the proposed statistic is powerful at identifying departures from the null hypothesis of no survival difference. The practical use of the proposed statistic is illustrated by analyzing the prognostic impact of Kras mutation in early-stage lung adenocarcinoma. This analysis leads to the conclusion that Kras mutation has a significant negative prognostic impact on survival. Moreover, it emphasizes that the complex role of Kras mutation on survival would have been overlooked by considering only the results of the classical logrank test.

Conclusion

With the growing number of biological markers to be tested in early-stage cancer, the proposed score test statistic is a powerful tool for detecting molecular markers associated with complex survival patterns.

Background

As oncology enters the era of so-called personalized medicine through the growing use of molecular markers, one of the main questions is whether these markers can refine patient prognosis beyond classical bio-clinical risk factors. Within clinically and pathologically well-defined groups of patients, these markers need to demonstrate their ability to reveal heterogeneity in survival times among patients. For patients with early-stage cancer treated with curative therapy, the problem is particularly challenging since molecular markers often reflect a complex interplay of downstream pathways that drive either the remaining tumor burden or its dynamic growth.

Cure rate models, especially those with a biological interpretation, are well-suited for analyzing such data. These models assume that the population under study is composed of two subpopulations: patients who have no persistent tumor (sometimes referred to as long-term survivors or cured patients) and patients who have a persistent tumor burden and are susceptible to disease recurrence. In the literature, the oldest approach relies on two-component mixture models, which incorporate a cure fraction in a parametric or semi-parametric framework (for a review, see

In this work, based on an alternative mechanistic cure rate model, we propose a novel score test statistic for detecting molecular markers associated with complex survival patterns in early-stage cancer. We first introduce an alternative semi-parametric cure rate model that describes changes in the survival distributions linked to either the tumor burden (cure fraction and distribution of surviving clonogens) or its dynamic growth (time-to-event distribution), and then derive a model-based score test from it. We illustrate the clinical interest of this statistic by investigating the impact on survival distributions of genetic (Kras mutation), genomic (chromosomal aberration) and histopathologic markers among patients with early-stage lung adenocarcinoma.

Methods

Modeling background

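Here, we focus on a binary variable that allocates the patients to two groups $i \in \{0, 1\}$, with $n_i$ subjects in group $i$ ($n = n_0 + n_1$). For each patient $j$ ($j = 1, \ldots, n$), $Z_j$ denotes the indicator variable of group 1. For the lung cancer dataset, this variable indicates the presence/absence of Kras mutation. In the following, a tumor is modeled as a set of clonogens with identical properties and independent evolution. For patient $j$ in group $i$, let $W_{ijl}$ be the time to progression of the $l$-th latent (unobservable) clonogen until a detectable recurrence, with (clonogenic) survival function $\bar F_i(t) = 1 - F_i(t)$. Let $N_{ij}$ be the number of latent clonogens that survived the treatment for patient $j$ in group $i$; $N_{ij}$ has probability mass function $p_i(k) = P(N_{ij} = k)$, $k = 0, 1, \ldots$, and is supposed to be independent of the clonogen progression times. Let $T_{ij}$ be the event time, $C_{ij}$ the censoring time, $X_{ij} = \min(T_{ij}, C_{ij})$ the observed time and $\delta_{ij} = 1\{T_{ij} \le C_{ij}\}$ the event indicator. We assume that $T_{ij}$ and $C_{ij}$ satisfy the condition of independent censoring. We also denote by $Y_{ij}(t) = 1\{t \le X_{ij}\}$ the indicator of being at risk for an event at time $t$.

Since the event occurs when the first surviving clonogen progresses, $T_{ij} = \min(W_{ij1}, \ldots, W_{ijN_{ij}})$, with $T_{ij} = +\infty$ when $N_{ij} = 0$. For each patient with $N_{ij} = n$ latent clonogens, the conditional (patient-specific) survival function is therefore expressed as:

$$P(T_{ij} > t \mid N_{ij} = n) = \bar F_i(t)^{n}.$$

Thus, the marginal (population) survival function for group $i$ is

$$S_i(t) = E\{\bar F_i(t)^{N_{ij}}\} = \sum_{k \ge 0} p_i(k)\, \bar F_i(t)^{k} = G_i\{\bar F_i(t)\},$$

where $G_i$ is the probability generating function of $N_{ij}$. In particular, the cure fraction is $S_i(+\infty) = G_i(0) = p_i(0)$.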

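Assuming that the number of clonogens in treated tumors follows a Poisson distribution with parameter $\theta_i$ in group $i$, the marginal survival function becomes

$$S_i(t) = \exp\{-\theta_i [1 - \bar F_i(t)]\},$$

where $\theta_i$ (the Poisson parameter) is the mean number of clonogens, $\bar F_i$ is the latent (clonogenic) survival function, and $\exp(-\theta_i) = S_i(+\infty)$ is the probability of having no surviving clonogen (cure fraction). Within this framework, one can model short- and long-term effects of a marker: differences between $\bar F_0$ and $\bar F_1$ capture short-term effects, whereas differences between $\exp(-\theta_0)$ and $\exp(-\theta_1)$ quantify the difference in long-term survivor rates. Since the Poisson distribution is fully determined by $\theta_i = -\log S_i(+\infty)$, it is straightforward to see that equal cure fractions across groups (no long-term effect) imply the same distribution for the number of surviving clonogens.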

In the following, we consider a family of discrete distributions proposed by Katz

Distribution of the number of clonogens

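We recall that the Katz family of discrete distributions for a count $N$ is defined by the recurrence

$$\frac{p_{k+1}}{p_k} = \frac{\omega + \kappa k}{k + 1}, \qquad k = 0, 1, 2, \ldots,$$

where $p_k = P(N = k)$, $\omega > 0$ and $\kappa < 1$ (for $\kappa < 0$, $-\omega/\kappa$ must be a positive integer).

Katz showed that the probability generating function is:

$$G(s) = E(s^{N}) = \left(\frac{1 - \kappa s}{1 - \kappa}\right)^{-\omega/\kappa} \quad (\kappa \neq 0), \qquad G(s) = e^{\omega(s - 1)} \quad (\kappa = 0),$$

with $|\kappa s| < 1$.

It follows that the initial probability is $p_0 = G(0) = (1 - \kappa)^{\omega/\kappa}$ for $\kappa \neq 0$ and $p_0 = e^{-\omega}$ for $\kappa = 0$.

Moreover, it is worth noting that $E(N) = \omega/(1 - \kappa)$, $\mathrm{Var}(N) = \omega/(1 - \kappa)^2$ and $\mathrm{Var}(N)/E(N) = (1 - \kappa)^{-1}$. This family covers distributions that are under-dispersed ($\kappa < 0$, binomial), equi-dispersed ($\kappa = 0$, Poisson) or over-dispersed ($0 < \kappa < 1$, negative binomial).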
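The Katz recurrence and its moments can be checked numerically. This minimal sketch uses hypothetical parameter values (ω = 2 and κ ∈ {−0.5, 0, 0.3}), and the function names are illustrative. It builds p_0, …, p_kmax from the recurrence, then compares the mean and the dispersion index Var(N)/E(N) with ω/(1 − κ) and (1 − κ)^{−1}. For κ = −0.5, −ω/κ = 4, so the support is {0, …, 4} (binomial).

```python
import math


def katz_p0(omega: float, kappa: float) -> float:
    """P(N = 0): (1 - kappa)^(omega / kappa), or exp(-omega) in the Poisson case."""
    return math.exp(-omega) if kappa == 0.0 else (1.0 - kappa) ** (omega / kappa)


def katz_pmf(omega: float, kappa: float, kmax: int) -> list[float]:
    """p_0, ..., p_kmax from the recurrence p_{k+1} / p_k = (omega + kappa * k) / (k + 1)."""
    probs = [katz_p0(omega, kappa)]
    for k in range(kmax):
        ratio = (omega + kappa * k) / (k + 1)
        if ratio <= 0.0:  # binomial case (kappa < 0): the support ends at -omega / kappa
            break
        probs.append(probs[-1] * ratio)
    return probs


def katz_pgf(s: float, omega: float, kappa: float) -> float:
    """G(s) = ((1 - kappa * s) / (1 - kappa))^(-omega / kappa); exp(omega * (s - 1)) if kappa = 0."""
    if kappa == 0.0:
        return math.exp(omega * (s - 1.0))
    return ((1.0 - kappa * s) / (1.0 - kappa)) ** (-omega / kappa)


def moments(probs: list[float]) -> tuple[float, float]:
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


for omega, kappa in [(2.0, -0.5), (2.0, 0.0), (2.0, 0.3)]:  # under-, equi-, over-dispersed
    probs = katz_pmf(omega, kappa, kmax=200)
    mean, var = moments(probs)
    print(f"kappa={kappa:+.1f}  sum={sum(probs):.6f}  mean={mean:.4f}  "
          f"Var/E={var / mean:.4f}  1/(1-kappa)={1 / (1 - kappa):.4f}")
```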

Relying on this family of distributions, we propose to consider the following semi-parametric cure model.

Improper survival function

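According to the above results, a semi-parametric improper cure model, which encompasses the Poisson cure model, is obtained by assuming that the number of surviving clonogens in group $i$ follows a Katz distribution with parameters $(\omega_i, \kappa_i)$, $\omega_i > 0$, $\kappa_i < 1$.

The marginal survival function, obtained by evaluating the probability generating function $G_i$ at the latent (clonogenic) survival function $\bar F_i(t) = 1 - F_i(t)$, is defined as:

$$S_i(t) = G_i\{\bar F_i(t)\} = \left(\frac{1 - \kappa_i \bar F_i(t)}{1 - \kappa_i}\right)^{-\omega_i/\kappa_i} \quad (\kappa_i \neq 0), \qquad S_i(t) = \exp\{-\omega_i F_i(t)\} \quad (\kappa_i = 0),$$

where $F_i(t)$ is a proper distribution function, so that $S_i(t)$ is an improper survival function with cure fraction $S_i(+\infty) = (1 - \kappa_i)^{\omega_i/\kappa_i}$ ($e^{-\omega_i}$ when $\kappa_i = 0$).

Thus, in group $i$ the survival function is governed by the clonogen-count parameters $(\omega_i, \kappa_i)$ and the latent survival function $\bar F_i$. The reference group 0 follows the Poisson cure model ($\kappa_0 = 0$, $\omega_0 = \theta_0$).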

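The corresponding cumulative hazard function and hazard function are denoted $\Lambda_i(t)$ and $\lambda_i(t)$:

$$\Lambda_i(t) = \frac{\omega_i}{\kappa_i} \log\left(\frac{1 - \kappa_i \bar F_i(t)}{1 - \kappa_i}\right), \qquad \lambda_i(t) = \frac{\omega_i f_i(t)}{1 - \kappa_i \bar F_i(t)},$$

where $f_i(t) = -d\bar F_i(t)/dt$ is the latent density. For the reference group (Poisson cure model, $\kappa_0 = 0$), $\Lambda_0(t) = \theta_0 F_0(t)$ and $\lambda_0(t) = \theta_0 f_0(t)$, so that $\Lambda_0(+\infty) = \theta_0$.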

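It is useful for the following to write the ratio of the hazard functions of group 1 and of the Poisson reference group 0:

$$\frac{\lambda_1(t)}{\lambda_0(t)} = \frac{\omega_1 f_1(t)}{\theta_0 f_0(t)\{1 - \kappa_1 \bar F_1(t)\}}.$$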
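To see how the Katz-based model separates the cure fraction from the shape of the survival curve, the sketch below evaluates S_i(t) = ((1 − κ_i F̄_i(t))/(1 − κ_i))^{−ω_i/κ_i} and λ_i(t) = ω_i f_i(t)/(1 − κ_i F̄_i(t)). It assumes, for illustration only, an exponential latent survival F̄(t) = e^{−t} in both groups and a Poisson reference group with cure fraction 0.5 (θ_0 = log 2). Group 1 is a hypothetical over-dispersed group (κ_1 = 0.3) whose ω_1 is chosen to give the same cure fraction. The two curves reach the same plateau but differ at finite times, which the Poisson model alone cannot produce.

```python
import math


def latent_surv(t: float, rate: float = 1.0) -> float:
    return math.exp(-rate * t)  # hypothetical exponential latent survival Fbar(t)


def latent_dens(t: float, rate: float = 1.0) -> float:
    return rate * math.exp(-rate * t)  # corresponding latent density f(t)


def katz_cure_survival(t: float, omega: float, kappa: float) -> float:
    """S(t) = G(Fbar(t)) for a Katz(omega, kappa) number of surviving clonogens."""
    fbar = latent_surv(t)
    if kappa == 0.0:
        return math.exp(-omega * (1.0 - fbar))
    return ((1.0 - kappa * fbar) / (1.0 - kappa)) ** (-omega / kappa)


def katz_cure_hazard(t: float, omega: float, kappa: float) -> float:
    """lambda(t) = omega * f(t) / (1 - kappa * Fbar(t)); reduces to omega * f(t) when kappa = 0."""
    return omega * latent_dens(t) / (1.0 - kappa * latent_surv(t))


def cure_fraction(omega: float, kappa: float) -> float:
    return math.exp(-omega) if kappa == 0.0 else (1.0 - kappa) ** (omega / kappa)


theta0 = math.log(2.0)                                    # group 0: Poisson, cure fraction 0.5
kappa1 = 0.3                                              # group 1: over-dispersed (hypothetical)
omega1 = kappa1 * math.log(0.5) / math.log(1.0 - kappa1)  # same cure fraction as group 0

for t in (0.5, 1.0, 2.0, 20.0):
    s0 = katz_cure_survival(t, theta0, 0.0)
    s1 = katz_cure_survival(t, omega1, kappa1)
    hr = katz_cure_hazard(t, omega1, kappa1) / katz_cure_hazard(t, theta0, 0.0)
    print(f"t={t:5.1f}  S0={s0:.3f}  S1={s1:.3f}  lambda1/lambda0={hr:.3f}")
```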

From a biological perspective, belonging to group 1 is associated with changes in the cure fraction, in the conditional distribution of the number of surviving clonogens, or in the latent survival (tumor progression), through the parameters of interest. Restricting $\kappa_1$ to lie in [0,1] leads to the transformation cure model.

In this work, the general null hypothesis to be tested is $H_0: S_0(t) = S_1(t)$ for all $t \ge 0$, that is, no survival difference between the two groups.

The proposed statistic

In the following, we derive a score statistic that is optimal under a classical log-linear relationship for the ratio of the hazard functions of the two groups. The log-partial likelihood is derived under this multiplicative model, and the score vector is obtained from its first derivative with respect to the parameters of interest, evaluated under the null hypothesis $H_0$.

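For computing the score statistic, $\Lambda_0(t)$ and $\theta_0$ are replaced by efficient estimators $\hat\Lambda_0(t)$ and $\hat\theta_0$. Here,

$$\hat\Lambda_0(t) = \int_0^{t^-} \frac{\sum_{j} dN_j(u)}{\sum_{j} Y_j(u)}, \qquad N_j(t) = 1\{X_j \le t, \delta_j = 1\}, \quad Y_j(u) = 1\{u \le X_j\},$$

computed on the pooled sample, is the left-continuous version of the Nelson-Aalen estimator of the cumulative hazard, and $\hat\theta_0 = \hat\Lambda_0(t_{max})$, with $t_{max}$ the largest observed time. In our problem, the limiting distribution of the proposed statistic with $\Lambda_0$ replaced by $\hat\Lambda_0$ is unchanged provided that the upper bound of the domain of the survival distribution is less than or equal to the upper bound of the domain of the censoring distribution.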
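This is a pure-Python sketch of the left-continuous Nelson-Aalen estimator computed from pooled right-censored data. The sketch uses the right-continuous value at the largest observed time as a plug-in for θ_0. That choice is an assumption of this sketch, motivated by Λ_0(+∞) = θ_0 under the Poisson cure model. The function names and toy data are hypothetical.

```python
def nelson_aalen(times, events, t, left_continuous=True):
    """Nelson-Aalen cumulative hazard at t from pooled right-censored data.

    With left_continuous=True the value is Lambda_hat(t-): only event times u < t
    contribute; otherwise event times u <= t contribute.
    """
    event_times = sorted({x for x, d in zip(times, events) if d})
    total = 0.0
    for u in event_times:
        if u > t or (left_continuous and u == t):
            break
        at_risk = sum(1 for x in times if x >= u)                      # Y(u)
        deaths = sum(1 for x, d in zip(times, events) if d and x == u)  # dN(u)
        total += deaths / at_risk
    return total


def theta0_hat(times, events):
    """Assumed plug-in for theta_0: Nelson-Aalen value at the largest observed time."""
    return nelson_aalen(times, events, max(times), left_continuous=False)


times = [1.0, 2.0, 2.0, 3.0, 4.0]   # hypothetical pooled observed times X_j
events = [1, 1, 0, 1, 0]            # delta_j
print(nelson_aalen(times, events, 2.0))                         # excludes the jump at t = 2
print(nelson_aalen(times, events, 2.0, left_continuous=False))  # includes it
print(theta0_hat(times, events))
```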

The elements of the score vector $U$ and of the corresponding information matrix $I$ are computed by substituting $\hat\Lambda_0(X_j)$ and $\hat\theta_0$, as given above, for $\Lambda_0(X_j)$ and $\theta_0$.

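Finally, the statistic

$$T = U^{\top} I^{-1} U,$$

where $U$ is the score vector and $I$ the information matrix evaluated under $H_0$, is asymptotically distributed under $H_0$ as a chi-square with three degrees of freedom.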
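Once the score vector U and the information matrix I are available, the statistic and its asymptotic p-value take only a few lines to compute. Their exact expressions are not reproduced here. The sketch below solves I x = U rather than inverting I. It then uses the closed form of the upper tail of a chi-square with three degrees of freedom, P(χ²₃ > x) = erfc(√(x/2)) + √(2x/π) e^{−x/2}. The numeric U and I in the usage lines are hypothetical placeholders, not values from this study.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense a)."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("information matrix is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_statistic(u, info):
    """T = U' I^{-1} U, computed without forming the inverse."""
    x = solve_linear(info, u)
    return sum(ui * xi for ui, xi in zip(u, x))


def chi2_3df_sf(x):
    """Upper tail of a chi-square with 3 degrees of freedom (closed form)."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


u = [1.2, -0.4, 2.1]                                         # hypothetical score vector
info = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.8]]   # hypothetical information matrix
t_stat = score_statistic(u, info)
print(f"T = {t_stat:.3f}, p = {chi2_3df_sf(t_stat):.4f}")
```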

Results

Simulation study

We conducted a simulation study to evaluate the finite-sample performance of the proposed statistic. We report the size and the power of the proposed test and compare them with those of the logrank test.

We considered a single binary variable taking the value 0 (e.g. absence of a marker) or 1 (e.g. presence of a marker), with half of the individuals having value 1. We assumed that the survival distribution in group 0 follows the Poisson cure model, and that group 1 has either the same or a different cure fraction ($S_0(+\infty) = S_1(+\infty)$ or $S_0(+\infty) \neq S_1(+\infty)$) and either the same or a different latent survival function.

Various parameter values were considered for the over-dispersed and under-dispersed cases. The parameters of group 1 were chosen so that the cure fractions are either equal or different, with $e^{\gamma}$ equal to 1 or 1.2. For the latent survival distribution shift, we considered $e^{\alpha} = 1$, 1.25 and 1.5. The censoring time $C_j$ was generated from an exponential distribution.

To illustrate these scenarios, we plotted the theoretical survival curves for seven situations with a cure fraction of 0.5 in group 0 ($S_0(+\infty) = 0.5$). The marginal survival curve for group 0 (reference curve) is in black. The survival curves for over-dispersed cases with different cure fractions (cure fraction shift $e^{\gamma} = 1.2$) and different latent survival functions ($e^{\alpha} = 1.5$) are in red. The survival curves for under-dispersed cases with different cure fractions (cure fraction shift $e^{\gamma} = 1.2$) and different latent survival functions ($e^{\alpha} = 1.5$) are in blue.


**Theoretical survival curves for seven situations.** The reference curve is in black. Survival curves for over-dispersed cases (resp. under-dispersed) are in red (resp. in blue).
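Scenarios of this kind can be generated with a short sketch under stated assumptions. Latent and censoring times are taken as exponential. Katz-distributed clonogen counts are drawn as Poisson (κ = 0), gamma-Poisson negative binomial (0 < κ < 1) or binomial (κ < 0, with −ω/κ an integer). All parameter values are hypothetical, including the censoring rate, whose value in the study is not given in this section.

```python
import math
import random


def poisson_sample(rng, mean):
    """Knuth's multiplication method (fine for the small means used here)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def katz_sample(rng, omega, kappa):
    """Draw a Katz(omega, kappa) count: binomial, Poisson or negative binomial."""
    if kappa == 0.0:
        return poisson_sample(rng, omega)
    if kappa > 0.0:  # negative binomial as a gamma-Poisson mixture
        lam = rng.gammavariate(omega / kappa, kappa / (1.0 - kappa))
        return poisson_sample(rng, lam)
    n_trials = -omega / kappa  # binomial: must be a positive integer
    if abs(n_trials - round(n_trials)) > 1e-9:
        raise ValueError("-omega / kappa must be an integer when kappa < 0")
    p = -kappa / (1.0 - kappa)
    return sum(rng.random() < p for _ in range(round(n_trials)))


def simulate_group(rng, n, omega, kappa, latent_rate, censor_rate):
    """Observed (X_j, delta_j) pairs; exponential latent and censoring times are assumptions."""
    data = []
    for _ in range(n):
        n_clonogens = katz_sample(rng, omega, kappa)
        event = min((rng.expovariate(latent_rate) for _ in range(n_clonogens)), default=math.inf)
        censor = rng.expovariate(censor_rate)
        data.append((min(event, censor), int(event <= censor)))
    return data


rng = random.Random(2024)
group0 = simulate_group(rng, 500, math.log(2.0), 0.0, 1.0, 0.2)  # Poisson reference, cure fraction 0.5
group1 = simulate_group(rng, 500, 0.6, 0.3, 1.25, 0.2)           # over-dispersed, hypothetical values
print(sum(d for _, d in group0), sum(d for _, d in group1))      # observed events per group
```

Drawing the negative binomial as a gamma mixture of Poissons avoids any third-party dependency. Its mean ω/(1 − κ) and P(N = 0) = (1 − κ)^{ω/κ} match the Katz formulas.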

Under the null hypothesis of no survival difference between the two groups, the estimated levels of the proposed test and of the logrank test lie within the binomial range [0.031; 0.069] for both censored and uncensored cases, whatever the level of the cure fraction. Tables 1 to 6 report the empirical power of both tests under the alternative scenarios.
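The binomial range for an empirical level is the usual normal-approximation interval α ± 1.96 √(α(1 − α)/R) for R replications. The number of replications is not stated in this section. The sketch assumes R = 500 with α = 0.05, and that choice reproduces the reported [0.031; 0.069] after rounding to three decimals.

```python
import math


def level_acceptance_interval(alpha: float, n_rep: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% range for an empirical rejection rate under H0."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_rep)
    return alpha - half_width, alpha + half_width


low, high = level_acceptance_interval(0.05, 500)  # 500 replications: an assumption
print(f"[{low:.3f}; {high:.3f}]")  # prints [0.031; 0.069]
```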

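**Left panel (1a) uncensored cases**

| Cure fraction in group 0: 30% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.12 | 0.57 |
| | Proposed | 0.58 | 0.80 |
| $e^{\alpha}=1.25$ | Logrank | 0.22 | 0.69 |
| | Proposed | 0.87 | 0.97 |
| $e^{\alpha}=1.50$ | Logrank | 0.27 | 0.76 |
| | Proposed | 0.96 | 0.98 |

**Right panel (1b) censored cases**

| Cure fraction in group 0: 30% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.16 | 0.62 |
| | Proposed | 0.47 | 0.79 |
| $e^{\alpha}=1.25$ | Logrank | 0.29 | 0.77 |
| | Proposed | 0.79 | 0.95 |
| $e^{\alpha}=1.50$ | Logrank | 0.42 | 0.83 |
| | Proposed | 0.90 | 0.97 |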

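**Left panel (2a) uncensored cases**

| Cure fraction in group 0: 50% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.07 | 0.27 |
| | Proposed | 0.38 | 0.57 |
| $e^{\alpha}=1.25$ | Logrank | 0.09 | 0.35 |
| | Proposed | 0.69 | 0.83 |
| $e^{\alpha}=1.50$ | Logrank | 0.08 | 0.41 |
| | Proposed | 0.84 | 0.94 |

**Right panel (2b) censored cases**

| Cure fraction in group 0: 50% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.15 | 0.38 |
| | Proposed | 0.28 | 0.48 |
| $e^{\alpha}=1.25$ | Logrank | 0.21 | 0.55 |
| | Proposed | 0.48 | 0.69 |
| $e^{\alpha}=1.50$ | Logrank | 0.29 | 0.66 |
| | Proposed | 0.63 | 0.83 |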

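**Left panel (3a) uncensored cases**

| Cure fraction in group 0: 70% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.07 | 0.15 |
| | Proposed | 0.29 | 0.33 |
| $e^{\alpha}=1.25$ | Logrank | 0.07 | 0.19 |
| | Proposed | 0.40 | 0.54 |
| $e^{\alpha}=1.50$ | Logrank | 0.06 | 0.21 |
| | Proposed | 0.64 | 0.70 |

**Right panel (3b) censored cases**

| Cure fraction in group 0: 70% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.12 | 0.20 |
| | Proposed | 0.14 | 0.27 |
| $e^{\alpha}=1.25$ | Logrank | 0.14 | 0.31 |
| | Proposed | 0.16 | 0.39 |
| $e^{\alpha}=1.50$ | Logrank | 0.21 | 0.42 |
| | Proposed | 0.22 | 0.48 |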

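**Left panel (4a) uncensored cases**

| Cure fraction in group 0: 30% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.08 | 0.06 |
| | Proposed | 0.34 | 0.45 |
| $e^{\alpha}=1.25$ | Logrank | 0.17 | 0.07 |
| | Proposed | 0.73 | 0.81 |
| $e^{\alpha}=1.50$ | Logrank | 0.29 | 0.09 |
| | Proposed | 0.94 | 0.95 |

**Right panel (4b) censored cases**

| Cure fraction in group 0: 30% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.15 | 0.05 |
| | Proposed | 0.27 | 0.31 |
| $e^{\alpha}=1.25$ | Logrank | 0.31 | 0.14 |
| | Proposed | 0.53 | 0.58 |
| $e^{\alpha}=1.50$ | Logrank | 0.48 | 0.23 |
| | Proposed | 0.76 | 0.75 |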

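**Left panel (5a) uncensored cases**

| Cure fraction in group 0: 50% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.05 | 0.07 |
| | Proposed | 0.13 | 0.17 |
| $e^{\alpha}=1.25$ | Logrank | 0.06 | 0.08 |
| | Proposed | 0.34 | 0.39 |
| $e^{\alpha}=1.50$ | Logrank | 0.09 | 0.05 |
| | Proposed | 0.60 | 0.68 |

**Right panel (5b) censored cases**

| Cure fraction in group 0: 50% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.07 | 0.07 |
| | Proposed | 0.08 | 0.10 |
| $e^{\alpha}=1.25$ | Logrank | 0.10 | 0.05 |
| | Proposed | 0.18 | 0.15 |
| $e^{\alpha}=1.50$ | Logrank | 0.11 | 0.10 |
| | Proposed | 0.31 | 0.28 |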

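**Left panel (6a) uncensored cases**

| Cure fraction in group 0: 70% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.06 | 0.08 |
| | Proposed | 0.05 | 0.09 |
| $e^{\alpha}=1.25$ | Logrank | 0.05 | 0.06 |
| | Proposed | 0.10 | 0.15 |
| $e^{\alpha}=1.50$ | Logrank | 0.05 | 0.06 |
| | Proposed | 0.21 | 0.31 |

**Right panel (6b) censored cases**

| Cure fraction in group 0: 70% | Test | $e^{\gamma}=1$ | $e^{\gamma}=1.2$ |
|---|---|---|---|
| $e^{\alpha}=1$ | Logrank | 0.05 | 0.08 |
| | Proposed | 0.07 | 0.07 |
| $e^{\alpha}=1.25$ | Logrank | 0.06 | 0.05 |
| | Proposed | 0.08 | 0.07 |
| $e^{\alpha}=1.50$ | Logrank | 0.09 | 0.05 |
| | Proposed | 0.10 | 0.06 |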

For uncensored cases, the power gains of the proposed test are striking for differences in either the cure fraction or the latent survival distribution. These gains decrease as the cure fraction increases. Overall, the power of the proposed test is higher than that of the logrank test. The same trends are observed for censored cases; the main difference from the uncensored case lies in the magnitude of the power values, which are markedly lower. In all cases, the same patterns are observed for the over-dispersed and under-dispersed cases.
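The logrank comparator is standard. The following is a minimal pure-Python sketch (not the authors' code) of the two-sample logrank statistic with the usual hypergeometric variance, referred to a chi-square with one degree of freedom. The function name and toy data are illustrative.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample logrank statistic (chi-square, 1 df) and its p-value; groups are 0/1."""
    event_times = sorted({t for t, d in zip(times, events) if d})
    o_minus_e, variance = 0.0, 0.0
    for u in event_times:
        at_risk = [g for t, g in zip(times, groups) if t >= u]
        n, n1 = len(at_risk), sum(at_risk)                 # total and group-1 at risk
        d = sum(1 for t, e in zip(times, events) if e and t == u)
        d1 = sum(1 for t, e, g in zip(times, events, groups) if e and t == u and g == 1)
        o_minus_e += d1 - d * n1 / n                       # observed minus expected in group 1
        if n > 1:
            variance += d * (n1 / n) * (1.0 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(stat / 2.0))             # upper tail of chi-square, 1 df
    return stat, p_value


times = [2.0, 3.5, 4.1, 5.0, 6.2, 1.1, 1.9, 2.7, 3.3, 4.8]  # hypothetical toy data
events = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
stat, p = logrank_test(times, events, groups)
print(f"logrank chi-square = {stat:.3f}, p = {p:.3f}")
```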

Lung adenocarcinoma example

In early-stage lung cancer (stage I), surgical resection can be considered effective in eliminating the tumor burden for a non-negligible proportion of patients, whereas for the others it leads to a lower tumor burden and thereby prolonged survival. The majority of tumor recurrences are detected within two years after surgical resection, and survival beyond five years after diagnosis is frequently considered a cure, the main remaining threats being other smoking-related diseases such as cardiopulmonary disorders.

The dataset considered in this study is based on a homogeneous series of 134 patients with stage IB lung adenocarcinoma who underwent surgical resection. All specimens underwent pathological review. Here, we investigated the prognostic impact of three different types of markers: genetic (Kras exon 2 mutation), genomic (recurrent copy-number losses on genomic areas 19p13.3 and 19p13.11) and histopathologic (a combined marker of necrosis and differentiation).

We recall that the Kras gene belongs to a gene family encoding small G proteins, anchored on the cytoplasmic side of the cell membrane, that play a central role in cell signalling related to cell proliferation, cell survival and cell motility (for a review see

All patients were genotyped for Kras mutations. Primers targeting Kras exon 2 were used to amplify the relevant regions, and DNA sequencing was performed on an ABI3730xl Sanger sequencer. All mutations were confirmed by bidirectional sequencing. In this study, 18% of the tumors (24 cases) carried a Kras mutation, 37.6% and 34% displayed copy-number loss on 19p13.3 and 19p13.11, respectively, and 23% of the tumor samples showed necrosis associated with poor differentiation. The time-to-event (death) was calculated from the date of treatment to the date of death or last follow-up. Overall survival rates were derived from Kaplan-Meier estimates and are given with their 95% confidence intervals. The median follow-up was four years, and thirty-seven events were observed. For the entire population, overall survival at two and five years was 87.2% [81.5-93.3] and 65.4% [56.3-75.9], respectively.
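For reference, here is a minimal sketch of the Kaplan-Meier estimator with Greenwood's variance. The text does not specify the confidence-interval method. The sketch assumes the common log-transformed interval S(t)·exp(±1.96 √V(t)), where V(t) is the Greenwood sum. That interval yields asymmetric bounds like those reported, but this is an assumption rather than the authors' stated method. The toy data are hypothetical.

```python
import math


def kaplan_meier(times, events, z=1.96):
    """Kaplan-Meier estimate with Greenwood variance and log-transformed CIs.

    Returns (t, S(t), lower, upper) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk, surv, greenwood = len(data), 1.0, 0.0
    table, i = [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group tied observation times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood += deaths / (n_at_risk * (n_at_risk - deaths))
            if surv > 0.0:
                half = z * math.sqrt(greenwood)      # se of log S(t) is sqrt(greenwood)
                table.append((t, surv, surv * math.exp(-half), min(1.0, surv * math.exp(half))))
            else:
                table.append((t, 0.0, 0.0, 0.0))
        n_at_risk -= leaving
    return table


def survival_at(table, t):
    """Step-function lookup: estimate at the last event time <= t (1.0 before the first event)."""
    row = (None, 1.0, 1.0, 1.0)
    for entry in table:
        if entry[0] <= t:
            row = entry
    return row[1:]


tab = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])  # hypothetical toy data
for t, s, lo, hi in tab:
    print(f"t={t}: S={s:.3f} [{lo:.3f}; {hi:.3f}]")
```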

When testing for differences in overall survival for Kras mutation, the logrank test (


**Kaplan-Meier curves of the overall survival based on Kras mutation status.**

When testing for differences in overall survival for copy-number loss on genomic areas 19p13.3 and 19p13.11, the logrank test was not significant for either area ($p_{19p13.3} = 0.5$, $p_{19p13.11} = 1$,


**Kaplan-Meier curves of the overall survival based on copy-number loss of 19p13.11 status.**

When testing for differences in overall survival for the combined histopathological marker, the logrank test (


**Kaplan-Meier curves of the overall survival based on the combined histopathological marker.**

All three figures show a clear time-varying difference between the two survival curves. From a biological perspective, the marginal survival distributions observed for the Kras-positive (activating) mutation, the deletion of genomic area 19p13.11 and the necrosis/poor differentiation status can be interpreted as reflecting molecular changes affecting either the tumor burden or its dynamic growth.

Discussion

With significant progress in defining homogeneous histological and clinical groups of early-stage cancer patients who received the same potentially curative therapy, the challenge is now to find novel molecular markers capable of separating patients according to their time-to-event outcome. This problem can be handled with cure rate models specified using either a two-component mixture model or a bounded cumulative hazard approach.

In this work, a score test is proposed for testing the null hypothesis of no survival difference in early-stage cancer. From a biological point of view, this score test can detect changes in the cure fraction, in the distribution of surviving clonogens and in tumor progression. It is derived from a flexible model that describes the impact of discrete markers on the survival time distribution, with or without the same cure fraction, and stems from biological as well as pragmatic statistical considerations. A useful feature of the proposed score-type statistic is that it is easy to implement, since it does not require estimating the parameters of the cure model under the alternative hypothesis. The proposed procedure can be extended to compare more than two groups, with the Poisson cure rate model as the benchmark model for the reference group. The alternative hypothesis would then be that at least one group differs from the reference group at some time, in either the distribution of the number of clonogens or the latent (clonogenic) survival function.

Simulation results show that striking gains in power can be achieved by the proposed test compared with the classical logrank test. As the cure fraction increases, the power of the test decreases but remains higher than that of the logrank test. This is not surprising, since increasing the cure fraction reduces the number of potential events. In the presence of censoring, the power of the proposed test also decreases but remains higher than that of the logrank test. It is worth recalling that the validity of the proposed score test requires the cumulative hazard estimators to be asymptotically efficient, which implies that susceptible patients should experience the event within the maximum length of follow-up.

In the homogeneous series of early-stage lung adenocarcinoma presented in this article, the proposed statistic is particularly appealing since the majority of the patients are amenable to cure. Although some lung cancer studies have reported a deleterious prognostic effect of Kras mutation, this effect is still debated. In this study, we show a significant relationship between overall survival and Kras mutation status that would have been overlooked by considering only the results of the classical logrank test. From a biological point of view, one could hypothesize that downstream effectors of Kras mutation have complex biological activities affecting either the tumor burden or its dynamic growth. Moreover, these results also argue in favor of considering the combined histopathological marker in prognostic studies, and they give some interesting insights regarding recurrent driver copy-number loss on genomic area 19p13.11 that may require further exploration. In future work, it could be of interest to estimate the parameters associated with survival differences. For such a purpose, the estimation procedure introduced by Tsodikov

Conclusion

In summary, detecting molecular markers associated with complex survival patterns in early-stage cancer is of potential interest for research, as it may shed light on their contribution to the natural history of the disease. We believe that our proposed score test statistic is a powerful tool for detecting such markers. Moreover, this test statistic can be applied in any other medical field in which some patients may never experience the event of interest.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

PB and TM developed the mathematical model and wrote the paper. Both authors read and approved the final manuscript.

Acknowledgements

The authors thank Dr. Sophie Camilleri, Dr. Marco Alifano and Dr. Patrick Tan for their work on the lung cancer data.