Prediction of Early Breast Cancer Metastasis from DNA Microarray Data Using High-Dimensional Cox Regression Models

Abstract: BACKGROUND: DNA microarray studies have identified gene expression signatures predictive of metastatic relapse in early breast cancer. Standard feature selection procedures applied to reduce the set of predictive genes did not take into account the correlation between genes. In this paper, we studied the performance of three high-dimensional regression methods, CoxBoost, LASSO (Least Absolute Shrinkage and Selection Operator), and Elastic net, to identify prognostic signatures in patients with early breast cancer.

METHODS: We analyzed three public retrospective datasets, including a total of 384 patients with axillary lymph node-negative breast cancer. The Amsterdam training set of van't Veer (78 patients) was used to determine the optimal gene sets and classifiers, using sensitivity thresholds that misclassified no more than 10% of the poor-prognosis group. To ensure comparability between the methods, an automatic selection procedure was used to determine the number of genes included in each model. The van de Vijver and Desmedt datasets were used as validation sets to evaluate separately the prognostic performance of our classifiers. The results were compared with the original Amsterdam 70-gene classifier.

RESULTS: The automatic selection procedure reduced the number of predictive genes to as few as six. In the two validation sets, the three models (Elastic net, LASSO, and CoxBoost) yielded genomic classifiers that predicted the 5-year metastatic status with similar performance: 59, 56, and 54% accuracy, 83, 75, and 83% sensitivity, and 53, 52, and 48% specificity, respectively, in the Desmedt dataset. In comparison, the Amsterdam 70-gene signature showed 45% accuracy, 97% sensitivity, and 34% specificity. The gene overlap and the classification concordance between the three classifiers were high. All the classifiers added significant prognostic information to that provided by the traditional prognostic factors. They also showed a very high overlap in the gene ontologies (GOs) associated with genes overexpressed in the predicted poor-prognosis vs. good-prognosis classes; these GOs centred on cell proliferation. Interestingly, all classifiers showed high sensitivity in predicting the 4-year status of metastatic disease.

CONCLUSIONS: High-dimensional regression methods are attractive in prognostic studies because finding a small subset of genes may facilitate the transfer to the clinic, and also because they strengthen the robustness of the model by limiting the selection of false-positive predictive genes. With only six genes, the CoxBoost classifier predicted the 4-year status of metastatic disease with 93% sensitivity. Selecting a few genes related to ontologies other than cell proliferation might further improve the overall sensitivity performance.
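The abstract does not say how the sensitivity threshold was computed. The sketch below shows one plausible reading of the rule "misclassify no more than 10% of the poor-prognosis group", together with the accuracy, sensitivity, and specificity measures reported in the results. It assumes each patient has a numeric risk score (for example, the linear predictor of a fitted Cox model) and a binary poor-prognosis label. The function names `pick_threshold` and `evaluate` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def pick_threshold(scores, poor, max_miss_rate=0.10):
    """Return the highest cutoff that leaves at most `max_miss_rate`
    of the poor-prognosis patients below it (i.e. misclassified as good)."""
    poor_scores = sorted(s for s, p in zip(scores, poor) if p)
    if not poor_scores:
        raise ValueError("need at least one poor-prognosis patient")
    # Number of poor-prognosis patients we are allowed to miss.
    allowed_misses = math.floor(max_miss_rate * len(poor_scores))
    # Patients are called "poor" when score >= cutoff, so only the
    # `allowed_misses` lowest-scoring poor patients can fall below it.
    return poor_scores[allowed_misses]


def evaluate(scores, poor, threshold):
    """Accuracy, sensitivity, and specificity of the rule score >= threshold."""
    tp = fn = tn = fp = 0
    for s, p in zip(scores, poor):
        predicted_poor = s >= threshold
        if p and predicted_poor:
            tp += 1
        elif p:
            fn += 1
        elif predicted_poor:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient in each class")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data for illustration only (not from any of the three datasets):
# ten poor-prognosis patients followed by five good-prognosis patients.
train_scores = [0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
                0.1, 0.3, 0.4, 0.55, 0.65]
train_poor = [True] * 10 + [False] * 5

cutoff = pick_threshold(train_scores, train_poor)
metrics = evaluate(train_scores, train_poor, cutoff)
```

In a setting like the one described, the cutoff would be fixed on the training set and then applied unchanged to each validation set by calling `evaluate` with the validation scores and labels.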
Document type: Journal article


https://www.hal.inserm.fr/inserm-01996812
Contributor : Claire Lissalde
Submitted on : Monday, January 28, 2019 - 3:50:39 PM
Last modification on : Thursday, February 7, 2019 - 4:26:33 PM
Long-term archiving on : Monday, April 29, 2019 - 7:15:51 PM

File

cin-suppl.2-2015-129.pdf
Publisher files allowed on an open archive

Citation

Christophe Zemmour, Francois Bertucci, Pascal Finetti, Bernard Chetrit, Daniel Birnbaum, et al. Prediction of Early Breast Cancer Metastasis from DNA Microarray Data Using High-Dimensional Cox Regression Models. Cancer Informatics, Libertas Academica, 2015, 14(Suppl 2), pp.129-138. ⟨10.4137/cin.s17284⟩. ⟨inserm-01996812⟩
