Identification of Breast Cancer Prognosis Markers Using Integrative Sparse Boosting

Journal: Methods of Information in Medicine
Subtitle: A journal stressing, for more than 50 years, the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care
ISSN: 0026-1270

Focus Theme: Recent Developments in Boosting Methodology
Guest Editors: M. Schmid, T. Hothorn

Issue: 2012 (Vol. 51), Issue 2
Pages: 152-161

S. Ma (1), J. Huang (2), Y. Xie (3), N. Yi (4)

(1) School of Public Health, Yale University, New Haven, Connecticut, USA; (2) Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA; (3) Department of Clinical Sciences, UT Southwestern Medical Center, Dallas, Texas, USA; (4) Department of Biostatistics, Section on Statistical Genetics, University of Alabama, Birmingham, Alabama, USA


Keywords: Gene expression, breast cancer prognosis, integrative analysis, sparse boosting


Objectives: In breast cancer research, it is important to identify genomic markers associated with prognosis. Multiple microarray gene expression profiling studies have been conducted to search for prognosis markers. Genomic markers identified from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis of data from multiple independent studies has a larger sample size and may provide a cost-effective solution.

Methods: We collect four breast cancer prognosis studies with gene expression measurements. An accelerated failure time (AFT) model with an unknown error distribution is adopted to describe survival. An integrative sparse boosting approach is employed for marker selection. The proposed model and boosting approach can effectively accommodate heterogeneity across multiple studies and identify genes with consistent effects.
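
The idea of selecting genes jointly across studies while letting each study keep its own effect size can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes uncensored log survival times, uses plain componentwise L2 boosting with a fixed step size and iteration count instead of a sparsity-based stopping rule, and the function name `integrative_boost` and its parameters are hypothetical.

```python
# Simplified sketch of componentwise L2 boosting across several studies.
# Assumptions (not from the source): no censoring, a fixed step size and
# a fixed number of iterations in place of a sparse stopping criterion.

def integrative_boost(studies, n_steps=50, step=0.1):
    """studies: list of (X, y); X is a list of rows, y a list of log survival times.
    All studies measure the same genes. Returns one coefficient list per study."""
    n_genes = len(studies[0][0][0])
    coefs = [[0.0] * n_genes for _ in studies]

    # Residuals start as responses centered within each study.
    resid = []
    for _, y in studies:
        mean_y = sum(y) / len(y)
        resid.append([yi - mean_y for yi in y])

    for _ in range(n_steps):
        best_gene, best_gain, best_fit = None, 0.0, None
        for j in range(n_genes):
            gain, fits = 0.0, []
            for (X, _), r in zip(studies, resid):
                sxx = sum(row[j] ** 2 for row in X)
                sxr = sum(row[j] * ri for row, ri in zip(X, r))
                b = sxr / sxx if sxx > 0 else 0.0  # study-specific least-squares slope
                fits.append(b)
                gain += b * sxr  # drop in residual sum of squares in this study
            # Pick the gene with the largest total improvement over all studies.
            if gain > best_gain:
                best_gene, best_gain, best_fit = j, gain, fits
        if best_gene is None:
            break  # no gene improves the fit in any study
        for m, ((X, _), r) in enumerate(zip(studies, resid)):
            delta = step * best_fit[m]
            coefs[m][best_gene] += delta  # same gene, different effect per study
            for i, row in enumerate(X):
                r[i] -= delta * row[best_gene]
    return coefs
```

In this sketch a gene is chosen because it improves the fit summed over all studies, but its coefficient is updated separately in each study, which is one simple way to allow heterogeneous effect sizes for a shared set of markers.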

Results: A simulation study shows that the proposed approach outperforms alternatives, including meta-analysis and intensity approaches, by identifying most or all of the true positives while maintaining a low false positive rate. In the analysis of breast cancer data, 44 genes are identified as associated with prognosis. Many of the identified genes have been previously suggested as associated with tumorigenesis and cancer prognosis. The identified genes and corresponding predicted risk scores differ from those obtained using alternative approaches. Monte Carlo-based prediction evaluation suggests that the proposed approach has the best prediction performance.

Conclusions: Integrative analysis may provide an effective way of identifying breast cancer prognosis markers. Markers identified using the integrative sparse boosting analysis have sound biological implications and satisfactory prediction performance.
