Cancer classification and prediction using logistic regression with Bayesian gene selection. Academic Article

Overview

abstract

  • In microarray-based cancer classification and prediction, gene selection is an important research problem owing to the large number of genes and the small number of experimental conditions. In this paper, we propose a Bayesian approach to gene selection and classification using the logistic regression model. The basic idea of our approach is to use a logistic regression model to relate gene expression to the class labels. We use Gibbs sampling and Markov chain Monte Carlo (MCMC) methods to discover important genes. To implement the Gibbs sampler and MCMC search, we derive a posterior distribution of the selected genes given the observed data. After the important genes are identified, the same logistic regression model is used for cancer classification and prediction. Issues in the efficient implementation of the proposed method are discussed. The proposed method is evaluated on several large microarray data sets, including hereditary breast cancer, small round blue-cell tumors, and acute leukemia. The results show that the method can effectively identify important genes consistent with known biological findings while also achieving high classification accuracy. Finally, the robustness and sensitivity properties of the proposed method are investigated.
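
The gene-selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the derived posterior, so this sketch swaps in a BIC-style approximation plus an independent inclusion prior as the score for each gene subset, and fits the logistic model with a small ridge penalty for numerical stability. All names (`logistic_loglik`, `subset_score`, `gibbs_gene_selection`), the prior inclusion probability, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def logistic_loglik(X, y, ridge=1.0, iters=25):
    """Log-likelihood of a logistic model (intercept + columns of X),
    evaluated at a ridge-penalized Newton fit. The ridge term is an
    assumption added for stability, not part of the paper."""
    n = X.shape[0]
    A = np.hstack([np.ones((n, 1)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(A @ beta, -30, 30)))
        grad = A.T @ (y - p) - ridge * beta
        hess = (A.T * (p * (1 - p))) @ A + ridge * np.eye(A.shape[1])
        beta += np.linalg.solve(hess, grad)
    eta = np.clip(A @ beta, -30, 30)
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

def subset_score(X, y, gamma, prior_incl=0.1):
    """Approximate log posterior of an inclusion vector gamma (up to a
    constant): BIC-style penalty plus a Bernoulli inclusion prior.
    This stands in for the posterior derived in the paper."""
    k = int(gamma.sum())
    n = len(y)
    return (logistic_loglik(X[:, gamma], y)
            - 0.5 * k * np.log(n)
            + k * np.log(prior_incl / (1 - prior_incl)))

def gibbs_gene_selection(X, y, sweeps=40, burn_in=10, seed=0):
    """Gibbs sampler over binary gene-inclusion indicators.
    Returns the post-burn-in inclusion frequency of each gene."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)
    counts = np.zeros(p)
    current = subset_score(X, y, gamma)
    for sweep in range(sweeps):
        for j in range(p):
            flipped = gamma.copy()
            flipped[j] = not flipped[j]
            other = subset_score(X, y, flipped)
            # Scores with gene j included and excluded, all else fixed.
            s_in, s_out = (current, other) if gamma[j] else (other, current)
            prob_in = 1.0 / (1.0 + np.exp(np.clip(s_out - s_in, -700, 700)))
            include = rng.random() < prob_in
            if include != gamma[j]:
                gamma, current = flipped, other
        if sweep >= burn_in:
            counts += gamma
    return counts / (sweeps - burn_in)

# Synthetic stand-in for expression data: 60 samples, 15 "genes",
# with the class label driven by gene 3 only.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 15))
y = (X[:, 3] + 0.3 * rng.standard_normal(60) > 0).astype(float)
freq = gibbs_gene_selection(X, y)
print(np.round(freq, 2))
```

Genes with a high inclusion frequency would then be kept as predictors in a final logistic regression classifier, mirroring the two-stage use of the model described in the abstract. Real microarray data has thousands of genes, so a practical implementation would need the efficiency measures the abstract mentions rather than refitting the model at every step.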

publication date

  • August 1, 2004

Research

keywords

  • Algorithms
  • Artificial Intelligence
  • Gene Expression Profiling
  • Genetic Testing
  • Neoplasms
  • Oligonucleotide Array Sequence Analysis
  • Pattern Recognition, Automated

Identity

Scopus Document Identifier

  • 4744364173

Digital Object Identifier (DOI)

  • 10.1016/j.jbi.2004.07.009

PubMed ID

  • 15465478

Additional Document Info

volume

  • 37

issue

  • 4