Feature Selection in Predictive Modeling: A Systematic Study on Drug Response Heterogeneity for Type II Diabetic Patients. Academic Article

Overview

abstract

  • With the rapid development of computer hardware and software technologies, more and more electronic health data from insurance claims, clinical trials and hospitals are becoming readily available. These data provide a rich resource for developing healthcare analytics algorithms, among which predictive modeling is of key importance for many real-world health problems. One important issue in data-driven predictive modeling is high dimensionality, and feature selection is an effective strategy for reducing the number of independent variables and controlling for confounding factors. However, most existing studies simply pick one feature selection approach without a comprehensive comparison. In this paper, we investigate drug response heterogeneity among patients with type II diabetes mellitus (T2DM) using large-scale clinical trial data. Our goal is to identify the important factors that may lead to response heterogeneity for three popular T2DM drugs: Metformin, Rosiglitazone and Glimepiride. We implemented 8 different feature selection approaches and compared their performance using various measures, including prediction error and the consistency of the identified important factors. Finally, we ensembled the factor lists selected by the different algorithms into a final set of factors that contribute to drug response heterogeneity, and verified these factors against the existing literature.
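The abstract does not state how the consistency of the identified factors was scored or how the factor lists were ensembled. As an illustration only, the Python sketch below assumes two common, simple choices: the mean pairwise Jaccard similarity between the feature sets chosen by each method, and a vote count that keeps factors selected by at least `min_votes` methods. The method names, the feature names and the `min_votes` threshold are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_consistency(feature_lists):
    """Average Jaccard similarity over all pairs of selection methods."""
    pairs = list(combinations(feature_lists.values(), 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def vote_ensemble(feature_lists, min_votes):
    """Keep features chosen by at least `min_votes` methods, most-voted first."""
    # Count each feature once per method, even if a list repeats it.
    counts = Counter(f for feats in feature_lists.values() for f in set(feats))
    kept = [(f, c) for f, c in counts.items() if c >= min_votes]
    # Sort by vote count (descending), then by name for a deterministic order.
    kept.sort(key=lambda fc: (-fc[1], fc[0]))
    return [f for f, _ in kept]


# Hypothetical selections from three hypothetical methods.
selected = {
    "lasso": ["age", "baseline_hba1c", "bmi"],
    "random_forest": ["baseline_hba1c", "bmi", "egfr"],
    "mutual_info": ["baseline_hba1c", "age", "egfr"],
}

print(mean_pairwise_consistency(selected))       # 0.5
print(vote_ensemble(selected, min_votes=2))      # ['baseline_hba1c', 'age', 'bmi', 'egfr']
print(vote_ensemble(selected, min_votes=3))      # ['baseline_hba1c']
```

Raising `min_votes` trades recall for agreement: a stricter threshold keeps only factors that most methods agree on, which is one way to favor stable factors when individual selectors disagree.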

publication date

  • May 6, 2019

Identity

PubMed Central ID

  • PMC6568100

PubMed ID

  • 31258982

Additional Document Info

volume

  • 2019