rox: A Statistical Model for Regression with Missing Values. Academic Article

Overview

abstract

  • High-dimensional omics datasets frequently contain missing data points, which typically arise when concentrations fall below the limit of detection (LOD) of the profiling platform. Such missing values significantly limit downstream statistical analysis and the interpretation of results. Two common techniques for dealing with this issue are the removal of samples with missing values and imputation, which substitutes the missing measurements with reasonable estimates. Both approaches, however, suffer from various shortcomings and pitfalls. In this paper, we present "rox", a novel statistical model for analyzing omics data with missing values without the need for imputation. The model directly incorporates missing values as "low" concentrations into the calculation. We show the superiority of rox over common approaches on simulated data and on six metabolomics datasets. By fully leveraging the information contained in LOD-based missing values, rox provides a powerful tool for the statistical analysis of omics data.
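
The idea of treating LOD-based missing values as "low" concentrations, rather than imputing them, can be illustrated with a small rank-based sketch. This is not the authors' implementation of rox. It is a minimal Kendall-style concordance measure written for illustration, and it assumes that a missing value (encoded as NaN) ranks below every observed value and that two missing values cannot be ordered against each other. The function names `compare_with_lod` and `lod_concordance` are hypothetical.

```python
import math
from itertools import combinations


def compare_with_lod(a, b):
    """Return -1, 0, or 1 for a < b, not comparable/tied, a > b.

    Assumption for this sketch: NaN means "below the LOD", so it ranks
    lower than any observed value. Two NaNs cannot be ordered.
    """
    a_missing, b_missing = math.isnan(a), math.isnan(b)
    if a_missing and b_missing:
        return 0  # both below the LOD: relative order unknown
    if a_missing:
        return -1
    if b_missing:
        return 1
    return (a > b) - (a < b)


def lod_concordance(x, y):
    """Kendall-style concordance in [-1, 1] between a measurement x
    (which may contain NaN) and a fully observed outcome y."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign_x = compare_with_lod(x[i], x[j])
        sign_y = (y[i] > y[j]) - (y[i] < y[j])
        if sign_x * sign_y > 0:
            concordant += 1
        elif sign_x * sign_y < 0:
            discordant += 1
        # pairs with a zero sign carry no ordering information and are skipped
    informative = concordant + discordant
    return (concordant - discordant) / informative if informative else 0.0


nan = float("nan")
measurement = [nan, nan, 1.0, 2.0, 3.0]  # two values below the LOD
outcome = [0.1, 0.2, 1.0, 2.0, 3.0]
score = lod_concordance(measurement, outcome)
```

In this sketch the two samples with missing values still contribute to the statistic, because each can be ordered against every observed sample. Removing those samples would discard that information.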

publication date

  • January 13, 2023

Identity

PubMed Central ID

  • PMC9861384

Scopus Document Identifier

  • 85092709179

Digital Object Identifier (DOI)

  • 10.1016/j.csbj.2020.09.014

PubMed ID

  • 36677052

Additional Document Info

volume

  • 13

issue

  • 1