A modular metagenomics analysis system for integrated multi-step data exploration.

Overview

abstract

  • MOTIVATION: Computational analysis of large-scale metagenomics sequencing datasets has proved incredibly valuable for extracting isolate-level taxonomic and functional insights from complex microbial communities. However, thanks to an ever-expanding ecosystem of metagenomics-specific algorithms and file formats, designing studies, implementing seamless and scalable end-to-end workflows, and exploring the massive amounts of output data have become studies unto themselves. Furthermore, there is little inter-communication between the outputs of analyses with different purposes, such as short-read classification and metagenome-assembled genome (MAG) reconstruction. One-click pipelines have helped to organize these tools into targeted workflows, but they suffer from general compatibility and maintainability issues. RESULTS: To address the gap in easily extensible yet robustly distributable metagenomics workflows, we have developed a module-based metagenomics analysis system written in Snakemake, a popular workflow management system, along with a standardized module and working directory architecture. Each module can be run independently or conjointly with a series of others to produce the target data format (e.g., short-read preprocessing alone, or short-read preprocessing followed by de novo assembly), and outputs aggregated summary statistics reports and semi-guided Jupyter notebook-based visualizations. The module system is a bioinformatics-optimized scaffold designed to be rapidly iterated upon by the research community at large. AVAILABILITY: The module template as well as the modules described below can be found at https://github.com/MetaSUB-CAMP. CONTACT: lam4003@med.cornell.edu, btt4001@med.cornell.edu, chm2042@med.cornell.edu, or imh2003@med.cornell.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
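
The abstract's idea that a module can run alone or be chained with others until a target data format is reached can be illustrated with a minimal Python sketch. This is not the authors' Snakemake implementation: the `Module` class, the `plan` function, and every module and format name below are hypothetical, chosen only to show how declaring each module's input and output format lets a chain be resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A hypothetical analysis module that converts one data format to another."""
    name: str
    consumes: str
    produces: str


# Illustrative registry only; these are not the actual module names.
MODULES = [
    Module("short_read_qc", "raw_reads", "clean_reads"),
    Module("assembly", "clean_reads", "contigs"),
    Module("binning", "contigs", "mags"),
]


def plan(start, target, modules=MODULES):
    """Return the ordered module names that turn `start` into `target`.

    Assumes a linear registry in which each format is consumed by at most
    one module, so the chain can be followed one step at a time.
    """
    by_input = {m.consumes: m for m in modules}
    chain = []
    current = start
    while current != target:
        if current not in by_input:
            raise ValueError(f"no module consumes {current!r}")
        module = by_input[current]
        chain.append(module.name)
        current = module.produces
    return chain


if __name__ == "__main__":
    # Preprocessing alone, then preprocessing followed by assembly.
    print(plan("raw_reads", "clean_reads"))
    print(plan("raw_reads", "contigs"))
```

Under these assumptions, requesting `clean_reads` plans a single module, while requesting `contigs` plans preprocessing followed by assembly, mirroring the two cases named in the abstract.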

publication date

  • April 9, 2023

Identity

PubMed Central ID

  • PMC10104186

Digital Object Identifier (DOI)

  • 10.1101/2023.04.09.536171

PubMed ID

  • 37066359