Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance. Academic Article

Overview

abstract

  • BACKGROUND: With the diminishing cost of next-generation sequencing (NGS), whole-genome analysis is becoming a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers generally deliver not only raw genomic reads but also SNP calls to their clients. The user must then decide whether to use the SNP data as is, or to process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms. RESULTS: Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol with SNPs provided by the Illumina CASAVA pipeline, delivered as part of a 40x whole-genome sequencing project by Illumina Inc. of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases). GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust in terms of Mendelian consistency but weak in terms of statistical parameters such as the Ts/Tv (transition/transversion) ratio. However, these additional variants do not make a difference in detecting the causative variants for the studied phenotype. CONCLUSION: Both pipelines, GATK multi-sample calling and Illumina CASAVA single-sample calling, have highly similar performance in SNP calling at the level of putatively causative variants.
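
The two quality measures named in the abstract, the transition/transversion ratio and Mendelian consistency in trios, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the tuple-based input format, and the restriction to simple biallelic autosomal genotypes are all assumptions made for illustration. A lower transition/transversion ratio in a call set is commonly read as a sign of more false-positive calls, and a child genotype that cannot be formed from one allele of each parent counts as a Mendelian error.

```python
# Illustrative sketch only; names and input formats are hypothetical,
# not taken from GATK, CASAVA, or the article.

# Transitions swap purines (A<->G) or pyrimidines (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) bases."""
    ts = 0
    tv = 0
    for ref, alt in snps:
        if ref == alt:
            continue  # not a substitution, ignore
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def is_mendelian_consistent(child, father, mother):
    """True if the child's two alleles can come one from each parent.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    Assumes an autosomal site and ignores de novo mutations.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def mendelian_error_rate(trios):
    """Fraction of (child, father, mother) genotype triples that are inconsistent."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(1 for c, f, m in trios if not is_mendelian_consistent(c, f, m))
    return errors / len(trios)
```

For example, `ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion, and a `("G", "G")` child of an `("A", "A")` father is flagged as inconsistent because the father cannot contribute a `G` allele.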

publication date

  • October 22, 2014

Research

keywords

  • Algorithms
  • Arabs
  • Diabetes Mellitus
  • Genome, Human
  • Genome-Wide Association Study
  • Heredity
  • High-Throughput Nucleotide Sequencing
  • Obesity
  • Polymorphism, Single Nucleotide
  • Rare Diseases

Identity

PubMed Central ID

  • PMC4216909

Scopus Document Identifier

  • 84932169705

Digital Object Identifier (DOI)

  • 10.1186/1756-0500-7-747

PubMed ID

  • 25339461

Additional Document Info

volume

  • 7