Detection of heterozygous mutations in the genome of mismatch repair defective diploid yeast using a Bayesian approach.
Academic Article
Overview
abstract
DNA replication errors that escape polymerase proofreading and mismatch repair (MMR) can lead to base substitution and frameshift mutations. Such mutations can disrupt gene function, reduce fitness, and promote diseases such as cancer and are also the raw material of molecular evolution. To analyze, with limited bias, genomic features associated with DNA polymerase errors, we performed a genome-wide analysis of mutations that accumulate in MMR-deficient diploid lines of Saccharomyces cerevisiae. These lines were derived from a common ancestor and were grown for 160 generations, with bottlenecks reducing the population to one cell every 20 generations. We sequenced to between 8- and 20-fold coverage one wild-type and three mutator lines using Illumina Solexa 36-bp reads. Using an experimentally aware Bayesian genotype caller developed to pool experimental data across sequencing runs for all strains, we detected 28 heterozygous single-nucleotide polymorphisms (SNPs) and 48 single-nt insertion/deletions (indels) from the data set. This method was evaluated on simulated data sets and found to have a very low false-positive rate (∼6 × 10⁻⁵) and a false-negative rate of 0.08 within the unique mapping regions of the genome that contained at least sevenfold coverage. The heterozygous mutations identified by the Bayesian genotype caller were confirmed by Sanger sequencing. All of the mutations were unique to a given line, except for a single-nt deletion mutation, which occurred independently in two lines. All 48 indels, composed of 46 deletions and two insertions, occurred in homopolymer (HP) tracts [i.e., 47 poly(A) or (T) tracts, 1 poly(G) or (C) tract] between 5 and 13 bp long. Our findings are of interest because HP tracts are present at high levels in the yeast genome (>77,400 for 5- to 20-nt HP tracts), and frameshift mutations in these regions are likely to disrupt gene function. In addition, they demonstrate that the mutation pattern seen previously in mismatch repair defective strains using a limited number of reporters holds true for the entire genome.
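The abstract does not describe the internals of the Bayesian genotype caller, so the following is only a minimal sketch of the general idea behind Bayesian genotype calling at a single diploid site, not the authors' method. It assumes a simple binomial read model, and the function name `genotype_posteriors`, the per-read error rate, and the genotype priors are all hypothetical values chosen for illustration. It does not pool data across sequencing runs or strains as the caller described above does.

```python
from math import comb


def genotype_posteriors(ref_count, alt_count, error_rate=0.01,
                        het_prior=1e-4, hom_alt_prior=1e-4):
    """Posterior probability of each diploid genotype at one site.

    Illustrative assumptions (not taken from the article):
      * reads are independent draws, so counts follow a binomial model
      * error_rate is the chance a read reports the wrong allele
      * het_prior and hom_alt_prior are made-up prior probabilities
    """
    n = ref_count + alt_count
    # Probability that a single read shows the alternate allele,
    # conditional on the true genotype.
    p_alt = {
        "hom_ref": error_rate,
        "het": 0.5,
        "hom_alt": 1.0 - error_rate,
    }
    prior = {
        "hom_ref": 1.0 - het_prior - hom_alt_prior,
        "het": het_prior,
        "hom_alt": hom_alt_prior,
    }
    # Unnormalized posterior = prior * binomial likelihood of the counts.
    weights = {}
    for genotype, p in p_alt.items():
        likelihood = comb(n, alt_count) * p ** alt_count * (1.0 - p) ** ref_count
        weights[genotype] = prior[genotype] * likelihood
    total = sum(weights.values())
    return {genotype: w / total for genotype, w in weights.items()}


def call_genotype(ref_count, alt_count, **kwargs):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(ref_count, alt_count, **kwargs)
    return max(posteriors, key=posteriors.get)
```

For example, `call_genotype(5, 5)` weighs ten reads split evenly between the two alleles, while `call_genotype(10, 0)` weighs ten reference-only reads. With a strong prior against mutation, a few alternate reads at low coverage are not enough to overturn the reference call, which is one reason a minimum coverage threshold matters when quoting error rates.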