Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment. Academic Article

Overview

abstract

  • MOTIVATION: Multiple sequence alignment is a cornerstone of comparative genomics. Much work has been done to improve methods for this task, particularly for the alignment of small sequences, and especially for amino acid sequences. However, less work has been done to make methods that show promise at small scale practical for the alignment of much larger genomic sequences. RESULTS: We take the method of probabilistic consistency alignment and make it practical for the alignment of large genomic sequences. In so doing, we develop a set of new technical methods, combined in a framework we term 'sequence progressive alignment' because it allows us to iteratively compute an alignment by passing over the input sequences from left to right. The result is a massive decrease in the memory consumption of the program relative to a naive implementation. The engineering challenges faced in scaling such a computationally intensive process offer valuable lessons for planning related large-scale sequence analysis algorithms. We further show the strong performance of Pecan using an extended analysis of ancient repeat alignments. Pecan is now one of the default alignment programs that has been, and is being, used by a number of whole-genome comparative genomics projects. AVAILABILITY: The Pecan program is freely available at http://www.ebi.ac.uk/~bjp/pecan/. Pecan whole-genome alignments can be found in the Ensembl genome browser.

publication date

  • December 4, 2008

Research

keywords

  • Computational Biology
  • Proteins
  • Sequence Alignment
  • Sequence Analysis, Protein

Identity

Scopus Document Identifier

  • 59549102237

Digital Object Identifier (DOI)

  • 10.1093/bioinformatics/btn630

PubMed ID

  • 19056777

Additional Document Info

volume

  • 25

issue

  • 3