Annotation of bacterial genomes using improved phylogenomic profiles. Academic Article

Overview

abstract

  • MOTIVATION: Phylogenomic profiling is a large-scale comparative genomic method used to infer protein function from evolutionary information, first described in a binary form by Pellegrini et al. (1999). Here, we propose improvements to this approach, including the use of normalized Blastp bit scores, a normalization of the matrix of profiles to take into account the evolutionary distances between bacteria, the definition of a phylogenomic neighborhood based on continuous pairwise distances between genes, and an original annotation procedure including the computation of a p-value for each functional assignment. RESULTS: The method presented here increases the number of Ecocyc enzymes identified as being evolutionarily related by about 25% with respect to the original binary (absent/present) method. The fraction of 'false' positives is shown to be smaller than 20%. Based on their phylogenomic relationships, genes of unknown function can then be automatically related to annotated genes. Each predicted gene annotation is associated with a p-value, i.e. the probability of obtaining it by chance. The validity of this method was extensively tested on a large set of genes of known function using the MultiFun database. We find that 50% of the 3122 function attributions that can be made at a p-value level of 10^-11 correspond to the actual gene annotation. The method can be readily applied to any newly sequenced microbial genome. In contrast to earlier work on the same topic, our approach avoids the use of arbitrary cut-off values and provides a reliability estimate of the functional predictions in the form of p-values.
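
The abstract names three ingredients (continuous profiles built from normalized bit scores, pairwise distances between genes, and a p-value per functional assignment) but does not give their formulas. The sketch below is only an illustration of how such pieces could fit together, not the authors' procedure. Dividing by a self-hit score, using Euclidean distance, and using a hypergeometric tail test are all assumptions made here, and the function names are hypothetical. The normalization for evolutionary distances between bacteria mentioned in the abstract is left out.

```python
from math import comb, sqrt

def normalized_profile(bit_scores, self_score):
    """Scale each best-hit bit score by the gene's self-hit score, clipped to [0, 1].

    Assumption: self-score scaling stands in for the paper's normalization.
    """
    return [min(max(s / self_score, 0.0), 1.0) for s in bit_scores]

def profile_distance(p, q):
    """Euclidean distance between two continuous profiles of equal length.

    Assumption: the abstract does not say which distance is used.
    """
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the function.

    Assumption: a hypergeometric tail is one common way to score how likely
    it is that k neighbors share a function by chance.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Toy data: best-hit bit scores of two genes against four reference genomes.
gene_a = normalized_profile([100, 0, 50, 200], self_score=200)
gene_b = normalized_profile([150, 0, 75, 300], self_score=300)
distance = profile_distance(gene_a, gene_b)

# Chance that both of 2 neighbors share a function carried by 2 of 4 genes.
p_value = hypergeom_pvalue(k=2, n=2, K=2, N=4)
```

In this toy case the two genes have identical normalized profiles, so they would fall in each other's neighborhood under any distance threshold. A binary absent/present profile would discard the difference between a weak and a strong hit, which the continuous scores retain.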

publication date

  • January 1, 2003

Research

keywords

  • Algorithms
  • Chromosome Mapping
  • Documentation
  • Evolution, Molecular
  • Gene Expression Profiling
  • Genome, Bacterial
  • Proteome

Identity

Scopus Document Identifier

  • 3242888755

Digital Object Identifier (DOI)

  • 10.1093/bioinformatics/btg1013

PubMed ID

  • 12855445

Additional Document Info

volume

  • 19 Suppl 1