Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility. Academic Article

Overview

abstract

  • Proteins fold through either a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features that combine information on amino acid composition, predicted secondary structure, and predicted solvent accessibility. FOKIT provides predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91, measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that the inclusion of solvent accessibility helps discriminate between the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that an increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure or burial of certain hydrophobic residues may play an important role in the formation of folding intermediates. Our conclusions are supported by two case studies.
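For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch of how a logistic regression model turns a feature vector into a probability and how the Matthews correlation coefficient is computed from confusion-matrix counts. It is not FOKIT's implementation: the function names, the treatment of MS as the positive class, and any weights passed in are hypothetical, and the abstract does not list the six features or their coefficients.

```python
import math


def logistic_probability(features, weights, bias):
    """Probability of the positive class under a logistic regression model.

    `weights` and `bias` are hypothetical placeholders; the abstract does
    not report the fitted coefficients of FOKIT.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention for
    degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which is the scale on which the reported 0.58 to 0.91 values should be read.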

publication date

  • November 17, 2010

Research

keywords

  • Proteins
  • Sequence Analysis, Protein

Identity

Scopus Document Identifier

  • 84860256769

Digital Object Identifier (DOI)

  • 10.1007/s00726-010-0805-y

PubMed ID

  • 21082205

Additional Document Info

volume

  • 42

issue

  • 1