Secondary structure-based assignment of the protein structural classes.
Academic Article
Overview
abstract
Structural class categorizes proteins based on the amount and arrangement of the constituent secondary structures. The knowledge of structural classes is applied in numerous important predictive tasks that address structural and functional features of proteins. We propose novel structural class assignment methods that use one-dimensional (1D) secondary structure as the input. The methods are designed based on a large set of low-identity sequences for which secondary structure is predicted from their sequence (PSSA(sc) model) or assigned based on their tertiary structure (SSA(sc)). The secondary structure is encoded using a comprehensive set of features describing count, content, and size of secondary structure segments, which are fed into a small decision tree that uses ten features to perform the assignment. The proposed models were compared against seven secondary structure-based and ten sequence-based structural class predictors. Using the 1D secondary structure, SSA(sc) and PSSA(sc) can assign proteins to the four main structural classes, while the existing secondary structure-based assignment methods can predict only three classes. Empirical evaluation shows that the proposed models are quite promising. Using the structure-based assignment performed in SCOP (structural classification of proteins) as the golden standard, the accuracy of SSA(sc) and PSSA(sc) equals 76 and 75%, respectively. We show that the use of the secondary structure predicted from the sequence as an input does not have a detrimental effect on the quality of structural class assignment when compared with using secondary structure derived from tertiary structure. Therefore, PSSA(sc) can be used to perform the automated assignment of structural classes based on the sequences.