Structural Alphabet: From a Local Point of View to a Global Description of Protein 3D Structures
Abstract
The study of the local conformations of protein structures has a long history, based principally on the analysis of the classical repetitive structures (i.e. α-helix and β-sheet) and on the characterization of some particular structures in the coil state (e.g. turns). Secondary structures are useful for describing the global protein fold, but they miss the orientations of the connecting regions and therefore neglect many particularities of the coil state. To take these structural features into account, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive residues, called Protein Blocks (PBs). Unlike secondary structures, PBs can approximate every part of a protein structure. PBs have been used both to describe 3D protein backbones precisely, with an average rmsd of 0.42 Å, and to perform local structure prediction, with a correct prediction rate of 48.7%. In this chapter, we show the value of Protein Blocks by comparing secondary structure assignment with assignment in terms of PBs. We highlight the discrepancies between different secondary structure assignment methods and show some interesting correspondences between particular local folds and Protein Blocks. We then use Protein Block prediction to classify proteins into the classical structural classes, namely all α, all β and mixed. The prediction rate for these classes is good (71.5%), with no confusion between the all α and all β classes. Finally, we present a new approach named TopKAPi, which stands for "Triangular Kohonen Map for Analyzing Proteins". It classifies and analyzes proteins according to their Protein Block frequencies, using a novel unsupervised clustering method for this purpose: a triangular self-organizing Kohonen map. This method makes it possible to identify new relationships between local structures and amino acid distributions.
This new methodology could be of great interest in proteomics and sequence alignment.
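The Python sketch below illustrates the two ingredients that TopKAPi is described as combining: a 16-dimensional Protein Block frequency vector per protein, and a self-organizing Kohonen map whose neurons lie on a triangular grid. It is not the authors' implementation. The PB labels 'a' to 'p', the triangular lattice layout, the Gaussian neighbourhood, the linear decay schedules and every function name (pb_frequencies, triangular_grid, train_som, best_unit) are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

# Assumption: the 16 Protein Blocks are labelled 'a' to 'p'; any other
# character (e.g. a placeholder for unassignable chain ends) is ignored.
PB_LETTERS = "abcdefghijklmnop"


def pb_frequencies(pb_seq):
    """Return the 16-dimensional vector of PB frequencies in a PB string."""
    counts = Counter(c for c in pb_seq if c in PB_LETTERS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(PB_LETTERS)
    return [counts[c] / total for c in PB_LETTERS]


def triangular_grid(side):
    """Neuron coordinates of a hypothetical triangular map with `side` rows.

    Row r holds r + 1 neurons; neighbouring neurons are exactly 1 apart,
    so the map is a triangle cut out of a hexagonal lattice.
    """
    coords = []
    for r in range(side):
        for c in range(r + 1):
            coords.append((c - r / 2.0, r * math.sqrt(3) / 2.0))
    return coords


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k], x))


def train_som(data, side=6, epochs=50, lr0=0.5, seed=0):
    """Train a triangular Kohonen map on a list of frequency vectors.

    Learning rate and neighbourhood radius decay linearly (an assumed
    schedule); the neighbourhood is a Gaussian of the grid distance.
    """
    rng = random.Random(seed)
    grid = triangular_grid(side)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    radius0 = side / 2.0
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = best_unit(weights, x)
            # Pull every neuron towards x, weighted by its grid distance to the winner.
            for k, w in enumerate(weights):
                h = math.exp(-sq_dist(grid[k], grid[bmu]) / (2.0 * radius ** 2))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return grid, weights


# Usage with invented toy PB strings (not real proteins).
helix_like = pb_frequencies("mmmmmmmmnopacd")
strand_like = pb_frequencies("ddddddddfbhcmm")
grid, weights = train_som([helix_like, strand_like] * 5)
print(best_unit(weights, helix_like), best_unit(weights, strand_like))
```

The toy strings are made up and lean on the common description of PB m and PB d as the central α-helix and β-strand blocks; after training, the two profiles would be expected to fall on different regions of the map. A triangular grid was chosen here only because the method's name calls for one; the actual map size, topology and training schedule used by TopKAPi are not given in this abstract.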