Clustering of protein domains for functional and evolutionary studies.

Abstract:

BACKGROUND: The number of protein family members identified by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data then provides an independent test of the quality of the clustering.

RESULTS: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment. The statistic is larger when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than around a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes with a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which also yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained does not depend strongly on the number of amino acids chosen for the motif. The robustness of the method was demonstrated on six well-characterised protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.

CONCLUSION: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
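The clustering step can be illustrated with a minimal Python sketch. It assumes the motif columns have already been selected and extracted, so each family member is reduced to an equal-length motif string. It implements plain deterministic annealing EM for a mixture of position-specific residue profiles, not the authors' stochastic procedure, and it uses each sequence's largest posterior membership as an assumed stand-in for the specificity score. The function name `daem_cluster`, the annealing schedule `betas`, the pseudocount `pseudo` and the toy motifs are hypothetical choices for illustration only.

```python
import math
import random

# Standard amino-acid alphabet plus gap; extra symbols found in the motifs are added at run time.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def daem_cluster(motifs, k, betas=(0.25, 0.5, 0.75, 1.0), iters=30, pseudo=0.01, seed=0):
    """Split equal-length motif strings into k subtypes by deterministic annealing EM.

    Each hypothetical subtype is modelled as independent categorical distributions,
    one per motif column. Returns (labels, specificity), where specificity is the
    largest posterior membership of each sequence (an assumed stand-in score).
    """
    if len({len(m) for m in motifs}) != 1:
        raise ValueError("all motifs must have the same length")
    n, length = len(motifs), len(motifs[0])
    alphabet = sorted(set(AMINO_ACIDS) | set("".join(motifs)))
    rng = random.Random(seed)

    # Random soft initial memberships (each row sums to 1).
    resp = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(w)
        resp.append([x / total for x in w])

    for beta in betas:  # beta is the inverse temperature, raised towards 1
        for _ in range(iters):
            # M-step: mixing weights and pseudocount-smoothed residue frequencies per column.
            weights = [max(sum(r[j] for r in resp) / n, 1e-12) for j in range(k)]
            profiles = []
            for j in range(k):
                columns = []
                for c in range(length):
                    counts = {a: pseudo for a in alphabet}
                    for r, m in zip(resp, motifs):
                        counts[m[c]] += r[j]
                    total = sum(counts.values())
                    columns.append({a: v / total for a, v in counts.items()})
                profiles.append(columns)

            # Annealed E-step: membership proportional to (weight * likelihood) ** beta.
            resp = []
            for m in motifs:
                logs = [
                    beta * (math.log(weights[j])
                            + sum(math.log(profiles[j][c][m[c]]) for c in range(length)))
                    for j in range(k)
                ]
                top = max(logs)  # subtract the maximum for numerical stability
                ex = [math.exp(v - top) for v in logs]
                total = sum(ex)
                resp.append([v / total for v in ex])

    labels = [r.index(max(r)) for r in resp]
    specificity = [max(r) for r in resp]
    return labels, specificity


if __name__ == "__main__":
    # Hypothetical toy motifs: two groups that share no residue in any column.
    motifs = ["KDE", "KDE", "KDQ", "KEE", "GSA", "GSA", "GTA", "ASA"]
    labels, specificity = daem_cluster(motifs, k=2)
    for motif, label, score in zip(motifs, labels, specificity):
        print(motif, label, round(score, 3))
```

In this toy run the first four motifs are expected to fall into one subtype and the last four into the other, each with a membership close to 1; the numeric label given to each subtype is arbitrary. Starting at a small inverse temperature keeps memberships soft, which is the usual motivation for annealing: it makes the result less sensitive to the random initialisation than plain EM.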

Cited literature: 38 references

http://www.hal.inserm.fr/inserm-00663929
Contributor: Ed. Bmc
Submitted on: Friday, 27 January 2012, 17:23:02
Last modified on: Tuesday, 11 October 2016, 13:56:35
Document(s) archived on: Wednesday, 14 December 2016, 02:22:36

Citation

Pavle Goldstein, Jurica Zucko, Dušica Vujaklija, Anita Kriško, Daslav Hranueli, et al. Clustering of protein domains for functional and evolutionary studies. BMC Bioinformatics, BioMed Central, 2009, 10 (1), p. 335. doi:10.1186/1471-2105-10-335. HAL: inserm-00663929.

Metrics

Record views: 144

File downloads: 271