Journal articles

Clustering of protein domains for functional and evolutionary studies.

Abstract : BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.

RESULTS: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.

CONCLUSION: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
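The abstract does not give the formula for the evolutionary split statistic, so the following is only a toy illustration of the general idea of scoring each alignment column by how much better a two-centre model explains it than a one-centre model. Everything here is an assumption made for illustration: the function names `split_score` and `rank_columns`, the error rate `eps`, and the simple substitution model (a residue matches its centre with probability `1 - eps`, otherwise it is uniform over the other 19 amino acids). It is not the statistic used in the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def center_prob(residue, center, eps):
    """Toy model: probability of a residue given a 'centre' amino acid."""
    if residue == center:
        return 1.0 - eps
    return eps / (len(AMINO_ACIDS) - 1)


def split_score(column, eps=0.1):
    """Log-likelihood gain of a two-centre model over a one-centre model.

    Hypothetical stand-in for a split statistic: larger values mean the
    column looks more like two clusters of residues than one.
    Gaps and unknown characters are ignored.
    """
    residues = [r for r in column if r in AMINO_ACIDS]
    counts = Counter(residues)
    if len(counts) < 2:
        return 0.0  # a fully conserved (or empty) column cannot split
    (c1, n1), (c2, n2) = counts.most_common(2)
    w = n1 / (n1 + n2)  # mixture weight of the first centre
    ll_one = sum(math.log(center_prob(r, c1, eps)) for r in residues)
    ll_two = sum(
        math.log(w * center_prob(r, c1, eps) + (1 - w) * center_prob(r, c2, eps))
        for r in residues
    )
    return ll_two - ll_one


def rank_columns(alignment, eps=0.1):
    """Return column indices sorted from highest to lowest split score."""
    columns = list(zip(*alignment))
    scores = [split_score(col, eps) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    toy_alignment = ["AKL", "AKL", "ARL", "ARV"]  # made-up sequences
    print(rank_columns(toy_alignment))
```

In this toy setting, the top-ranked indices would play the role of the motif columns that the user selects; the subsequent DAEM-related clustering step is not sketched here.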

Cited literature [38 references]
Contributor : Ed. BMC
Submitted on : Friday, January 27, 2012 - 5:23:02 PM
Last modification on : Saturday, June 25, 2022 - 8:51:29 PM
Long-term archiving on : Wednesday, December 14, 2016 - 2:22:36 AM



Pavle Goldstein, Jurica Zucko, Dušica Vujaklija, Anita Kriško, Daslav Hranueli, et al. Clustering of protein domains for functional and evolutionary studies. BMC Bioinformatics, BioMed Central, 2009, 10 (1), pp.335. ⟨10.1186/1471-2105-10-335⟩. ⟨inserm-00663929⟩


