Clustering of protein domains for functional and evolutionary studies.

Pavle Goldstein; Jurica Zucko; Dušica Vujaklija; Anita Kriško; Daslav Hranueli; Paul Long; Catherine Etchebest; Bojan Basrak; John Cullum

doi:10.1186/1471-2105-10-335

Journal article, BMC Bioinformatics. Year: 2009

Pavle Goldstein

Role: Author
PersonId : 919273

Department of Mathematics [Zagreb]

Jurica Zucko

Role: Author
PersonId : 919274

Department of Genetics

Faculty of Food Technology & Biotechnology [Zagreb]

Dušica Vujaklija

Role: Author
PersonId : 919275

Department of Molecular Biology

Anita Kriško

Role: Author
PersonId : 919276

Génétique moléculaire, évolutive et médicale

Mediterranean Institute for Life Sciences

Daslav Hranueli

Role: Author
PersonId : 919277

Faculty of Food Technology & Biotechnology [Zagreb]

Paul Long

Role: Author
PersonId : 919278

The School of Pharmacy

Catherine Etchebest

Role: Author
PersonId : 856817

Bioinformatique génomique et moléculaire

Bojan Basrak

Role: Author
PersonId : 919279

Department of Mathematics [Zagreb]

John Cullum

Role: Corresponding author
PersonId : 919280

Department of Genetics

Abstract

BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.

RESULTS: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families.

CONCLUSION: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
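The clustering step in the abstract can be illustrated with a rough sketch. This is not the authors' procedure: the paper describes a stochastic optimization related to DAEM, whereas the code below is a plain tempered (annealed) EM for a mixture of position-specific residue distributions with equal cluster priors. The function name `annealed_em_cluster`, the temperature schedule `betas`, the pseudocount, and the toy motifs are all assumptions made for illustration, and the returned score (the final posterior of the assigned cluster) is only a stand-in for the specificity score mentioned above.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus a gap symbol


def annealed_em_cluster(motifs, k=2, betas=(0.2, 0.5, 1.0),
                        n_iter=20, pseudocount=0.5, seed=0):
    """Cluster equal-length motif strings with a tempered EM (hypothetical sketch).

    Returns (labels, scores); scores[i] is the final posterior probability
    of the cluster assigned to motif i.
    """
    rng = random.Random(seed)
    n, length = len(motifs), len(motifs[0])

    # Random soft initial responsibilities, one row per sequence.
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for beta in betas:  # beta rises towards 1 as the system "cools"
        for _ in range(n_iter):
            # M-step: per-cluster, per-column residue log-frequencies.
            log_freq = []
            for c in range(k):
                columns = []
                for j in range(length):
                    counts = {a: pseudocount for a in ALPHABET}
                    for i in range(n):
                        counts[motifs[i][j]] += resp[i][c]
                    total = sum(counts.values())
                    columns.append({a: math.log(v / total)
                                    for a, v in counts.items()})
                log_freq.append(columns)

            # E-step: posteriors tempered by beta (equal cluster priors assumed).
            for i in range(n):
                logp = [beta * sum(log_freq[c][j][motifs[i][j]]
                                   for j in range(length))
                        for c in range(k)]
                top = max(logp)
                expd = [math.exp(v - top) for v in logp]
                total = sum(expd)
                resp[i] = [e / total for e in expd]

    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    scores = [max(row) for row in resp]
    return labels, scores


# Toy motifs (invented): two groups that differ at every selected column.
example_motifs = ["ACDE", "ACDE", "ACDF", "WYKR", "WYKR", "WYKH"]
example_labels, example_scores = annealed_em_cluster(example_motifs)
```

Starting at a low `beta` flattens the posteriors so that early iterations are less sensitive to the random initialisation; raising `beta` to 1 recovers ordinary EM. A sequence whose final score is close to 1/k would be flagged as ambiguously assigned, in the spirit of the low specificity scores reported for false assignments.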

Subject areas

Genomics, Transcriptomics and Proteomics [q-bio.GN]; Bioinformatics, Systems Biology [q-bio.QM]; Bioinformatics [q-bio.QM]

Main file

1471-2105-10-335.pdf (623.7 KB)

1471-2105-10-335-S1.TXT (851.57 KB)

1471-2105-10-335-S2.XLS (297 KB)

1471-2105-10-335.xml (109.68 KB)

Origin: Publisher files allowed on an open archive

Format: Other

https://inserm.hal.science/inserm-00663929

Submitted on: Friday, 27 January 2012, 17:23:02

Last modified on: Saturday, 25 June 2022, 20:51:29

Long-term archiving on: Wednesday, 14 December 2016, 02:22:36

Dates and versions

inserm-00663929 , version 1 (27-01-2012)

Identifiers

HAL Id: inserm-00663929, version 1
DOI: 10.1186/1471-2105-10-335
PUBMED: 19832975

Cite

Pavle Goldstein, Jurica Zucko, Dušica Vujaklija, Anita Kriško, Daslav Hranueli, et al. Clustering of protein domains for functional and evolutionary studies. BMC Bioinformatics, 2009, 10 (1), pp.335. ⟨10.1186/1471-2105-10-335⟩. ⟨inserm-00663929⟩

Collections

INSERM UNIV-PARIS7

87 views

279 downloads
