A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). - Inserm - Institut national de la santé et de la recherche médicale
Journal article, BMC Bioinformatics, Year: 2014

A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i).

Abstract

BACKGROUND: Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids.

RESULTS: In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels.

CONCLUSIONS: We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease.
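The FS/NFS distinction drawn in the abstract comes down to whether the net length change is a multiple of three nucleotides. The following is a minimal sketch of that check only, not part of KD4i: the function names and the `ref`/`alt` allele-string convention are assumptions made for illustration, and it ignores complications such as indels that span splice sites or stop codons.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Label a coding-region indel as frameshifting ("FS") or non-frameshifting ("NFS").

    `ref` and `alt` are hypothetical reference and alternate allele strings;
    only their lengths are used.
    """
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: alleles have the same length")
    # A net change that is a multiple of three keeps the reading frame intact.
    return "NFS" if length_change % 3 == 0 else "FS"


def residues_changed(ref: str, alt: str) -> int:
    """Net number of amino acids inserted or deleted by an NFS-Indel."""
    if classify_coding_indel(ref, alt) != "NFS":
        raise ValueError("frameshifting indels do not map to a whole number of residues")
    return abs(len(alt) - len(ref)) // 3


if __name__ == "__main__":
    print(classify_coding_indel("A", "ACTG"))   # 3-nucleotide insertion
    print(classify_coding_indel("ACT", "A"))    # 2-nucleotide deletion
    print(residues_changed("ACTGACT", "A"))     # 6-nucleotide deletion
```

Under these assumptions, the first call is labelled NFS and the second FS; predicting whether an NFS-Indel is deleterious or neutral is the separate problem that KD4i addresses.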
Main file
1471-2105-15-111.pdf (1.18 MB)
1471-2105-15-111-S1.XLSX (78.78 KB)
1471-2105-15-111-S2.XLSX (19.26 KB)
1471-2105-15-111-S3.XLSX (10.89 KB)
1471-2105-15-111-S4.XLSX (14.59 KB)
1471-2105-15-111-S5.XLSX (17.08 KB)
1471-2105-15-111.xml (155.86 KB)
Source: Publisher files allowed on an open archive
Format: Other

Dates and versions

inserm-00988598, version 1 (8 May 2014)

Cite

Carlos Bermejo-Das-Neves, Hoan-Ngoc Nguyen, Olivier Poch, Julie Thompson. A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics, 2014, 15 (1), pp. 111. ⟨10.1186/1471-2105-15-111⟩. ⟨inserm-00988598⟩
