A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i).
Journal article, BMC Bioinformatics, Year: 2014

Abstract

BACKGROUND: Small insertion and deletion polymorphisms (Indels) are the second most common type of mutation in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they contribute significantly to genetic variation by altering human traits, and that they can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to affect the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert or delete a multiple of three nucleotides, leading to the insertion or deletion of one or more amino acids.

RESULTS: In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels.

CONCLUSIONS: We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to that of existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-experts, and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease.
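
The FS/NFS distinction drawn in the background can be illustrated with a minimal sketch. It is not part of KD4i: the function names `classify_indel` and `residues_changed` and the reference/alternate allele representation are assumptions made here for illustration. The sketch only checks whether the length change is a multiple of three nucleotides, and it ignores the case where an NFS-Indel that is not aligned to codon boundaries also alters a flanking residue.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Label a coding-region indel as frameshifting (FS) or non-frameshifting (NFS).

    ref and alt are the reference and alternate nucleotide alleles
    (hypothetical representation, chosen for this illustration).
    """
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        raise ValueError("not an indel: ref and alt have the same length")
    # A length change that is a multiple of three keeps the reading frame.
    return "NFS" if length_change % 3 == 0 else "FS"


def residues_changed(ref: str, alt: str) -> int:
    """Number of amino acids inserted or deleted by an NFS indel."""
    if classify_indel(ref, alt) != "NFS":
        raise ValueError("residue count is only defined here for NFS indels")
    return abs(len(alt) - len(ref)) // 3


if __name__ == "__main__":
    print(classify_indel("ACGT", "A"))       # deletion of 3 nucleotides
    print(classify_indel("A", "AC"))         # insertion of 1 nucleotide
    print(residues_changed("A", "ACGTACG"))  # insertion of 6 nucleotides
```

Only variants labelled NFS by a check of this kind fall within the scope of the study described above; predicting whether such a variant is deleterious or neutral is the task addressed by KD4i itself.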
Main file
1471-2105-15-111.pdf (1.18 MB)
1471-2105-15-111-S1.XLSX (78.78 KB)
1471-2105-15-111-S2.XLSX (19.26 KB)
1471-2105-15-111-S3.XLSX (10.89 KB)
1471-2105-15-111-S4.XLSX (14.59 KB)
1471-2105-15-111-S5.XLSX (17.08 KB)
1471-2105-15-111.xml (155.86 KB)
Origin : Publisher files allowed on an open archive
Format : Other

Dates and versions

inserm-00988598 , version 1 (08-05-2014)

Cite

Carlos Bermejo-Das-Neves, Hoan-Ngoc Nguyen, Olivier Poch, Julie Thompson. A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics, 2014, 15 (1), pp.111. ⟨10.1186/1471-2105-15-111⟩. ⟨inserm-00988598⟩