Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)
Journal article, International Journal of Molecular Sciences, 2020

Nicolas K Shinada (1, 2, 3, 4, 5), Peter Schmidtke (3), Alexandre de Brevern (1, 2, 5)

Abstract

The number of available protein structures in the Protein Data Bank (PDB) has increased considerably in recent years. Thanks to this growth in structures and complexes, numerous large-scale studies have been carried out in various research areas, e.g., protein-protein interactions, protein-DNA interactions, and drug discovery. While protein redundancy has generally been handled with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known but have never been quantitatively assessed, as they are complex to analyze and compare. Here, we present a dedicated clustering of protein-ligand structures to avoid the biases found in various studies. The methodology is based on binding site superposition combined with weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations are retained to give an unbiased view of the protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold and yields more refined results than a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analyses or virtual screening studies.
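The abstract names three ingredients: binding site superposition, a weighted RMSD, and hierarchical clustering that keeps only representative conformations. The Python sketch below illustrates how the last two could fit together; it is not the authors' implementation. It assumes the binding-site atoms of each conformation have already been superposed and matched one-to-one (e.g. with a Kabsch fit, not shown). The per-atom weights, the average-linkage criterion, the 1.0 Å cutoff, the medoid choice of representative, and the helper names (`weighted_rmsd`, `average_linkage_clusters`, `representatives`) are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]


def weighted_rmsd(a: list[Coord], b: list[Coord], w: list[float]) -> float:
    """Weighted RMSD between two conformations whose binding-site atoms are
    already superposed and matched one-to-one (same order in a, b and w)."""
    if not (len(a) == len(b) == len(w)):
        raise ValueError("coordinate and weight lists must have equal length")
    sq = sum(
        wi * sum((p - q) ** 2 for p, q in zip(x, y))
        for x, y, wi in zip(a, b, w)
    )
    return math.sqrt(sq / sum(w))


def average_linkage_clusters(dist: list[list[float]], cutoff: float) -> list[list[int]]:
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance, stopping once every remaining pair of
    clusters is farther apart than `cutoff` (hypothetical criterion)."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations gives j > i, so index i stays valid
    return clusters


def representatives(dist: list[list[float]], clusters: list[list[int]]) -> list[int]:
    """Pick the medoid of each cluster: the member with the smallest summed
    distance to the other members (the first one wins on ties)."""
    return [min(c, key=lambda i: sum(dist[i][j] for j in c)) for c in clusters]


# Toy data: four conformations of a hypothetical 3-atom binding site, already
# superposed. Atom 0 adopts two positions 3 Å apart and is up-weighted
# (e.g. as a ligand-contacting atom; the weights are an assumption).
confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 3.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.1, 0.0, 3.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
weights = [2.0, 1.0, 1.0]
dist = [[weighted_rmsd(a, b, weights) for b in confs] for a in confs]

clusters = average_linkage_clusters(dist, cutoff=1.0)  # hypothetical 1.0 Å cutoff
reps = representatives(dist, clusters)
print(clusters)                # [[0, 1], [2, 3]]
print(reps)                    # [0, 2]
print(len(confs) / len(reps))  # 2.0 (fold reduction in conformations)
```

The final ratio is the same kind of fold reduction the abstract reports (3.84-fold over the whole dataset); here the toy input simply collapses four conformations into two representatives. On real data, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be a better fit than this naive loop.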

Files

ijms-21-02243.pdf (main file, 2.89 MB)
Appendix A.pdf (385.95 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

inserm-02907370, version 1 (29-07-2020)

Cite

Nicolas K Shinada, Peter Schmidtke, Alexandre de Brevern. Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB). International Journal of Molecular Sciences, 2020, 21 (6), pp.2243. ⟨10.3390/ijms21062243⟩. ⟨inserm-02907370⟩