
Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB)

Abstract: The number of protein structures available in the Protein Data Bank (PDB) has increased considerably in recent years. This growth in structures and complexes has enabled numerous large-scale studies in various research areas, e.g., protein-protein and protein-DNA interactions, or drug discovery. While protein redundancy has usually been handled only with a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely acknowledged, but they have never been quantitatively assessed, as they are complex to analyze and compare. Here, we present a dedicated clustering of protein-ligand structures that avoids the bias found in different studies. The methodology is based on binding site superposition, combined with a weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations are retained, giving a non-biased view of the protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories reduces the number of complexes 3.84-fold and yields more refined results than a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analyses or virtual screening studies.
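The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea of combining a weighted RMSD with hierarchical clustering to keep one representative conformation per cluster. It is not the authors' implementation, and several choices are assumptions: the binding sites are taken as already superposed and atom-matched (the superposition step is not shown), the per-atom weights are hypothetical inputs, complete linkage and the RMSD cutoff are illustrative rather than the published parameters, and the cluster medoid is used as the representative. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def weighted_rmsd(a: Sequence[Coord], b: Sequence[Coord], w: Sequence[float]) -> float:
    """Weighted RMSD between two binding sites that are ASSUMED to be
    already superposed and matched atom-by-atom (same order, same length)."""
    if not a or not (len(a) == len(b) == len(w)):
        raise ValueError("coordinates and weights must be non-empty and equal length")
    # Sum of w_i * |a_i - b_i|^2, normalized by the total weight.
    sq = sum(wi * sum((p - q) ** 2 for p, q in zip(pa, pb))
             for pa, pb, wi in zip(a, b, w))
    return math.sqrt(sq / sum(w))


def complete_linkage(dist: List[List[float]], cutoff: float) -> List[List[int]]:
    """Agglomerative clustering (complete linkage is an assumption here):
    repeatedly merge the closest pair of clusters while their largest
    pairwise distance stays within the cutoff."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def medoid(cluster: List[int], dist: List[List[float]]) -> int:
    """Member with the smallest summed distance to the rest of its cluster."""
    return min(cluster, key=lambda p: sum(dist[p][q] for q in cluster))


def representatives(sites: List[Sequence[Coord]], weights: Sequence[float],
                    cutoff: float) -> List[int]:
    """Indices of one representative conformation per cluster."""
    n = len(sites)
    dist = [[weighted_rmsd(sites[i], sites[j], weights) for j in range(n)]
            for i in range(n)]
    return [medoid(c, dist) for c in complete_linkage(dist, cutoff)]
```

As a toy usage example, four copies of a three-atom pocket, where sites 0 and 1 differ by a 0.1 Å shift and sites 2 and 3 sit about 3 Å away from them, collapse to two representatives (indices 0 and 2) with a hypothetical 1.0 Å cutoff, i.e. a 2-fold reduction. The ratio `len(sites) / len(reps)` is the same kind of quantity as the 3.84-fold reduction reported above, although that figure comes from the authors' own method and data.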
Contributor: Alexandre G. de Brevern
Submitted on: Wednesday, July 29, 2020 - 12:02:23 PM
Last modification on: Monday, December 14, 2020 - 3:44:48 PM
Long-term archiving on: Tuesday, December 1, 2020 - 7:42:01 AM





Nicolas Shinada, Peter Schmidtke, Alexandre de Brevern. Accurate Representation of Protein-Ligand Structural Diversity in the Protein Data Bank (PDB). International Journal of Molecular Sciences, MDPI, 2020, 21 (6), pp.2243. ⟨10.3390/ijms21062243⟩. ⟨inserm-02907370⟩


