Inserm - Institut national de la santé et de la recherche médicale
Journal article: Journal of the American Chemical Society, 2022

Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C–O Couplings

Abstract

Synthetic yield prediction using machine learning is being studied intensively. Previous work has focused on two categories of datasets: High-Throughput Experimentation (HTE) data, as an ideal case study, and datasets extracted from proprietary databases, which are known to have a strong reporting bias towards high yields. However, predicting yields from published reaction data remains elusive. To fill this gap, we built a dataset of nickel-catalyzed C–O cross-couplings extracted from organic reaction publications, including both scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the exploration of chemical space by the synthetic community. While machine learning models still fail at out-of-sample prediction, this work shows that adding chemical knowledge enables fair predictions in a low-data regime. Finally, we hope that this unique public database will foster further improvements of machine learning methods for reaction yield prediction in a more realistic context.
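The abstract notes that models still fail at out-of-sample prediction. As a hedged illustration of that evaluation setting (not the authors' protocol), the sketch below holds out all reactions of one substrate at a time, so the test substrate is never seen during training, and scores a trivial mean-yield baseline on each fold. The record fields (`substrate`, `yield`), the substrate IDs and the yield values are hypothetical placeholders, not NiCOlit data, and grouping by substrate is only one assumed way to define "out-of-sample".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: each reaction has a substrate ID and a yield in %.
# These values are made up for illustration; they are NOT NiCOlit data.
RECORDS = [
    {"substrate": "S1", "yield": 80.0},
    {"substrate": "S1", "yield": 65.0},
    {"substrate": "S2", "yield": 10.0},
    {"substrate": "S2", "yield": 0.0},
    {"substrate": "S3", "yield": 45.0},
]


def leave_one_substrate_out(records):
    """Yield (substrate, train, test) folds in which every reaction of one
    substrate is held out together, so the test substrate is unseen."""
    groups = defaultdict(list)
    for r in records:
        groups[r["substrate"]].append(r)
    for substrate in sorted(groups):
        train = [r for r in records if r["substrate"] != substrate]
        yield substrate, train, groups[substrate]


def mean_baseline_mae(records):
    """Score a trivial baseline (predict the mean training yield) on each
    out-of-sample fold; return the mean absolute error per held-out substrate."""
    scores = {}
    for substrate, train, test in leave_one_substrate_out(records):
        prediction = mean(r["yield"] for r in train)
        scores[substrate] = mean(abs(r["yield"] - prediction) for r in test)
    return scores


if __name__ == "__main__":
    for substrate, mae in mean_baseline_mae(RECORDS).items():
        print(f"{substrate}: MAE = {mae:.1f}")
```

Grouping by substrate rather than shuffling individual reactions is the design choice that matters here: a random split can place reactions of the same substrate in both the training and test sets, which tends to make scores look optimistic compared with predicting for a genuinely new substrate.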
Main file: Predicting_reaction_yields___JACS_Format_changes.pdf (11.47 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03790865, version 1 (28-09-2022)

Citation

Jules Schleinitz, Maxime Langevin, Yanis Smail, Benjamin Wehnert, Laurence Grimaud, et al. Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C–O Couplings. Journal of the American Chemical Society, 2022, 144 (32), pp. 14722-14730. ⟨10.1021/jacs.2c05302⟩. ⟨hal-03790865⟩
