Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings.

Abstract:
Background: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnoses and clinically relevant incidental findings from angiography and venography reports written in French. We model thromboembolic diagnoses and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated by a physician, with the support of NLP tools, for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset, which a physician had interpreted for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports.
Results: The best model achieved an F-measure of 0.98 for pulmonary embolism identification, 1.00 for deep-vein thrombosis, and 0.80 for clinically relevant incidental findings. The use of concepts, modalities and relations improved performance in all cases.
Conclusions: This study demonstrates the benefits of developing an automated method to identify medical concepts, modalities and relations from radiology reports in French. An end-to-end automatic system for annotation and classification that could be applied to other radiology report databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.
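For readers unfamiliar with the metric, the F-measure quoted in the abstract is conventionally the harmonic mean of precision and recall computed for the positive class, which makes it more informative than accuracy on imbalanced data. The abstract does not state which variant was used, so the sketch below assumes the balanced F1 form; the function name `f_measure` and the counts are hypothetical and are not taken from the study.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts of the positive class.

    tp, fp, fn are true positives, false positives and false negatives.
    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not say which variant the authors reported).
    """
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
score = f_measure(tp=8, fp=2, fn=2)
print(round(score, 2))
```

True negatives do not appear in the formula, so a classifier cannot raise its score merely by labelling the many negative reports correctly.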
Document type:
Journal article
BMC Bioinformatics, BioMed Central, 2013, pp.266

Cited literature: 44 references
Contributor: Nadia Taibi
Submitted on: Thursday, 11 December 2014 - 17:24:09
Last modified on: Thursday, 12 July 2018 - 13:20:04
Document(s) archived on: Saturday, 15 April 2017 - 07:56:58


Publisher files allowed on an open archive


  • HAL Id: inserm-01094167, version 1
  • PubMed: 25099227



Anne-Dominique Pham, Aurélie Névéol, Thomas Lavergne, Daisuke Yasunaga, Olivier Clément, et al. Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings. BMC Bioinformatics, BioMed Central, 2013, pp.266. 〈inserm-01094167〉


