
Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings.

Abstract : Background: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnosis and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated with the support of NLP tools by a physician for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports. Results: The best model achieved an F measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases. Conclusions: This study demonstrates the benefits of developing an automated method to identify medical concepts, modalities and relations from radiology reports in French. An end-to-end automatic system for annotation and classification that could be applied to other radiology report databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.
Document type :
Journal articles

Contributor : Nadia Taibi
Submitted on : Thursday, December 11, 2014 - 5:24:09 PM
Last modification on : Sunday, June 26, 2022 - 12:02:24 PM
Long-term archiving on : Saturday, April 15, 2017 - 7:56:58 AM


Publisher files allowed on an open archive


  • HAL Id : inserm-01094167, version 1
  • PUBMED : 25099227


Anne-Dominique Pham, Aurélie Névéol, Thomas Lavergne, Daisuke Yasunaga, Olivier Clément, et al. Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings. BMC Bioinformatics, BioMed Central, 2013, pp.266. ⟨inserm-01094167⟩


