A Transformer-based NLP Pipeline for Enhanced Extraction of Botanical Information Using CamemBERT on French Literature - Unité de modélisation mathématique et informatique des systèmes complexes
Conference paper, Year: 2024

A Transformer-based NLP Pipeline for Enhanced Extraction of Botanical Information Using CamemBERT on French Literature

Abstract

This research investigates the untapped wealth of centuries-old French botanical literature, focusing on floras: comprehensive guides that describe the plant species of a specific region. Despite their significance, these works remain largely unexplored by AI methods. Our objective is to bridge this gap by constructing a specialized French botanical dataset sourced from the flora of New Caledonia. We propose a transformer-based Named Entity Recognition (NER) pipeline, leveraging distant supervision and CamemBERT, for the automated extraction and structuring of botanical information. The results show strong performance: for species name extraction, the NER model achieves a precision of 0.94, a recall of 0.98, and an F1-score of 0.96; for fine-grained extraction of botanical morphological terms, the CamemBERT-based NER model attains a precision of 0.93, a recall of 0.96, and an F1-score of 0.94. This work contributes to the exploration of valuable botanical literature by demonstrating that AI models can automate information extraction from complex and diverse texts.
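As a quick consistency check on the figures reported in the abstract, the F1-score can be recomputed from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The sketch below is illustrative only: the helper name `f1_score` and the dictionary layout are assumptions made for this example and are not taken from the paper's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are 0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "species names": (0.94, 0.98, 0.96),
    "morphological terms": (0.93, 0.96, 0.94),
}

for task, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{task}: computed F1 = {f1:.2f}, reported F1 = {f1_reported:.2f}")
```

Rounded to two decimals, both recomputed values match the reported F1-scores (0.96 and 0.94).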
Main file
csit140605.pdf (1.58 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04536866, version 1 (2024-04-08)

Cite

Ayoub Nainia, Régine Vignes-Lebbe, Eric Chenin, Maya Sahraoui, Hajar Mousannif, et al. A Transformer-based NLP Pipeline for Enhanced Extraction of Botanical Information Using CamemBERT on French Literature. 5th International Conference on NLP & Information Retrieval (NLPI 2024), Mar 2024, Sydney, Australia. pp. 59-78, ⟨10.5121/csit.2024.140605⟩. ⟨hal-04536866⟩