Institut National des Sciences Appliquées de Strasbourg
Conference paper, Year: 2015

Comparison of Crosslingual Similarity Measures for Multilingual Documents Clustering

Abstract

This paper compares one thesaurus-based approach with three lexicon-based techniques for measuring the crosslingual similarity of domain-specific texts. The methods are applied to an unstructured, manually annotated corpus of texts in three languages: French, English and German. We investigate how well these measures correlate with human judgement, as well as their ability to detect both subtle differences in comparability (within the same topic) and broader ones (across related topics). Additional experiments aim to determine the extent to which terminology helps improve similarity measures in a specialised domain. Results suggest that injecting domain-specific knowledge, when available, is a good alternative to shallower techniques.
No file deposited

Dates and versions

hal-01100641, version 1 (06-01-2015)

Identifiers

  • HAL Id: hal-01100641, version 1

Cite

Manuela Yapomo, Delphine Bernhard, Pierre Gançarski. Comparison of Crosslingual Similarity Measures for Multilingual Documents Clustering. 12ème atelier Fouille de Données Complexes (FDC2015), Jan 2015, Luxembourg, Luxembourg. pp.55-66. ⟨hal-01100641⟩
