A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers
Journal article, BMC Medical Research Methodology, 2021

Nadia Boutry-Kryza
Alain Calender
Sophie Giraud
Mélanie Léone
Brigitte Bressac-de Paillerets
Olivier Caron
Marine Guillaud-Bataille
Yves-Jean Bignon
Nancy Uhrhammer
Valérie Bonadona
Christine Lasset
Pascaline Berthet
Laurent Castera
Dominique Vaur
Violaine Bourdon
Tetsuro Noguchi
Cornel Popovici
Audrey Remenieras
Hagay Sobol
Isabelle Coupier
Pierre-Olivier Harmand
Paul Vilquin
Aurélie Dumont
Françoise Révillion
Danièle Muller
Emmanuelle Barouk-Simonet
Françoise Bonnet
Virginie Bubien
Michel Longy
Nicolas Sevenet
Laurence Gladieff
Rosine Guimbaud
Viviane Feillel
Christine Toulas
Hélène Dreyfus
Dominique Leroux
Magalie Peysselon
Christine Rebischung
Amandine Baurand
Geoffrey Bertolone
Fanny Coron
Laurence Faivre
Vincent Goussot
Caroline Jacquot
Caroline Sawka
Caroline Kientz
Marine Lebrun
Fabienne Prieur
Sandra Fert-Ferrer
Véronique Mari
Laurence Venat-Bouvet
Stéphane Bézieau
Capucine Delnatte
Isabelle Mortemousque
Florence Coulet
Florent Soubrier
Mathilde Warcoin
Myriam Bronner
Sarab Lizard
Johanna Sokolowska
Marie-Agnès Collonge-Rame
Alexandre Damette
Paul Gesta
Hakima Lallaoui
Jean Chiesa
Denise Molina-Gomes
Olivier Ingster
Sylvie Manouvrier-Hanu
Sophie Lejeune
Pauline Pontois
Dominique Stoppa-Lyonnet
Marion Gauthier-Villars
Bruno Buecher
Emmanuelle Mouret-Fourme
Jean-Pierre Fricker
Elisabeth Luporsi
Marc Frenay
Francois Eisinger
Jessica Moretta
Catherine Dugast
Chrystelle Colas
Alain Lortholary
Philippe Vennin
Claude Adenis
Tan Dat Nguyen
Annick Rossi
Julie Tinat
Isabelle Tennevet
Jean-Marc Limacher
Christine Maugard
Jean-Yves Bignon
Liliane Demange
Odile Cohen-Haguenauer
Brigitte Gilbert
Hélène Zattara-Cannoni

Abstract

Background: Linking independent sources of data describing the same individuals enables innovative epidemiological and health studies but requires a robust record linkage approach. We describe a hybrid record linkage process to link databases from two independent ongoing French national studies: GEMO (Genetic Modifiers of BRCA1 and BRCA2), which focuses on identifying genetic factors that modify cancer risk in BRCA1 and BRCA2 mutation carriers, and GENEPSO (prospective cohort of BRCAx mutation carriers), which focuses on environmental and lifestyle risk factors.

Methods: To identify as many as possible of the individuals who participate in both studies but are not registered under a shared identifier, we combined probabilistic record linkage (PRL) and supervised machine learning (ML). This approach (named "PRL + ML") pools the candidate matches identified by both approaches. We built the ML model using, as the training dataset, the gold standard established on a first version of the two databases. This gold standard was obtained from PRL-derived matches verified by exhaustive manual review.

Results: The Random Forest (RF) algorithm showed the highest recall (0.985) among six widely used ML algorithms: RF, Bagged trees, AdaBoost, Support Vector Machine, Neural Network. RF was therefore selected to build the ML model, since our goal was to identify the maximum number of true matches. Our combined linkage PRL + ML showed a higher recall (range 0.988-0.992) than either PRL (range 0.916-0.991) or ML (0.981) alone. It identified 1995 individuals participating in both GEMO (6375 participants) and GENEPSO (4925 participants).

Conclusions: Our hybrid linkage process is an efficient tool for linking GEMO and GENEPSO. It may be generalizable to other epidemiological studies involving other databases and registries.
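
The abstract states that PRL + ML combines the candidate matches found by both approaches and that recall was the selection criterion. The following is a minimal Python sketch of that idea, assuming the combination is a plain set union of candidate record pairs; the function names and record identifiers are hypothetical and the toy data are made up for illustration, not taken from the study.

```python
# Toy illustration only. Each candidate match is a pair
# (gemo_id, genepso_id); all identifiers below are invented.

def recall(predicted, gold):
    """Fraction of gold-standard matches that appear in `predicted`."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(predicted & gold) / len(gold)


def combine(prl_matches, ml_matches):
    """Assumed reading of 'PRL + ML': the set union of both candidate lists."""
    return prl_matches | ml_matches


# Hypothetical gold standard (manually verified true matches).
gold = {("G1", "P1"), ("G2", "P2"), ("G3", "P3"), ("G4", "P4")}

# Hypothetical outputs of the two linkage approaches.
prl = {("G1", "P1"), ("G2", "P2"), ("G3", "P3")}
ml = {("G1", "P1"), ("G2", "P2"), ("G4", "P4"), ("G5", "P9")}

combined = combine(prl, ml)

print("PRL recall:     ", recall(prl, gold))
print("ML recall:      ", recall(ml, gold))
print("PRL + ML recall:", recall(combined, gold))
```

Under this reading, the recall of the union can never be lower than that of either approach alone, which is consistent with the direction of the reported results. The union can also carry false positives from either source (the pair `("G5", "P9")` here), so it does not by itself improve precision.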
Main file
s12874-021-01299-6.pdf (761.64 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

inserm-03313811, version 1 (04-08-2021)

Licence

Attribution - CC BY 4.0

Cite

Yue Jiao, Fabienne Lesueur, Chloé-Agathe Azencott, Maïté Laurent, Noura Mebirouk, et al. A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers. BMC Medical Research Methodology, 2021, 21 (1), pp.155. ⟨10.1186/s12874-021-01299-6⟩. ⟨inserm-03313811⟩