De la classification à la classification croisée : une approche basée sur la modélisation (From clustering to co-clustering: a model-based approach)

Christine Keribin 1, 2
1 CELESTE - Statistique mathématique et apprentissage
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay
Abstract: This habilitation thesis reviews work focused mainly on model-based clustering and the related problem of model selection. After recalling what mixture models contribute to the unsupervised framework of clustering, it introduces the latent block model (LBM), a mixture model extended to the simultaneous clustering (co-clustering) of the rows and columns of a data table. Theoretical contributions (identifiability, consistency and asymptotic normality of the estimators) and methodological contributions (estimation by variational EM, stochastic EM, Bayesian variational EM and the Gibbs sampler; model selection with the ICL criterion) are presented. The LBM is extended to the Multiple Latent Block Model (MLBM) to process individual data in pharmacovigilance, and a greedy algorithm for scanning the model set is proposed. The study of functional MRI data, for which the number of individuals is much smaller than the number of variables, made it possible to explore the high-dimensional setting in two directions: the use of Bayesian inference as a regularization tool (the MSBR model, Multi Sparse Bayesian Regression), and drastic dimension reduction that keeps the results interpretable (clustering of spatially constrained variables, supervised by the prediction of the target). Finally, contributions in less closely related domains (data modeling in genomics, meteorology, phylogenetics and finance) illustrate how applications raise interesting theoretical or methodological issues.

Cited literature: 144 references

https://tel.archives-ouvertes.fr/tel-02397429
Contributor: Christine Keribin
Submitted on: Friday, December 6, 2019 - 3:12:22 PM
Last modification on: Monday, January 13, 2020 - 1:59:28 PM

File

KERIBIN-HDRParisSud - TEL.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: tel-02397429, version 1

Citation

Christine Keribin. De la classification à la classification croisée : une approche basée sur la modélisation. Statistiques [math.ST]. Université Paris Sud XI, 2019. ⟨tel-02397429⟩

Metrics

Record views: 97

File downloads: 85