Preprints, Working Papers · Year: 2018

A Note on Lazy Training in Supervised Differentiable Programming

Abstract

In a series of recent theoretical works, it has been shown that strongly over-parameterized neural networks trained with gradient-based methods can converge linearly to zero training loss, with their parameters hardly varying. In this note, our goal is to exhibit the simple structure behind these results. In a simplified setting, we prove that "lazy training" essentially solves a kernel regression. We also show that this behavior is not so much due to over-parameterization as to a choice of scaling, often implicit, that makes it possible to linearize the model around its initialization. These theoretical results, complemented with simple numerical experiments, make it seem unlikely that "lazy training" is behind the many successes of neural networks in high-dimensional tasks.
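The scaling argument in the abstract can be illustrated numerically. The following is a minimal sketch (not the authors' code): it trains a small two-layer ReLU network h(w, x) on a toy regression task under the scaled objective F_alpha(w) = ||alpha * h(w) - y||^2 / (2 * alpha^2) and reports how far the parameters move from their initialization as the scale alpha grows. The width, step size, and synthetic data are illustrative choices, not taken from the paper.

    # Minimal sketch of lazy training via scaling (illustrative, not the authors' code).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 20, 5, 200                      # samples, input dimension, hidden width
    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)

    def forward(W, a, X):
        # Two-layer ReLU network: h(w, x) = (1/m) * sum_j a_j * relu(W_j . x)
        H = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
        return H @ a / m

    def gradients(W, a, X, residual):
        # Gradient of 0.5 * ||residual||^2 w.r.t. W and a, where residual = alpha*h - y
        H = np.maximum(X @ W.T, 0.0)
        grad_a = H.T @ residual / m
        grad_W = ((residual[:, None] * (H > 0.0) * a[None, :]).T @ X) / m
        return grad_W, grad_a

    for alpha in [1.0, 10.0, 100.0]:
        W = rng.normal(size=(m, d))
        a = rng.normal(size=m)
        W0, a0 = W.copy(), a.copy()
        for _ in range(2000):
            residual = alpha * forward(W, a, X) - y
            gW, ga = gradients(W, a, X, residual)
            # Gradient step on F_alpha; the extra 1/alpha comes from the 1/alpha^2 in the loss
            W -= 0.5 / alpha * gW
            a -= 0.5 / alpha * ga
        drift = np.sqrt(np.linalg.norm(W - W0) ** 2 + np.linalg.norm(a - a0) ** 2)
        fit = 0.5 * np.sum((alpha * forward(W, a, X) - y) ** 2)
        print(f"alpha={alpha:6.1f}   squared loss={fit:.3e}   parameter drift={drift:.3e}")

Under this scaling, the parameter drift shrinks roughly like 1/alpha: for large alpha the model stays close to its linearization around the initialization, and gradient descent behaves like regression with the tangent kernel at initialization, which is the "lazy" regime discussed in the abstract.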
Main file: chizatbach2018lazy.pdf (759.19 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01945578 , version 1 (05-12-2018)
hal-01945578 , version 2 (11-12-2018)
hal-01945578 , version 3 (21-02-2019)
hal-01945578 , version 4 (08-06-2019)
hal-01945578 , version 5 (18-06-2019)
hal-01945578 , version 6 (07-01-2020)

Identifiers

  • HAL Id : hal-01945578 , version 1

Cite

Lenaic Chizat, Francis Bach. A Note on Lazy Training in Supervised Differentiable Programming. 2018. ⟨hal-01945578v1⟩
5441 views, 4495 downloads
