R. S. Sutton and A. G. Barto, Toward a modern theory of adaptive networks: expectation and prediction, Psychological review, vol.88, issue.2, p.135, 1981.

W. Schultz, Getting formal with dopamine and reward, Neuron, vol.36, issue.2, pp.241-263, 2002.

N. D. Daw, J. P. O'Doherty, P. Dayan et al., Cortical substrates for exploratory decisions in humans, Nature, vol.441, issue.7095, p.876, 2006.

D. Lee, M. L. Conroy, B. P. McGreevy et al., Reinforcement learning and decision making in monkeys during a competitive game, Cognitive Brain Research, vol.22, issue.1, pp.45-58, 2004.

D. G. R. Tervo, M. Proskurin, M. Manakov, M. Kabra et al., Behavioral variability through stochastic choice and its gating by anterior cingulate cortex, Cell, vol.159, issue.1, pp.21-32, 2014.

M. Belkaid, E. Bousseyrol, R. Durand-de Cuttoli, M. Dongelmans, E. K. Duranté, ..., A. Mourot, J. Naudé, O. Sigaud, and P. Faure, Mice adaptively generate choice variability in a deterministic task, 2019.

A. Lempel and J. Ziv, On the complexity of finite sequences, IEEE Transactions on information theory, vol.22, issue.1, pp.75-81, 1976.

R. A. Rescorla and A. R. Wagner, A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, Classical conditioning II: Current research and theory, 1972.

J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, Journal of Machine Learning Research, vol.13, pp.281-305, 2012.