Balancing exploration and exploitation with information and randomization.

Published on Apr 1, 2021 in Current Opinion in Behavioral Sciences (3.99) · DOI: 10.1016/J.COBEHA.2020.10.001
Robert C. Wilson (UA: University of Arizona), estimated H-index: 20
Elizabeth Bonawitz (RU: Rutgers University), estimated H-index: 19
R. Becket Ebitz (UdeM: Université de Montréal), estimated H-index: 10
Abstract
Explore-exploit decisions require us to trade off the benefits of exploring unknown options, to learn more about them, against the benefits of exploiting known options for immediate reward. Such decisions are ubiquitous in nature, but from a computational perspective, they are notoriously hard. There is therefore much interest in how humans and animals make these decisions, and recently there has been an explosion of research in this area. Here we provide a biased and incomplete snapshot of this field, focusing on the major finding that many organisms use two distinct strategies to solve the explore-exploit dilemma: a bias for information (‘directed exploration’) and the randomization of choice (‘random exploration’). We review evidence for the existence of these strategies, their computational properties, their neural implementations, as well as how directed and random exploration vary over the lifespan. We conclude by highlighting open questions in this field that are ripe to both explore and exploit.
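As a rough illustration of the two strategies named in the abstract, the sketch below combines an information bonus (directed exploration) with a softmax choice rule whose noise parameter controls randomization (random exploration). It is a minimal sketch under stated assumptions, not the model used in the paper: the function names, the `info_bonus / sqrt(n + 1)` form of the bonus, and the parameters `info_bonus` and `noise` are all hypothetical choices made for this example.

```python
import math
import random


def choice_probabilities(means, counts, info_bonus=0.0, noise=1.0):
    """Return softmax choice probabilities over options.

    means      -- estimated mean reward of each option
    counts     -- number of times each option has been sampled
    info_bonus -- assumed weight on information (directed exploration)
    noise      -- assumed decision noise, must be > 0 (random exploration)
    """
    # Directed exploration: options sampled less often receive a larger bonus.
    utilities = [m + info_bonus / math.sqrt(n + 1) for m, n in zip(means, counts)]
    # Random exploration: larger noise flattens the choice distribution.
    scaled = [u / noise for u in utilities]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def choose(means, counts, info_bonus=0.0, noise=1.0, rng=random):
    """Sample one option index from the choice probabilities."""
    probs = choice_probabilities(means, counts, info_bonus, noise)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With equal estimated means, a positive `info_bonus` shifts choice toward the less-sampled option, while raising `noise` pushes choice probabilities toward uniform regardless of how much is known about each option.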
References (98)
#1 Björn Meder, H-Index: 14
#2 Charley M. Wu, H-Index: 7
Last. Azzurra Ruggeri, H-Index: 9
(4 authors in total)
Are young children just random explorers who learn serendipitously? Or are even young children guided by uncertainty-directed sampling, seeking to explore in a systematic fashion? We study how children between the ages of 4 and 9 search in an explore-exploit task with spatially correlated rewards, where exhaustive exploration is infeasible and not all options can be experienced. By combining behavioral data with a computational model that decomposes search into similarity-based generalization, u...
2 Citations
#1 James A. Waltz (UMB: University of Maryland, Baltimore), H-Index: 30
#2 Robert C. Wilson (UA: University of Arizona), H-Index: 20
Last. James M. Gold (UMB: University of Maryland, Baltimore), H-Index: 102
(5 authors in total)
Schizophrenia is associated with a number of deficits in decision-making, but the scope, nature, and cause of these deficits are not completely understood. Here we focus on a particular type of dec...
2 Citations
#1 Karima Chakroun (UHH: University of Hamburg), H-Index: 5
#2 David Mathar (University of Cologne), H-Index: 10
Last. Jan Peters (UHH: University of Hamburg), H-Index: 86
(5 authors in total)
Involvement of dopamine in regulating exploration during decision-making has long been hypothesized, but direct causal evidence in humans is still lacking. Here, we use a combination of computational modeling, pharmacological intervention and functional magnetic resonance imaging to address this issue. Thirty-one healthy male participants performed a restless four-armed bandit task in a within-subjects design under three drug conditions: 150 mg of the dopamine precursor L-dopa, 2 mg of the D2 re...
20 Citations
#1 Siddhartha Joshi (UPenn: University of Pennsylvania), H-Index: 6
#2 Joshua I. Gold (UPenn: University of Pennsylvania), H-Index: 34
Cognitively driven pupil modulations reflect certain underlying brain functions. What do these reflections tell us? Here, we review findings that have identified key roles for three neural systems: cortical modulation of the pretectal olivary nucleus (PON), which controls the pupillary light reflex; the superior colliculus (SC), which mediates orienting responses, including pupil changes to salient stimuli; and the locus coeruleus (LC)-norepinephrine (NE) neuromodulatory system, which mediates r...
62 Citations
#1 Brian J. Jackson (UW: University of Washington), H-Index: 1
#2 Gusti Lulu Fatima (UW: University of Washington), H-Index: 1
Last. David H. Gire (UW: University of Washington), H-Index: 12
(4 authors in total)
During self-guided behaviors animals identify constraints of the problems they face and adaptively employ appropriate strategies (Marsh, 2002). In the case of foraging, animals must balance sensory-guided exploration of an environment with memory-guided exploitation of known resource locations. Here we show that animals adaptively shift cognitive resources between sensory and memory systems during foraging to optimize route planning under uncertainty. We demonstrate this using a new, laborato...
6 Citations
#1 Momchil S. Tomov (Harvard University), H-Index: 4
#2 Van Q. Truong (Harvard University), H-Index: 1
Last. Samuel J. Gershman (Harvard University), H-Index: 56
(4 authors in total)
Most real-world decisions involve a delicate balance between exploring unfamiliar alternatives and committing to the best known option. Previous work has shown that humans rely on different forms of uncertainty to negotiate this "explore-exploit" trade-off, yet the neural basis of the underlying computations remains unclear. Using fMRI (n = 31), we find that relative uncertainty is represented in right rostrolateral prefrontal cortex and drives directed exploration, while total uncertainty is re...
15 Citations
#1 Joshua F. Dean (University of Liverpool), H-Index: 11
#2 Ove H. Meisel (VU: VU University Amsterdam), H-Index: 4
Last. A. Johannes Dolman (VU: VU University Amsterdam), H-Index: 18
(18 authors in total)
Inland waters (rivers, lakes and ponds) are important conduits for the emission of terrestrial carbon in Arctic permafrost landscapes. These emissions are driven by turnover of contemporary terrestrial carbon and additional pre-aged (Holocene and late-Pleistocene) carbon released from thawing permafrost soils, but the magnitude of these source contributions to total inland water carbon fluxes remains unknown. Here we present unique simultaneous radiocarbon age measurements of inland water CO2, C...
84 Citations
#1 Vincent D. Costa (OHSU: Oregon Health & Science University), H-Index: 21
#2 Bruno B. Averbeck (NIH: National Institutes of Health), H-Index: 56
Reinforcement learning (RL) refers to the behavioral process of learning to obtain reward and avoid punishment. An important component of RL is the explore-exploit trade-off, which refers to the problem of choosing between exploiting options with known values and exploring unfamiliar options. We examined correlates of this trade-off, as well as other RL related variables, in orbito-frontal cortex (OFC), while three male monkeys carried out a 3-armed bandit learning task. During the task, novel c...
21 Citations
#1 Robert C. Wilson, H-Index: 20
#2 Siyu Wang, H-Index: 2
Last. Jonathan D. Cohen, H-Index: 147
(4 authors in total)
5 Citations
#1 Irene Cogliati Dezza (UCL: University College London), H-Index: 2
#2 Xavier Noël (ULB: Université libre de Bruxelles), H-Index: 42
Last. Angela J. Yu (UCSD: University of California, San Diego), H-Index: 21
(4 authors in total)
Information-seeking is an important aspect of human cognition. Despite its adaptive role, we have rather limited understanding of the mechanisms that subtend information-seeking in healthy individuals and in psychopathological populations. Here, we aim to formalize the computational basis of healthy human information behavior, as well as how those components may be compromised in behavioral addiction. We focus on gambling disorder, a form of addiction without the confound of substance consumptio...
1 Citation
Cited By (21)
#1 R. Nathan Spreng (Montreal Neurological Institute and Hospital), H-Index: 41
#2 Gary R. Turner (York University), H-Index: 25
Changes in cognition, affect, and brain function combine to promote a shift in the nature of mentation in older adulthood, favoring exploitation of prior knowledge over exploratory search as the starting point for thought and action. Age-related exploitation biases result from the accumulation of prior knowledge, reduced cognitive control, and a shift toward affective goals. These are accompanied by changes in cortical networks, as well as attention and reward circuits. By incorporating these fa...
#1 Anne G.E. Collins (HWNI: Helen Wills Neuroscience Institute), H-Index: 21
#2 Amitai Shenhav (Brown University), H-Index: 19
An organism's survival depends on its ability to learn about its environment and to make adaptive decisions in the service of achieving the best possible outcomes in that environment. To study the neural circuits that support these functions, researchers have increasingly relied on models that formalize the computations required to carry them out. Here, we review the recent history of computational modeling of learning and decision-making, and how these models have been used to advance understan...
1 Citation
#1 R. Becket Ebitz (UdeM: Université de Montréal), H-Index: 10
#2 Benjamin Y. Hayden (UMN: University of Minnesota), H-Index: 50
A major shift is happening within neurophysiology: a population doctrine is drawing level with the single-neuron doctrine that has long dominated the field. Population-level ideas have so far had their greatest impact in motor neuroscience, but they hold great promise for resolving open questions in cognition as well. Here, we codify the population doctrine and survey recent work that leverages this view to specifically probe cognition. Our discussion is organized around five core concepts that ...
3 Citations
#1 William H. Barnett (IUPUI: Indiana University – Purdue University Indianapolis)
#2 Alexey Kuznetsov (IUPUI: Indiana University – Purdue University Indianapolis), H-Index: 12
Last. Christopher C. Lapish (IUPUI: Indiana University – Purdue University Indianapolis), H-Index: 17
Cortical and basal ganglia circuits play a crucial role in the formation of goal-directed and habitual behaviors. In this study, we investigate the cortico-striatal circuitry involved in learning and how cortical interactions with specific striatal subregions are involved in the emergence of inflexible behaviors such as those observed in addiction. Specifically, we develop a computational model of cortico-striatal interactions that performs concurrent goal-directed and stimulus-response learning. T...
#1 Charley M. Wu, H-Index: 7
#2 Eric Schulz (MPG: Max Planck Society), H-Index: 15
Last. Maarten Speekenbrink, H-Index: 20
(4 authors in total)
#1 Dalin Guo (UCSD: University of California, San Diego), H-Index: 1
#2 Angela J. Yu (UCSD: University of California, San Diego), H-Index: 21
Humans are often faced with an exploration-versus-exploitation trade-off. A commonly used paradigm, multi-armed bandit, has shown humans to exhibit an "uncertainty bonus", which combines with estimated reward to drive exploration. However, previous studies often modeled belief updating using either a Bayesian model that assumed the reward contingency to remain stationary, or a reinforcement learning model. Separately, we previously showed that human learning in the bandit task is best captured b...
The learned helplessness (LH) paradigm, developed in experimental animals, has had great influence on the development of models of mood and anxiety disorders. However, the insights from this paradigm have not always translated straightforwardly into human experimental work. In particular, instrumental contingency learning experiments yielded the contradictory finding of more accurate contingency knowledge in depressed individuals (“depressive realism”: DR). A growing literature involving the app...
#1 William H. Barnett (IUPUI: Indiana University – Purdue University Indianapolis)
#2 Alexey Kuznetsov (IUPUI: Indiana University – Purdue University Indianapolis), H-Index: 12
Last. Christopher C. Lapish (IUPUI: Indiana University – Purdue University Indianapolis), H-Index: 17
Pathology in neural circuits that control the expression of goal-directed and habitual behaviors is hypothesized to be a major contributing factor to addiction. In this study, we investigate cortico-striatal circuitry involved in learning and how cortical interactions with specific striatal subregions are involved in the emergence of inflexible behaviors such as compulsive drinking. Specifically, we develop a computational model of cortico-striatal interactions that performs concurrent goal dire...
Subjective experience is a powerful contributor to value-based decision-making. Not every decision is the same, nor made in isolation. Rather, decision-making relies on historical information and internal states for adaptive control. Hence, it is inherently continuous with respect to time - one decision or action evolves into the next. However, forays into the neurobiological underpinnings of decision-making have too frequently ignored the contribution of such continuous subjective experience, i...
2 Citations
We consider the two-sided matching market with bandit learners. In the standard matching problem, users and providers are matched to ensure incentive compatibility via the notion of stability. However, contrary to the core assumption of the matching problem, users and providers do not know their true preferences a priori and must learn them. To address this assumption, recent works propose to blend the matching and multi-armed bandit problems. They establish that it is possible to assign matchin...
3 Citations