Uncovering 'world models': from cognition to neurobiology

When we open our eyes, we do not see a jumble of light or colorful patterns. A great distance lies between the raw inputs sensed at our retinas and what we experience as the contents of our perception. The goal of our research program is to understand how our brains transform raw sensory inputs into rich, discrete structures that we can think about, plan with, and act on --- objects with 3D shapes and physical properties, scenes with navigable surfaces, and events with temporally demarcated dynamics. We refer to such structure-preserving, behaviorally efficacious representations of reality as 'world models', and consider them the glue of intelligence, linking what we perceive to how we plan and decide. Despite their centrality, much remains unknown about world models, from cognition to neurobiology. What are the representational formats underlying world models? How are they selectively deployed during perception? How are these representations implemented in neural populations and circuits, and how are they inferred across the sensory cortex?

To pursue these questions, we take a primarily computational approach within a distinctly integrative, multi-level program spanning the cognitive and neural levels. We develop computational theories that synthesize an especially broad technical toolkit, including probabilistic programming, causal generative models, nonlinear dynamics and control, and approximate Bayesian inference (using deep learning, sequential importance samplers, and optimization-based methods). We test these models in objective, performance-based psychophysical experiments in humans, and in neural data from human and non-human primate experiments via experimental collaborators (and, increasingly, in neural experiments we design and execute ourselves). With this multi-level program and multifaceted methodology, we aim to reveal how mental life --- the contents of our percepts and thoughts --- is implemented in neurobiology, and how it can be realized in artificially intelligent systems.
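
As a concrete illustration of one ingredient of this toolkit, the sketch below shows approximate Bayesian inference with a sequential importance sampler (a bootstrap particle filter) on a hypothetical toy model: a one-dimensional latent position that drifts as a Gaussian random walk and is observed with Gaussian noise. The model, parameters, and code are illustrative assumptions, not the models used in our experiments.

    # Minimal sketch: bootstrap particle filter on a toy 1D tracking problem.
    # The generative model (random-walk dynamics, Gaussian observations) is a
    # stand-in chosen only to illustrate sequential importance sampling.
    import numpy as np

    rng = np.random.default_rng(0)
    T, n_particles = 50, 500
    drift_sd, obs_sd = 0.5, 1.0

    # Simulate a latent trajectory and noisy observations from the toy model.
    latent = np.cumsum(rng.normal(0.0, drift_sd, size=T))
    obs = latent + rng.normal(0.0, obs_sd, size=T)

    # Propagate particles through the dynamics, weight them by the observation
    # likelihood, estimate the posterior mean, and resample.
    particles = rng.normal(0.0, 1.0, size=n_particles)
    estimates = np.empty(T)
    for t in range(T):
        particles = particles + rng.normal(0.0, drift_sd, size=n_particles)
        log_w = -0.5 * ((obs[t] - particles) / obs_sd) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates[t] = np.sum(w * particles)
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]

    print("mean absolute tracking error:", np.mean(np.abs(estimates - latent)))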


First few hundred milliseconds: Reverse-engineering the brain’s algorithms for core vision

The visual system must not only recognize and localize objects, but also perform much richer inferences about the causes in the world underlying sense data. This research thrust aims to uncover the algorithmic basis of how we see so much so quickly --- to capture, in concrete engineering terms, something we often take for granted: the breathtakingly fast and complex set of computations that unfolds in the brain between the moment we open our eyes and the moment a perceptual experience of a rich world arises in our minds.

Understanding human attention: Adaptive computation & goal-conditioned world models

Most scenes we encounter contain complex structure (e.g., objects, agents, events, places), but our goals render only a slice of this complexity relevant for perception. Attention somehow allows the mind to represent those objects in the world that are most relevant to our ongoing planning and actions. Whereas a rich tradition in experimental psychology has conceptualized attention in terms of objects and other structured mental representations, computational modeling work has almost uniformly formalized attention as a selective process determining what we sense, implemented as a weighting of feature embeddings by bottom-up factors (e.g., salience) or top-down factors (e.g., task templates). This research thrust aims to uncover the computational underpinnings of how attention integrates goals to construct multi-granular, goal-conditioned world models.
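
For concreteness, the sketch below spells out the standard formalization described above: attention as a weighting of feature embeddings, with each item's weight combining a bottom-up salience term and a top-down term given by similarity to a task template. The names, dimensions, and equal mixing of the two factors are illustrative assumptions, and this is the view our work contrasts with, rather than our own model of attention.

    # Minimal sketch: attention as weighted feature embeddings.
    import numpy as np

    rng = np.random.default_rng(1)
    n_items, dim = 6, 8
    features = rng.normal(size=(n_items, dim))   # feature embedding per scene item
    salience = rng.uniform(size=n_items)         # bottom-up factor (salience)
    template = rng.normal(size=dim)              # top-down factor (task template)

    # Top-down relevance: similarity of each item's features to the template.
    relevance = features @ template

    # Combine bottom-up and top-down factors and normalize with a softmax.
    logits = 0.5 * salience + 0.5 * relevance
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # The attended representation is a weighted sum of the feature embeddings.
    attended = weights @ features
    print("attention weights:", np.round(weights, 3))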

Hypothesis-driven inquiry into how world models are implemented in neural populations

A central and shared challenge for cognitive science and neuroscience is to understand how the physical world is represented in the brain. Yet the cognitive scientist and the neuroscientist do not necessarily speak the same language: a cognitive scientist might come to this challenge thinking about simulatable object representations and planning with them; a neuroscientist might come to it thinking about attractors and population dynamics. This research thrust develops new multi-level toolkits to enable faster and more significant discovery of how knowledge of the physical world might be implemented in the brain.

Intuitive physics basis of perception

Scene understanding is not just about recognizing what is where; it is also about seeing the physics of a scene. We can tell how heavy or elastic an object is by its feel when we hold it, but also by watching it move --- seeing it collide with other objects, bounce off the floor, or splash into water. Even the shape of an object, a canonical aspect of core vision, is sometimes subject to the physics of a scene: the shape of a soft object is governed by its intrinsic dynamics and the external forces acting on it. Similar issues arise for the perception of liquids, including their flow and viscosity. This research thrust develops novel computational models and objective, performance-based behavioral paradigms to study the "intuitive physics basis" of perception. We also test aspects of these models in fMRI experiments using animations of worlds of physical objects.
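
As an illustration of what such inference might look like computationally, the sketch below recovers a ball's elasticity (coefficient of restitution) from noisy observations of its bounce heights by comparing them against forward simulations. The drop height, noise level, and grid-search procedure are illustrative assumptions, not our experimental paradigm or model.

    # Minimal sketch: inferring a physical property (elasticity) from observed
    # motion by matching observations to forward simulations of a toy model.
    import numpy as np

    def bounce_peaks(restitution, h0=2.0, n_bounces=4):
        """Peak heights after successive bounces of a ball dropped from h0."""
        return np.array([h0 * restitution ** (2 * (k + 1)) for k in range(n_bounces)])

    rng = np.random.default_rng(2)
    true_e = 0.75
    observed = bounce_peaks(true_e) + rng.normal(0.0, 0.02, size=4)  # noisy "percept"

    # Grid search: which elasticity best explains the observed peak heights?
    candidates = np.linspace(0.5, 0.95, 91)
    errors = [np.sum((bounce_peaks(e) - observed) ** 2) for e in candidates]
    print("estimated restitution:", candidates[int(np.argmin(errors))])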