About
First few hundred milliseconds: Reverse-engineering the brain’s algorithms for core vision
The visual system must not only recognize and localize objects, but also draw much richer inferences about the causes in the world underlying sense data. This research thrust aims to uncover the algorithmic basis of how we see so much so quickly, capturing in concrete engineering terms something we often take for granted: the breathtakingly fast and complex set of computations that unfold in the brain between the moment we open our eyes and the moment a perceptual experience of a rich world appears in our minds.
Understanding human attention: Adaptive computation & goal-conditioned world models
Most scenes we encounter hold complex structure (e.g., objects, agents, events, places), but our goals render only a slice of this complexity relevant for perception. Why do we see what we see? To answer this, we have been developing a new account of attention. Attention is central to human cognition, and decades of research since the cognitive revolution have explored how it continually focuses visual processing in the service of our goals. But how does this work in computational terms? Our aim here is to uncover the computational underpinnings of how attention integrates goals to construct goal-conditioned, structured representations of the world. Addressing this question is opening up a new generation of formal models that generate testable predictions at unprecedented empirical depth across scene perception, intuitive physics, and planning.
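To make the idea concrete, here is a toy sketch (not the lab's actual model): attention treated as goal-conditioned selection over a structured scene representation, where a limited processing budget is spent on the objects most relevant to the current goal. The object properties, goal format, and relevance scoring are illustrative assumptions.

```python
# Toy sketch: attention as goal-conditioned selection over a structured scene.
# All fields and scoring rules below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    category: str      # e.g., "cup", "furniture", "ball"
    movable: bool
    position: tuple

def relevance(obj: SceneObject, goal: dict) -> float:
    """Score how relevant an object is to the current goal."""
    score = 0.0
    if obj.category in goal.get("target_categories", []):
        score += 1.0
    if goal.get("needs_movable") and obj.movable:
        score += 0.5
    return score

def attend(scene: list[SceneObject], goal: dict, budget: int = 2) -> list[SceneObject]:
    """Spend a limited processing budget on the most goal-relevant objects."""
    ranked = sorted(scene, key=lambda o: relevance(o, goal), reverse=True)
    return ranked[:budget]   # only these objects receive detailed processing

scene = [
    SceneObject("mug", "cup", True, (0.2, 0.1)),
    SceneObject("table", "furniture", False, (0.0, 0.0)),
    SceneObject("ball", "ball", True, (0.5, 0.3)),
]
goal = {"target_categories": ["cup"], "needs_movable": True}
print([o.name for o in attend(scene, goal)])   # the goal-relevant slice of the scene
```

The point of the sketch is the design choice, not the details: the same scene yields different structured representations under different goals, because computation is allocated adaptively rather than uniformly.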
Uncovering neural mechanisms of world models by symbolically programming RNNs
How is it that, through the distributed and dynamic activity of our brain’s neural circuits, we think thoughts about objects, mentally simulate how they will move and react to forces, and plan actions toward them? This research thrust develops new multilevel modeling frameworks that, uniquely, operate at the level of both cognitive hypotheses (e.g., physical object representations) and neural mechanisms (e.g., distributed codes and attractors). This line of work recently yielded evidence for a “mental simulation circuit” in prefrontal populations of macaques playing the video game Pong.
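As a rough illustration of the multilevel idea, under assumptions of our own rather than the published framework, the sketch below distills a symbolic simulation (a ball bouncing between walls, as in Pong) into the dynamics of a trained RNN, so that the network's distributed hidden state comes to carry an implicit world model.

```python
# Minimal sketch, for illustration only: train an RNN to reproduce a symbolic
# simulator's next-state predictions, so its hidden dynamics implicitly encode
# the simulated world. Architecture and training details are placeholders.
import torch
import torch.nn as nn

def simulate_ball(steps=50, x=0.1, v=0.07):
    """Symbolic simulator: 1D ball position bouncing between walls at 0 and 1."""
    traj = []
    for _ in range(steps):
        x += v
        if x < 0 or x > 1:
            v = -v
            x = max(0.0, min(1.0, x))
        traj.append(x)
    return torch.tensor(traj).unsqueeze(-1)   # shape (steps, 1)

class SimRNN(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, 1)

    def forward(self, seq):
        h, _ = self.rnn(seq)
        return self.readout(h)                 # predict next position at every step

model = SimRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
traj = simulate_ball().unsqueeze(0)            # (1, 50, 1)
inputs, targets = traj[:, :-1], traj[:, 1:]    # next-step prediction task

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    opt.step()
# After training, the hidden state carries position and bounce-aware velocity:
# a distributed, dynamical stand-in for the symbolic simulation.
```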
Intuitive physics basis of perception
Many of the objects we encounter in everyday life are soft, from the shirt on your back to the towel on your counter. Most existing computational and neural studies of object perception, however, have focused on rigid objects such as blocks or tools. This is an important limitation, because only soft objects can change their shape dramatically, as when you toss your shirt aside or fold your towel. To address this, this research thrust examines how well different kinds of models capture human perception of cloths and liquids. We find that standard models of vision, including performant DNNs, fail to explain human performance. Capturing it instead requires a different kind of model that integrates intuitive physics, realized as probabilistic simulations of how soft objects move. A recent fMRI study provides converging evidence for this conclusion, together revealing an account of soft object perception that contrasts sharply with currently popular approaches.
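A minimal sketch of the simulation-based idea, under heavily simplified assumptions: a single damped mass-spring node stands in for a soft object, and a grid approximation to Bayesian inference recovers its stiffness by comparing probabilistic simulations against an observed trajectory. The dynamics, noise level, and stiffness grid are placeholders, not the models evaluated in this work.

```python
# Toy sketch: infer a soft-object property (stiffness) by probabilistic simulation.
# The 1D mass-spring proxy and all parameters below are illustrative assumptions.
import numpy as np

def simulate_soft_point(stiffness, steps=30, dt=0.1):
    """Crude soft-object proxy: one damped mass-spring node pulled by gravity."""
    pos, vel = 1.0, 0.0
    traj = []
    for _ in range(steps):
        acc = -stiffness * pos - 0.2 * vel - 0.5   # spring, damping, gravity
        vel += dt * acc
        pos += dt * vel
        traj.append(pos)
    return np.array(traj)

rng = np.random.default_rng(0)
true_stiffness = 2.0
observed = simulate_soft_point(true_stiffness) + rng.normal(0, 0.05, 30)

# Grid approximation to Bayesian inference: compare simulated trajectories under
# candidate stiffness values against the noisy observation.
candidates = np.linspace(0.5, 4.0, 50)
log_likelihood = np.array([
    -np.sum((simulate_soft_point(k) - observed) ** 2) / (2 * 0.05 ** 2)
    for k in candidates
])
posterior = np.exp(log_likelihood - log_likelihood.max())
posterior /= posterior.sum()
print("inferred stiffness:", candidates[np.argmax(posterior)])
```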
The interface between seeing and remembering
The spontaneous processing of visual information plays a significant role in shaping memory, sometimes even overshadowing voluntary efforts to encode specific details. What are the neurocomputational mechanisms that transform percepts into memories in the brain? This research addresses this question using computational models, behavioral experiments, and data analysis. We posit that the interface of perception and memory is governed by an adaptive mechanism called depth-of-processing, which is set on the fly, image by image, by a simple computational signature: the compression-based reconstruction error of an image. We find that images whose visual representations are harder to reconstruct leave stronger memory traces; we also find that the same computational signature of depth-of-processing predicts the activity of single neurons in the human hippocampus and amygdala.
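A minimal sketch of this computational signature, with PCA standing in for whichever compression model one prefers: images that reconstruct poorly from a compressed code are predicted to receive deeper processing and to leave stronger memory traces. The random "images" and the component count are placeholders.

```python
# Toy sketch: compression-based reconstruction error as an image-by-image signal.
# PCA is used as a stand-in compressor; the data and dimensions are placeholders.
import numpy as np

rng = np.random.default_rng(1)
images = rng.normal(size=(100, 32 * 32))       # 100 flattened grayscale "images"

# Fit a linear compressor (top-k principal components) on the image set.
k = 10
mean = images.mean(axis=0)
centered = images - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:k]                             # (k, 1024) compression basis

codes = centered @ components.T                 # compress
reconstructions = codes @ components + mean     # decompress

# Per-image reconstruction error: the proposed signal that sets depth-of-processing
# and hence predicts which images leave stronger memory traces.
reconstruction_error = np.mean((images - reconstructions) ** 2, axis=1)
predicted_memorability_rank = np.argsort(-reconstruction_error)
print(predicted_memorability_rank[:5])          # images predicted most memorable
```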