What Research Do We Do?

From a quick glance, the touch of an object, or a brief sound snippet, our brains construct scene representations composed of rich and detailed shapes and surfaces. These representations are not only the targets of perception, but also support aspects of cognition including reasoning about the physics of objects, planning actions, and manipulating objects -- as in the paradigmatic case of using or making tools. How is it that perception transforms raw sensory signals arising from our physical environment into things like objects and people -- into things that we can think about? This is the key question that drives the research in the lab. We address this question primarily with computational modeling that brings together a diverse range of technical methods including probabilistic models, simulation engines (including graphics and physics engines), and efficient approximate Bayesian inference (using deep neural networks, sequential importance samplers, approximate Bayesian computation, and their hybrids). We test these models empirically in behavioral and neural experiments to build a unified account of neural function, cognitive processes, and behavior, all in precise engineering terms.
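The core inference idea above -- perception as inverting a generative model of how scenes produce sensory signals -- can be illustrated with a minimal sketch. Everything here is a toy stand-in: the "renderer" is an identity map, the latent is a single object size, and plain importance sampling stands in for the richer samplers used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: a latent object size produces a sensory measurement.
# In real models this would be a graphics or physics engine.
def render(size):
    return size  # illustrative stand-in for rendering

def likelihood(observation, size, noise=0.1):
    # Gaussian sensor-noise model comparing rendered and observed signals.
    return np.exp(-0.5 * ((observation - render(size)) / noise) ** 2)

# Importance sampling (sequential importance sampling reduced to one step):
# propose latents from the prior, weight each by how well it explains the data.
observation = 0.7
sizes = rng.uniform(0.0, 1.0, size=5000)   # prior over size: Uniform(0, 1)
weights = likelihood(observation, sizes)
weights /= weights.sum()

posterior_mean = np.sum(weights * sizes)
print(round(posterior_mean, 2))  # close to 0.7
```

The same generate-and-compare loop scales up when the renderer is a full graphics engine and the proposals come from learned networks rather than the prior.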


Physical object representations in the mind and brain.

We perceive rich and detailed three-dimensional (3D) shapes and surfaces, substance properties of objects (such as whether they are light or heavy, rigid or soft, solid or liquid), and relations between objects (such as which objects support, contain or are attached to other objects). These physical targets of perception support flexible and complex action, as the substrate of planning, reasoning, and problem solving. Despite their fundamental role in perception, many important questions about object representations remain open. What kind of information formats or data structures underlie them, so as to support the many ways in which humans flexibly and creatively interact with the world? How can properties of objects be inferred from sensory inputs, and how are they represented in neural circuits? A current line of research in the lab is to develop and test a theory of physical object representations (the POR theory) to answer these questions.
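One way to make the data-structure question concrete is to sketch what a structured scene representation might look like. The field names below are illustrative assumptions for exposition, not the POR theory's actual specification.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a physical object representation: 3D shape plus
# substance properties, with inter-object relations held at the scene level.
@dataclass
class PhysicalObject:
    mesh: list        # 3D shape, e.g. triangle-mesh vertices
    mass: float       # substance property: light vs. heavy
    rigidity: float   # substance property: rigid (1.0) vs. soft (0.0)

@dataclass
class Scene:
    objects: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # e.g. ("supports", "table", "cup")

scene = Scene()
scene.objects["table"] = PhysicalObject(mesh=[], mass=20.0, rigidity=1.0)
scene.objects["cup"] = PhysicalObject(mesh=[], mass=0.3, rigidity=1.0)
scene.relations.append(("supports", "table", "cup"))

# A structured representation supports physical queries directly, e.g.,
# which objects rest on the table (and so would move if it were removed).
supported = [c for (r, a, c) in scene.relations if r == "supports" and a == "table"]
print(supported)  # ['cup']
```

The point of the sketch is that explicit shapes, substance properties, and relations make queries for planning and physical reasoning trivial to pose, whereas a flat feature vector would not.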

Efficient analysis-by-synthesis to understand cortical computations in the ventral visual cortex.

Analyzing scenes by inverting causal generative models, also known as "analysis-by-synthesis", has a long history in computational vision, and these models have some behavioral support, but they are typically too slow to support online perception and have no known mapping to actual neural circuits. In general, how we see so much so quickly -- how our brains compute rich descriptions of scenes with detailed 3D shapes and surface appearances in a few hundred milliseconds or less -- is a key challenge for all existing approaches. We tackle this challenge by building inference networks based on deep neural networks that approximately invert generative models, stage by stage, based on their conditional independence structure. This approach explains multiple levels of neural processing in non-human primates in one key domain of high-level vision: the perception of faces. Presently, we are developing and testing this approach more broadly.
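The amortization idea -- paying the cost of inversion once, at training time, so that perception itself is a fast feedforward pass -- can be shown in miniature. Here a linear map fit by least squares stands in for the deep recognition network, and a random projection stands in for the generative model; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy generative model: latent scene variables z render to sensory data x.
A = rng.normal(size=(8, 3))                 # stand-in "graphics" projection
def generate(z, noise=0.05):
    return A @ z + noise * rng.normal(size=8)

# Amortized inference: instead of slow search at perception time, fit a
# feedforward inverse mapping on samples drawn from the generative model.
Z = rng.normal(size=(2000, 3))              # sample latents from the prior
X = np.stack([generate(z) for z in Z])      # render them
W, *_ = np.linalg.lstsq(X, Z, rcond=None)   # "train" the inference network

# At test time, inverting the model is a single fast forward pass.
z_true = np.array([0.5, -1.0, 0.2])
x_obs = generate(z_true)
z_hat = x_obs @ W
print(np.round(z_hat, 1))  # close to z_true
```

Training on model-generated pairs (z, x) is what makes the inverse mapping cheap at run time; the stage-by-stage inversion described above applies the same trick separately to each conditional-independence layer of the generative model.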

Planning actions and reasoning about others' goals.

We (and indeed many other species) are all about interacting with and manipulating our environments -- reaching, grasping, pushing, pulling, picking up, stacking, balancing, cutting, throwing, or sitting on objects. The morphology and mechanics of an agent, its goals, and the subtle details of object geometry and substance properties are critical factors in flexible, complex manipulation. How do we learn about the morphology and mechanics of other agents? How do we plan in complex object manipulation tasks (e.g., stacking a spread of blocks into a target configuration)? And how do we know what other agents want when we see them act on objects? These and similar questions constitute some of the recent research directions we are tackling using computational modeling and behavioral experiments.
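A block-stacking planning problem of the kind mentioned above can be posed, in its simplest form, as search over configurations. The toy below uses breadth-first search over tuple-of-stacks states; it is an illustrative sketch, not the lab's actual planner.

```python
from collections import deque

# States are tuples of stacks (each stack listed bottom to top). An action
# (block, j) moves the top block of some stack onto stack j.

def moves(state):
    stacks = [list(s) for s in state]
    for i, src in enumerate(stacks):
        if not src:
            continue
        block = src[-1]
        for j in range(len(stacks)):
            if i == j:
                continue
            new = [list(s) for s in stacks]
            new[i].pop()
            new[j].append(block)
            yield (block, j), tuple(tuple(s) for s in new)

def plan(start, goal):
    # Breadth-first search returns a shortest action sequence.
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for action, nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None

start = (("A", "B"), (), ("C",))   # B on A; C alone on the third spot
goal = (("A",), ("B", "C"), ())    # goal: C stacked on B in the middle
print(plan(start, goal))           # [('B', 1), ('C', 1)]
```

Real manipulation planning replaces the discrete state space with continuous geometry, agent morphology, and physics, but the structure -- search for an action sequence that transforms the current scene into the goal scene -- is the same.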

Multisensory perception and crossmodal transfer.

Humans can perceive shape and other attributes of objects (e.g., their substances and masses, or their identities) by seeing them, but also by touching them or by hearing them. Moreover, we can imagine how a seen object would feel and how a felt object would look. The POR theory (see above) provides a principled approach to understanding the representations and computations that underlie human crossmodal and multisensory perception. We pursue this line of work by building computational models using tools from robotics (e.g., grasping engines, haptic sensors) and by running visual-haptic experiments that make use of 3D printing technology.
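The computational claim here -- that a shared physical latent representation is what lets experience in one modality predict another -- can be sketched minimally. A single latent width generates both a "visual" and a "haptic" measurement; both sensor models are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

# One shared physical latent (object width) generates both modalities.
def render_visual(width, noise=0.05):
    return 2.0 * width + noise * rng.normal()   # e.g. image extent

def render_haptic(width):
    return width                                 # e.g. grip aperture

# Infer the latent width from a visual observation (importance sampling
# over a uniform prior, as in the sketch near the top of the page).
visual_obs = render_visual(0.4)
widths = rng.uniform(0.0, 1.0, size=5000)
w = np.exp(-0.5 * ((visual_obs - 2.0 * widths) / 0.05) ** 2)
w /= w.sum()
width_hat = np.sum(w * widths)

# Crossmodal transfer: predict how the seen object should feel when grasped,
# without ever having touched it.
haptic_pred = render_haptic(width_hat)
print(round(haptic_pred, 2))  # close to 0.4
```

Because the latent is physical (a width) rather than modality-specific (pixels or pressure readings), inferring it from vision immediately yields predictions for touch -- which is the pattern the visual-haptic experiments probe.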