

The interaction of objects with ground surfaces provides valuable information for predicting and interpreting their motion (Gibson, 1950). Gaze movements allow the sampling of information about the object–ground relationship by foveating objects under scrutiny, thereby increasing the spatial resolution of the visual input. However, most of what we know about vision concerns the perception of objects isolated from their physical environment. Consequently, we know little about the role of gaze in analyzing this contextual information. Our rich natural visual world contains too much information to sample constantly and uniformly at high resolution. This means that we often need to combine the visual input with prior assumptions about the physical world to disambiguate scenes.

A well-known example of this is the light-from-above prior in our interpretation of shape-from-shading: By assuming that light comes from above, we can infer whether shadows are cast by a hollow or a bump (e.g., Adams, 2008). Prior assumptions of shape can drive color and brightness perception (Bloj & Hurlbert, 2002; Bloj, Kersten, & Hurlbert, 1999). More sophisticated inferences include generative models, used to predict the behavior of physical objects based on everyday experience with Newtonian laws of motion (Ullman, Spelke, Battaglia, & Tenenbaum, 2017). Inferred motion trajectories (e.g., as affecting the ability to catch a baseball) have further been found to be influenced by the perception of gravity (McIntyre, Zago, Berthoz, & Lacquaniti, 2001; Monache, Lacquaniti, & Bosco, 2019). Here, we focus on how an object’s movement can be disambiguated by combining a prior understanding of classical mechanics with sensory information. Frictional forces oppose the tendency of a moving ball to slide or skid, resulting in what is commonly defined as rolling without slipping. Rolling without slipping results in a combination of rotational and translational motion, where a ball would rotate clockwise during rightward translation and anticlockwise during leftward translation, as seen in Fig. Given the strong constraint friction puts on an object’s dynamics, an observer may resolve ambiguity in a visual scene by inferring whether frictional forces are being exerted on a moving object.
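
The coupling between rotation and translation in rolling without slipping can be stated compactly. The following is a sketch under assumed notation, none of which appears in the text: a ball of radius $R$ on a horizontal surface, horizontal position $x$ (positive rightward), translational velocity $v$, rotation angle $\theta$, and angular velocity $\omega$ (both positive for clockwise rotation as seen by the observer). Requiring the point of contact with the ground to be momentarily at rest gives:

```latex
% Rolling without slipping: the contact point has zero velocity relative to the ground.
% Assumed notation: R = ball radius, v = translational velocity (positive rightward),
% omega = angular velocity (positive clockwise), x = displacement, theta = rotation angle.
\begin{align}
  v - \omega R &= 0 \quad\Longrightarrow\quad v = \omega R, \\
  \Delta x &= R \,\Delta\theta .
\end{align}
```

Under this sign convention, rightward translation ($v > 0$) implies clockwise rotation ($\omega > 0$) and leftward translation implies anticlockwise rotation. Any mismatch between $v$ and $\omega R$ would indicate that the ball is slipping or skidding relative to the surface.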
