Toddlers learn to recognize objects from different viewpoints with almost no supervision. During this learning, they make frequent eye and head movements that shape their visual experience. It is currently unclear whether and how these behaviors contribute to toddlers' emerging object recognition abilities. To address this question, we combine head-mounted eye tracking during dyadic play with unsupervised machine learning. We approximate toddlers' central visual field experience by cropping image regions from the head-mounted camera footage, centered on the current gaze location estimated via eye tracking. This visual stream feeds a neural network model that uses a biologically plausible unsupervised learning objective. Our experiments demonstrate that a few minutes of such first-person experience suffice to learn strong object representations that permit invariant object recognition. Importantly, by simulating alternative gaze behaviors, we show that toddlers' eye movement patterns play a crucial role in learning these representations. Our analysis also reveals that the limited size of the central visual field, where visual acuity is high, plays an important role in successful learning. Together, these findings highlight the benefits of temporally structured visual experience arising from toddlers' natural interactions with objects.

SUMMARY: We combine recordings of toddlers' first-person central visual field experience with biologically inspired self-supervised learning algorithms to model toddlers' development of invariant object recognition. Just a few minutes of toddlers' central visual field experience captured with head-mounted eye tracking suffice to learn strong object representations. Simulated alternative gaze behaviors produce weaker representations, demonstrating the importance of toddlers' active gaze strategies for learning. Our results emphasize the importance of toddlers' eye movements for learning object representations.
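The gaze-centered cropping step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `crop_central_field`, the square crop shape, the `crop_size` parameter, and the zero-padding at frame borders are all assumptions made for illustration, and the text does not specify the crop size or the border handling actually used.

```python
import numpy as np

def crop_central_field(frame, gaze_xy, crop_size):
    """Return a crop_size x crop_size patch of `frame` centered on the gaze point.

    frame:     H x W x C image array from the head-mounted camera.
    gaze_xy:   (x, y) gaze estimate in pixel coordinates.
    crop_size: side length of the square crop in pixels (hypothetical parameter).

    Assumption: parts of the crop that fall outside the frame are zero-padded,
    and gaze estimates outside the frame are clamped to the nearest border pixel.
    """
    h, w = frame.shape[:2]
    half = crop_size // 2

    # Clamp the gaze estimate to valid pixel coordinates.
    x = min(max(int(round(gaze_xy[0])), 0), w - 1)
    y = min(max(int(round(gaze_xy[1])), 0), h - 1)

    # Pad the frame so that crops near the border keep a fixed size.
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)), mode="constant")

    # Original pixel (y, x) now sits at (y + half, x + half) in `padded`,
    # so a window starting at (y, x) is centered on the gaze point.
    return padded[y:y + crop_size, x:x + crop_size]


# Example: a synthetic 10 x 12 RGB frame and a gaze point at (x=6, y=5).
frame = np.arange(10 * 12 * 3).reshape(10, 12, 3)
patch = crop_central_field(frame, (6, 5), crop_size=5)
```

Applying such a function to each video frame with its corresponding gaze estimate would yield a temporally ordered stream of central-field crops; alternative gaze behaviors could be simulated by substituting different `gaze_xy` sequences.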
Yu et al. studied this question.