What question did this study set out to answer?

The aim is to improve machine reasoning in vision tasks through fast and slow thinking strategies, addressing limitations of data availability.

June 26, 2026 · Open Access

Reasoning in machine vision by learning fast and slow thinking

Key Points

Developed a machine reasoning paradigm integrating fast and slow thinking modules inspired by dual-process theories.
Implemented a System I module for quick solution generation and verification, combined with a System II module for iterative refinement using self-play reinforcement learning.
Evaluated performance on computer-vision benchmarks and on cancer localization across five organs.
Showed that extended inference-time compute outperformed supervised learning on large labelled datasets (exact metrics not specified).
Outperformed existing foundation models and human experts on vision tasks.
Highlighted the potential of machine reasoning for data-scarce problems.

Abstract

Reasoning is a hallmark of human intelligence, enabling adaptive decision-making in complex, unfamiliar scenarios. In contrast, machine intelligence remains bound to its training data and cannot dynamically refine solutions at inference. Recent advances in machine reasoning (trading inference-time compute for improved performance) have focused on verbal domains such as mathematical problem-solving, where explicit rules govern step-by-step solution generation. Many tasks, however, lack sufficient labelled data and require alternative mechanisms for improving performance, such as inference-time compute. Here we present a paradigm for machine reasoning in vision that improves performance with increasing thinking time (inference-time compute), even with limited labelled data. Our approach is inspired by dual-process theories of human cognition: it integrates a fast-thinking System I module, which generates and verifies solutions in familiar tasks, with a slow-thinking System II module, which iteratively refines predictions using self-play reinforcement learning even when task-specific data is limited. The paradigm involves proposing, competing over, and refining solutions until convergence. We demonstrate that extended inference-time compute yields superior performance compared with large-scale supervised learning, foundation models, and human experts in vision tasks, including computer-vision benchmarks and cancer localisation across five organs, highlighting the potential of inference-time compute for data-scarce problems.
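The abstract describes the System I / System II loop only at a high level (propose, compete, refine until convergence). The toy sketch below is a hypothetical illustration of that general idea, not the authors' method. Everything in it is an assumption: the Gaussian-blob "scan", the windowed-mean verifier, the function names, and the parameters `stride`, `keep` and `patience`. A simple select-and-perturb search stands in for self-play reinforcement learning. It shows how allowing more refinement rounds (more inference-time compute) can keep or improve the best verified answer, but never make it worse.

```python
# Hypothetical toy illustration of a propose-verify-refine loop; not the authors' method.
# Self-play reinforcement learning is replaced here by a simple select-and-perturb search.
import math
import random
from typing import List, Tuple

Point = Tuple[int, int]
Image = List[List[float]]


def make_image(size: int, center: Point, rng: random.Random) -> Image:
    """Hypothetical toy 'scan': a Gaussian blob (the target) plus uniform noise."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / 8.0) + 0.2 * rng.random()
             for x in range(size)] for y in range(size)]


def verify(image: Image, p: Point, r: int = 2) -> float:
    """Verifier: mean intensity in a (2r+1)x(2r+1) window around p, clipped to the image."""
    size = len(image)
    y, x = p
    vals = [image[j][i]
            for j in range(max(0, y - r), min(size, y + r + 1))
            for i in range(max(0, x - r), min(size, x + r + 1))]
    return sum(vals) / len(vals)


def fast_propose(size: int, stride: int = 8) -> List[Point]:
    """'System I' stand-in: a cheap coarse-grid scan that proposes candidate locations."""
    coords = range(stride // 2, size, stride)
    return [(y, x) for y in coords for x in coords]


def slow_refine(image: Image, candidates: List[Point], steps: int,
                rng: random.Random, keep: int = 4, patience: int = 3) -> Tuple[Point, float]:
    """'System II' stand-in: candidates compete on the verifier score, the top `keep`
    survive and each spawns perturbed variants. Runs for at most `steps` rounds and
    stops early once the best score has not improved for `patience` rounds."""
    size = len(image)
    best = max(candidates, key=lambda p: verify(image, p))
    best_score = verify(image, best)
    stale = 0
    for _ in range(steps):
        # Competition: keep only the highest-scoring candidates.
        survivors = sorted(candidates, key=lambda p: verify(image, p), reverse=True)[:keep]
        # Refinement: each survivor proposes nearby variants, clamped to the image.
        children = [(min(size - 1, max(0, y + rng.randint(-2, 2))),
                     min(size - 1, max(0, x + rng.randint(-2, 2))))
                    for (y, x) in survivors for _ in range(4)]
        candidates = survivors + children
        top = max(candidates, key=lambda p: verify(image, p))
        top_score = verify(image, top)
        if top_score > best_score:
            best, best_score, stale = top, top_score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # treated as converged: no recent improvement
    return best, best_score


if __name__ == "__main__":
    true_center = (20, 27)
    image = make_image(32, true_center, random.Random(0))
    initial = fast_propose(32)
    for budget in (0, 5, 50):  # more rounds = more inference-time compute
        guess, score = slow_refine(image, initial, budget, random.Random(1))
        print(budget, guess, round(score, 3))
```

Because the best verified candidate is tracked across rounds and runs with the same seed share their early rounds, a larger `steps` budget can only match or raise the best score. Whether such gains transfer to real images depends entirely on the quality of the verifier, which in this sketch is a hand-written stand-in.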


Saeed et al. (2026) studied this question.

synapsesocial.com/papers/6a3e1670030ad1a9b3090485
https://doi.org/10.1038/s41467-026-74579-8
