Visual representations are essential for communicating complex programming concepts and play a critical role in algorithm and code design. While recent Large Multimodal Models (LMMs) have demonstrated impressive visual capabilities across various domains, their visual reasoning abilities in coding contexts remain largely unexplored. In this work, we present a systematic evaluation of these abilities. We introduce HumanEval-V, a benchmark comprising 253 human-annotated code generation tasks, each requiring the generation of a correct code solution from a problem context encoded in diagrams. To ensure high-quality annotations, constructing these tasks involved over 800 hours of meticulous human effort. Our tasks span six diverse categories and assess a broad spectrum of visual reasoning skills relevant to real-world programming scenarios. Through evaluation of 27 state-of-the-art LMMs, we find that even top-performing models such as Claude 3.5 Sonnet and Pixtral 124B achieve only 36.8% and 21.3% pass@1, respectively, while many open-weight models perform below 10%. Error analysis reveals that LMMs frequently hallucinate by misinterpreting or inventing visual details, and that they exhibit particular limitations in spatial reasoning, topological understanding, and handling dynamic visual patterns that are straightforward for humans. Finally, we discuss research opportunities and challenges for enhancing model capabilities in visual reasoning for code generation.
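The abstract reports results as pass@1 but does not say how the metric was computed. As a minimal sketch, assuming the widely used unbiased pass@k estimator (draw n samples per task, count the c that pass all tests), the per-task and benchmark-level scores could be computed as follows. The function names and the `(n, c)` input format are hypothetical, chosen for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task: n samples were drawn and c of them passed.

    Assumption: this is the standard unbiased estimator 1 - C(n-c, k) / C(n, k),
    not necessarily the exact procedure used for the benchmark.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; results holds one (n, c) pair per task."""
    if not results:
        raise ValueError("results must contain at least one task")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n per task, so with a single sample per task the benchmark score is simply the fraction of tasks whose generated solution passes.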
Fengji Zhang
Linquan Wu
Huiyu BAI
ACM Transactions on Software Engineering and Methodology
Tsinghua University
Zhejiang University
City University of Hong Kong
Zhang et al. studied this question.
www.synapsesocial.com/papers/6a095c2c7880e6d24efe234c — DOI: https://doi.org/10.1145/3813804