November 14, 2023, Open Access

Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks

Abstract

We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed, one-shot prompting (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.

Cite This Study

Mitchell et al. (2023) studied this question.

synapsesocial.com/papers/6a0eeb71aa1655e5fb2300e9 https://doi.org/10.48550/arxiv.2311.09247
