February 27, 2026 · Open Access

Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem

Key Points

The study aims to identify which prompt architecture components improve reasoning quality in language models.
Conducted a variable isolation study with 120 trials across 6 conditions (20 trials per condition).
Used Claude 3.5 Sonnet with controlled hyperparameters (temperature 0.7, top_p 1.0).
Tested the STAR (Situation-Task-Action-Result) reasoning framework across different prompting conditions.
Assessed the effect of user profile context and RAG context on reasoning accuracy.
The STAR framework alone increased accuracy from 0% to 85%.
User profile context added a further 10-percentage-point gain.
RAG context contributed an additional 5 percentage points.
The full-stack condition achieved 100% accuracy.
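The key points describe a layered prompt in which a STAR scaffold, user profile context, and RAG context can each be switched on or off. The sketch below shows one way such layers might be assembled for a variable isolation design. It is a hypothetical illustration, not the study's production system: the scaffold wording, the names `PromptLayers`, `build_prompt`, `STAR_SCAFFOLD`, and `CONDITIONS`, the ordering of layers, and the placeholder profile and retrieval strings are all assumptions. It also assumes the four conditions named in the summary (baseline, STAR, STAR plus profile, full stack) form a cumulative stack; the remaining two of the six conditions are not described, so they are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical STAR scaffold wording; the study's actual template is not given.
# The "Task" step forces the model to articulate the user's goal before inferring.
STAR_SCAFFOLD = (
    "Before answering, work through these steps:\n"
    "Situation: restate the facts and any physical constraints they imply.\n"
    "Task: state the goal the user is actually trying to achieve.\n"
    "Action: choose the action that achieves that goal under those constraints.\n"
    "Result: give the final answer and check it against the stated goal."
)

# Hypothetical placeholders standing in for vector-DB and RAG retrieval output.
PROFILE_TEXT = "(hypothetical user profile retrieved from a vector database)"
RAG_TEXT = "(hypothetical documents retrieved for this query)"


@dataclass
class PromptLayers:
    """One experimental condition: which optional prompt layers are enabled."""
    star: bool = False
    user_profile: Optional[str] = None
    rag_context: Optional[str] = None


def build_prompt(question: str, layers: PromptLayers) -> str:
    """Stack the enabled layers (context first, scaffold last) ahead of the question."""
    sections = []
    if layers.user_profile:
        sections.append(f"User profile:\n{layers.user_profile}")
    if layers.rag_context:
        sections.append(f"Retrieved context:\n{layers.rag_context}")
    if layers.star:
        sections.append(STAR_SCAFFOLD)
    sections.append(f"Question:\n{question}")
    return "\n\n".join(sections)


# Assumed cumulative conditions: each adds exactly one layer to the previous one.
CONDITIONS = {
    "baseline": PromptLayers(),
    "star": PromptLayers(star=True),
    "star_profile": PromptLayers(star=True, user_profile=PROFILE_TEXT),
    "full_stack": PromptLayers(star=True, user_profile=PROFILE_TEXT, rag_context=RAG_TEXT),
}

if __name__ == "__main__":
    for name, layers in CONDITIONS.items():
        print(f"--- {name} ---")
        print(build_prompt("(car wash question goes here)", layers))
```

Under this cumulative design, adjacent conditions differ by a single layer, so a difference in accuracy between them can be attributed to that layer, which is the logic behind reporting STAR, profile, and RAG contributions as separate gains.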

Abstract

Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per condition, 6 conditions, 120 total trials) examining which prompt architecture layers in a production system enable correct reasoning. Using Claude 3.5 Sonnet with controlled hyperparameters (temperature 0.7, top_p 1.0), we find that the STAR (Situation-Task-Action-Result) reasoning framework alone raises accuracy from 0% to 85% (p=0.001, Fisher's exact test, odds ratio 13.22). Adding user profile context via vector database retrieval provides a further 10 percentage point gain, while RAG context contributes an additional 5 percentage points, achieving 100% accuracy in the full-stack condition. These results suggest that structured reasoning scaffolds (specifically, forced goal articulation before inference) matter substantially more than context injection for implicit constraint reasoning tasks.
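The abstract reports its main comparison with Fisher's exact test and an odds ratio. For readers who want to see what that statistic computes, here is a minimal two-sided Fisher's exact test for a 2x2 table. It is a generic sketch, not the authors' analysis code, and the function names are hypothetical. The two-sided p-value sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed table, the same convention used by `scipy.stats.fisher_exact`. The worked example uses the classic tea-tasting table rather than data from this study, since the summary does not give the exact contingency table behind its reported p-value and odds ratio.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions and columns are outcomes (e.g. correct / incorrect).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(k: int) -> int:
        # Numerator of P(top-left cell == k) with all margins fixed; every
        # table shares the denominator comb(n, col1), so integers compare exactly.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table at least as extreme (no more probable) as the observed one.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return float(Fraction(extreme, comb(n, col1)))


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unconditional sample odds ratio (a*d)/(b*c); infinite if b*c == 0 < a*d."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Classic "lady tasting tea" table, not data from this study.
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
    print(sample_odds_ratio(3, 1, 1, 3))  # 9.0
```

Comparing integer weights instead of floating-point probabilities avoids the tolerance issues that arise when several tables have nearly equal probability.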
