Anthropic's researchers just published evidence that AI failures become increasingly incoherent as reasoning gets longer. We ran the same analysis on 7,000+ purchase recommendation sequences. The findings should change how every brand thinks about AI measurement.

AIVO Optimize tracks brand recommendations across four-turn buying sequences: the progression from an initial awareness query through to a direct purchase recommendation. We run these sequences repeatedly on identical prompts across ChatGPT, Perplexity, Gemini, and Claude, and record whether a brand wins or loses the recommendation at each turn.

Anthropic's paper is about the fundamental behaviour of large language models under extended reasoning. Ours is about what a consumer sees when they ask AI which moisturiser to buy. The level of abstraction is different; the underlying phenomenon is the same.

The question this raises for anyone measuring brand performance in AI, including us, is whether the measurement captures a stable signal or averages over noise and calls it insight. At turn four of a purchase sequence, for most brands in most categories, the honest answer is: mostly noise.

That's not a comfortable finding. It's a useful one. You can't fix a problem you haven't measured.
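A minimal sketch of what "stable signal versus noise" can mean for the repeated-run measurement described above. It assumes each repetition of a four-turn sequence is logged as a win/lose flag per turn. The run data, the function names (`per_turn_stability` and helpers), and the choice of majority agreement and binary entropy as stability measures are illustrative assumptions, not AIVO Optimize's actual method or data.

```python
from collections import Counter
import math

# Hypothetical illustration only: each inner list is one repetition of the same
# four-turn sequence on an identical prompt; True means the brand won the
# recommendation at that turn. These outcomes are invented for the example.
RUNS = [
    [True, True, True, False],
    [True, True, False, True],
    [True, True, True, True],
    [True, False, True, False],
    [True, True, True, False],
    [True, True, False, True],
]


def win_rate(outcomes):
    """Fraction of runs in which the brand won at this turn."""
    return sum(outcomes) / len(outcomes)


def majority_agreement(outcomes):
    """Share of runs matching the most common outcome (1.0 = fully stable, 0.5 = coin flip)."""
    counts = Counter(outcomes)
    return counts.most_common(1)[0][1] / len(outcomes)


def binary_entropy(p):
    """Shannon entropy in bits of a win/lose outcome with win probability p.

    0 bits means the outcome is deterministic; 1 bit means maximally noisy.
    """
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def per_turn_stability(runs):
    """Return (turn, win_rate, agreement, entropy) for each turn across repeated runs."""
    n_turns = len(runs[0])
    report = []
    for t in range(n_turns):
        outcomes = [run[t] for run in runs]
        p = win_rate(outcomes)
        report.append((t + 1, p, majority_agreement(outcomes), binary_entropy(p)))
    return report


if __name__ == "__main__":
    for turn, p, agree, h in per_turn_stability(RUNS):
        print(f"turn {turn}: win rate {p:.2f}, agreement {agree:.2f}, entropy {h:.2f} bits")
```

With these made-up runs, turn one is a unanimous win (0 bits of entropy) while turn four splits three to three (1 bit). A 50% "win rate" at that turn is the average of a coin flip, not a stable position, which is why a per-turn consistency measure is worth reporting alongside the raw win rate.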
AIVO Optimize
synapsesocial.com/papers/69cf5f645a333a821460e8d0 (DOI: https://doi.org/10.5281/zenodo.19347910)