Methodology note · June 2026

Same model, different searcher.

When an AI assistant answers a shopping question, how much of the answer is the model, and how much is the search engine feeding it? We measured it: identical prompts, the same models, two different retrieval routes, called seconds apart.

The design. Twenty frozen streaming prompts from our monthly study, run through each model twice in paired calls: once through the single shared retrieval layer our research uses (the Perplexity Agent API), and once through each vendor's own API with its native web search (OpenAI's search tool, Anthropic's web search, Google Search grounding). Same prompt, same minute, so day-to-day drift can't masquerade as route divergence.

Finding one: the recommendation barely moves. The set of brands each model named overlapped 0.83 to 0.92 (Jaccard) across routes. Who AI recommends is model behavior. It survives a complete change of search engine.

Finding two: the sources almost completely change. The cited domains overlapped just 0.09 to 0.27 across routes. The same model, asked the same question, reads an almost entirely different web depending on who does the searching. Even grounding behavior is route-dependent: ChatGPT searched the live web twice as often on its native route; Claude searched less.

Engine	Brand overlap	Source-domain overlap	Grounded (shared)	Grounded (native)	YouTube / Reddit share, native
ChatGPT	0.92	0.09	25%	55%	0% / 0%
Claude	0.83	0.27	95%	60%	1% / 0%
Gemini	0.83	0.15	95%	100%	6% / 6%

Overlap is Jaccard similarity between the two routes across the run (1.0 = identical sets). *Gemini's native leg ran on the stable flash model; its shared-route leg uses the preview flash model, so its row carries a model-version caveat the others don't. Grok's native search API was mid-migration at measurement time and is excluded. 20 prompts, one category, one day: directional, not definitive.

The platform nuance. Where the routes disagree most is social and creator content. The shared retrieval layer cites YouTube and Reddit on 5 to 12 percent of citations. Native Google grounding also cites both, at about 6 percent each. Native OpenAI and Anthropic search cite essentially neither. If your category's AI answers are shaped by creator videos and community threads, which engine a buyer asks, and what searches behind it, decides whether that content reaches them at all.

Why our research runs on one shared layer. Our monthly studies hold retrieval constant across all four models on purpose: it is the only way a cross-model comparison isolates the models themselves. This measurement is the receipt for that design choice, and for its boundary. Recommendation findings in our reports are model-attributable and route-robust. Source-mix findings describe the supply the shared layer feeds the models, and every report says so.

The practical takeaway for a brand is the asymmetry itself: you cannot change what a model believes this quarter, but the sources are an open competition that restarts with every searcher. The same answer is being assembled from different shelves, and the shelves are where you can show up.

The streaming study these prompts came from →All research →

Conversational Commerce Intelligence

mapou.ai

mapou measures how AI assistants alter commercial decisions inside high-intent consumer categories. Methodology notes like this one document the instrument itself.

Measure your brand on this instrument, free →