
mapou Visibility Index · State of AI Search

AI search is not one channel. It's five, and they disagree.

Eleven findings about how AI search actually works in 2026, across 36 segments, 740 brands, 5 AI engines (ChatGPT, Perplexity, Gemini, Claude, Grok), and 12 buyer personas. The mapou Visibility Index, updated monthly from a pre-registered prompt set with public predictions and on-the-record statistical caveats. Full methodology.

Throughout this page: MVI (mapou Visibility Index) = a 0–100 score per brand combining citation rate across all five engines and 20 fixed buyer-intent prompts, phase-weighted (0 = AI never cites the brand, 100 = AI cites it in every prompt). Discovery = “best brands in X” prompts. Filtered Discovery = “best brands in X” with a constraint (price, region, use-case, beginner, professional). Evaluation = “which X should I buy” and head-to-head comparison prompts. Engine agreement = how often the five engines rank brands the same way (0 = independent surfaces, 1 = identical). Effective brands = how many brands AI is really recommending in a segment, regardless of how many are tracked. pp = percentage points.

Use cases for this report

Who this is for and how to use it.

Brand teams

(CMOs, growth, ecommerce)

Benchmark your AI search visibility on the engines and personas your buyers actually use.

  • Find your category in All segments and read your MVI tier.
  • Use Findings 06, 08, 11 to identify whether your tier is engine, persona, or framing-conditioned.
  • Run a free check on your brand at /check.

Retail media networks

(retailers, RMN leaders)

Understand how AI engines decide which retailer they cite, and which categories AI is most influential in.

  • Use Finding 03 to identify funnel phases where AI most affects category demand.
  • Use the integration tracker to measure the lift of retailer × engine deals.
  • Discuss persona-tuned MVI for stocked-brand visibility benchmarks.

Agencies and consultancies

(GEO, AEO, performance, research)

Bring quantified, reproducible AI search benchmarks into client recommendations and pitch decks.

  • Cite individual findings with the published method captions.
  • Reference the falsifiable predictions to demonstrate rigor.
  • Talk to mapou about the agency program for client-ready MVI reports.

Executive takeaways

The 30-second skim.

For: CMOs, heads of ecommerce, retail media leads, AI-search/GEO practitioners, analysts covering AI-mediated discovery. Methodology snapshot below the takeaways; full framework on /methodology.

The single-number AI visibility myth is dead.

One brand. One MVI. One leaderboard. That framing reads cleanly in a board deck and falls apart inside the data. Across 36 segments and 740 brands, the same brand has different visibility on every engine, drops between funnel phases, and shifts under buyer persona. Eleven separate measurements per brand, not one.

Buyer signals move the leaderboard.

In 32 of 36 categories, the #1 brand changes when AI receives a buyer signal it did not have on the baseline run. The persona-tuned MVI for a brand's actual buyer mix, not the average buyer, is the only useful visibility metric in categories where rankings move with persona.

Most brands are invisible to most engines.

56% of (brand × prompt) cells in the panel are invisible to all 5 engines simultaneously. The ceiling for "improvement" is much higher than most brand teams assume because the floor is much lower than they think.

Five engines, five answers, five investments.

Mean engine agreement is 55%, well below the threshold where optimizing for one engine carries to the others. The right engine to optimize for first depends on which buyer phase your category leans on, and that varies by category.

Frame conditioning matters.

Removing the shopping-assistant system prompt flips the #1 brand in 8 of 22 segments. An "AI search visibility" number is always frame-conditioned. Vendor benchmarks that publish different leaders are usually using a different frame.

Six stats from the panel

55%

mean cross-engine agreement

Finding 01

56%

of (brand × prompt) cells invisible to all 5 engines

Finding 07

32 of 36

categories where the leader brand flips under at least one buyer persona

Finding 08

8pp

spread between most-stable engine (Claude, 74%) and most-sensitive (Gemini, 66%) under personas

Finding 09

69%

of tracked brands are not yet cited (MVI < 25)

all-segments

5

active falsifiable predictions on record, graded next month

Predictions

Eleven findings at a glance

01 · Engines disagree. Mean cross-engine agreement on rankings is below the threshold where one-engine optimization is sufficient.
02 · Concentration is winner-takes-most. Effective brand counts are far below tracked-brand counts in most categories.
03 · Demand leaks Discovery → Evaluation. Top brands lose visibility as buyers move down the funnel.
04 · Mention-but-not-cited gap. Brands AI knows about that AI does not recommend. A positioning problem, not a visibility problem.
05 · Different engines decide different funnel stages. Kingmaker engines per phase. The right engine to optimize for depends on the prompt.
06 · Brand-level cross-engine spread. Two brands at the same MVI can have wildly different per-engine visibility.
07 · Most brands are invisible to most engines. 56% of (brand × prompt) cells are invisible to all 5 engines simultaneously.
08 · Personalization shifts who AI recommends. Leaders flip in 32 of 36 segments under at least one buyer persona.
09 · Engines personalize differently. The same buyer signal produces different stability across the 5 engines, with an 8-point spread between most stable and most sensitive.
10 · Persona signals shift different funnel phases differently. Aspirational personas (premium, values-driven) move Evaluation rankings hardest. Practical personas (professional, first-time) preserve them.
11 · The shopping frame is not neutral. Switching the system prompt from shopping-assistant to minimal flips the #1 brand in 8 of 22 segments (36%) and drops cite rates by 7.8 percentage points.

What eMarketer-style reports don't typically include and this one does: a 12-persona × 5-engine personalization layer (Findings 08-09), bootstrap 95% CIs on the cross-engine sensitivity table, and five falsifiable predictions on record that get graded next month.

About the panel

Objective

Measure which brands AI search engines cite by default when buyers ask buying questions, across the categories where AI-mediated discovery is reshaping consideration. Same prompts, same panel, refreshed monthly so visibility shifts are auditable rather than anecdotal.

Panel

740 brands across 36 competitive segments grouped under 9 parent industries. Each segment runs its own canonical 20-prompt set against 5 AI engines (ChatGPT, Perplexity, Gemini, Claude, Grok) and 12 buyer personas (baseline plus 11 named persona signals). Tracked brands picked for category coverage, not for editorial preference.

Topics

Per-brand MVI (visibility tier), cross-engine agreement, funnel-phase strength (Discovery / Filtered / Comparison / Evaluation), persona robustness, citation concentration (effective brand count), demand leakage between phases, mention-but-not-cited gaps, and frame-conditioned ranking shifts.

Fielding

Monthly, US English only. Runs are dated and immutable; per-segment archives are committed to a versioned data store. This report references the May 2026 run. Paid customer access to the full panel (12 personas × 5 engines × per-prompt verdicts) lives in Postgres via app.mapou.ai. Methodology version 1.0.

Segments

36

Tracked brands

740

Engines

5

Personas

12

Verdicts / month

74,000

CI policy

Wilson 95%

Note. Verdicts/month is the canonical MVI run only (brand × prompt × engine × 1 baseline persona). Persona-panel runs add additional verdicts per buyer persona tested. Author: Arvin Nundloll (formerly Comcast Advertising, NBCUniversal, Amazon, DIRECTV). Full framework, statistical caveats, and limitations on /methodology.

Metric glossary (8 terms)
MVI (mapou Visibility Index)
0-100 score per brand combining citation rate across 5 engines and 4 funnel phases. Phase-weighted, with Wilson 95% CIs.
Effective brands
Inverse Simpson index. The number of brands AI effectively recommends in a category, accounting for citation concentration. Lower = more winner-takes-most.
Engine agreement (mean Spearman ρ)
How similarly the 5 engines rank brands within a category. ρ=1 means perfect agreement, ρ=0 means independent rankings.
Cross-engine spread
The maximum gap in citation rate for one brand across the 5 engines. Identifies brands with engine-specific visibility risk.
Demand leakage
Drop in citation rate from Discovery prompts (top of funnel) to Evaluation prompts (bottom of funnel) for the same brand.
Mention-but-not-cited
Brand named in the answer but not actively recommended. Counted separately because mentions don't drive consideration the way recommendations do.
Kingmaker engine
The engine most predictive of overall MVI for a given funnel phase. Tells you which engine to optimize for first if budget is constrained.
Top-3 overlap (persona robustness)
Fraction of baseline top-3 brands that stay in top-3 under each non-baseline persona. Used to measure how much rankings shift under buyer signals.
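The MVI entry above reports Wilson 95% CIs on citation rates. For readers reproducing the scores, here is a minimal sketch of the Wilson score interval in Python; the function name and example counts are illustrative, not mapou's published code.

```python
import math

def wilson_ci(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 -> 95%)."""
    if trials == 0:
        return (0.0, 0.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (max(0.0, center - half), min(1.0, center + half))

# e.g. a brand cited in 7 of 20 canonical prompts on one engine
low, high = wilson_ci(7, 20)
print(f"citation rate 0.35, Wilson 95% CI [{low:.2f}, {high:.2f}]")
```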

For CMOs

Five decisions that come out of this report.

The findings below are research. Here is what to do with them. Each bullet is a portfolio-level decision, with the mapou service that operationalizes it. Numbers reflect the 36-segment monthly panel, refreshed on the 1st of every month.

01

Treat AI engines as separate channels

Engines disagree on baseline #1 in 6 of 36 categories. Single-engine optimization is a single-channel investment. Plan budget by engine, not by "AI."

How mapou helps: Start with an AI Visibility Audit to map your position per engine.

02

Visibility is buyer-mix dependent

Leaders flipped in 32 of 36 segments under at least one buyer persona. Your baseline MVI is a calibration. Build the persona-corrected number into the board deck.

How mapou helps: Persona-Tuned MVI surfaces the visibility number that matches your actual buyer mix.

03

Personalization is real and measurable

AI search is not winner-takes-most in the abstract. It is winner-takes-most per engine, per persona. The brands invisible in baseline can dominate their persona, and the leader you see today depends on who is asking.

How mapou helps: GEO & Citation Architecture restructures your entity data so AI surfaces you in the personas that fit your positioning.

04

Visibility tier is the conversation

MVI 75+ is default-choice. 50-74 is repeat-use. 25-49 is first-encounter. <25 is not yet cited. The right strategy differs at each tier. A first-encounter brand needs visibility infrastructure. A default-choice brand needs to defend against persona-driven challengers.

How mapou helps: AI Visibility Audit identifies your tier; GEO & Citation Architecture moves you up.

05

Time-series wins

A monthly tracked MVI catches changes a one-shot audit misses. AI search retrieval policies, model upgrades, and competitor activity all move the leaderboard between snapshots. Static GEO is yesterday's SEO.

How mapou helps: Executive Advisory keeps your monthly MVI tracked, with a CMO-level read on what changed and why.

Finding 01

The five engines agree less than half the time.

Across 36 segments, average engine agreement is 0.55. The five engines rank brands the same way only about half the time. Marketers planning AI visibility as one channel are wrong by construction.

Most-divergent segments (engines disagree the most about who's best)

Most-consensus segments (the engines mostly agree)

What this means in practice: a brand that wins ChatGPT may be invisible in Perplexity or Gemini for the same query set. Your AI channel plan needs to treat the engines as distinct surfaces, with engine-by-engine spend and content strategies, not one consolidated “AI search” bucket.

Method: rank every brand within each engine by citation rate, then compute the mean pairwise rank correlation (Spearman) across all engine pairs. Engine agreement = mean correlation; divergence = 1 − agreement. Near 0 agreement means the engines are effectively independent channels.
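A minimal sketch of that agreement computation, assuming per-brand citation rates are already aggregated per engine; the engine names are real, the rates are toy numbers.

```python
from itertools import combinations
from statistics import mean

from scipy.stats import spearmanr  # requires SciPy

# per-brand citation rates for one segment; engine names real, rates toy
rates = {
    "ChatGPT":    [0.90, 0.60, 0.30, 0.10],
    "Claude":     [0.80, 0.70, 0.20, 0.15],
    "Perplexity": [0.30, 0.90, 0.50, 0.05],
    "Gemini":     [0.70, 0.40, 0.60, 0.10],
    "Grok":       [0.50, 0.55, 0.40, 0.30],
}

# mean pairwise Spearman rho over all 10 engine pairs
rhos = []
for a, b in combinations(rates, 2):
    rho, _ = spearmanr(rates[a], rates[b])
    rhos.append(rho)

agreement = mean(rhos)
print(f"engine agreement = {agreement:.2f}, divergence = {1 - agreement:.2f}")
```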

Finding 02

Some categories are winner-take-all on AI. Others are wide open.

In a winner-take-all segment AI effectively recommends only 3 of 12 tracked brands; in an open one it spreads citations across most of the field. Same MVI shares, two very different competitive worlds.

Most-concentrated segments (winner-take-all)

Most-spread segments (visibility is widely distributed)

What this means in practice: in a concentrated segment, your AI strategy is about dislodging incumbents, and most paid spend gets eaten by the top two. In an open segment, the work is showing up at all and the upside is uncapped because no one has won.

Method: effective number of brands = 1 / Σ(MVI share²), the inverse Simpson index. Self-normalizing for sample size, so a 10-brand segment and a 20-brand one are directly comparable. Equivalent to 1 / HHI with shares on a 0-1 basis (10,000 / HHI on the conventional 0-10,000 scale).
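The index itself is a one-liner. A sketch with illustrative MVI scores, not panel data:

```python
def effective_brands(mvi_scores: list[float]) -> float:
    """Inverse Simpson index over MVI shares: 1 / sum(share^2)."""
    total = sum(mvi_scores)
    shares = (s / total for s in mvi_scores)
    return 1 / sum(share**2 for share in shares)

# winner-takes-most: two brands dominate a 12-brand field -> ~3.6 effective
print(round(effective_brands([60, 50, 10, 8, 5, 5, 4, 3, 2, 2, 1, 1]), 1))
# perfectly even 12-brand field -> 12.0 effective
print(round(effective_brands([12] * 12), 1))
```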

Finding 03

The biggest names get cited. Then they lose at decision time.

For each brand, the gap between citation rate in Discovery prompts (top-of-funnel: best brands in X) and Evaluation prompts (decision: which one should I buy). Positive gap = brand is awareness-rich and conversion-poor in AI answers.

Top demand-leak brands

  • Four Seasons · hotels · Discovery citation rate beats Evaluation by 48 percentage points
  • loanDepot · mortgages · Discovery citation rate beats Evaluation by 45 percentage points
  • Vrbo · online travel agencies · Discovery citation rate beats Evaluation by 42 percentage points
  • Vuori · athleisure · Discovery citation rate beats Evaluation by 40 percentage points
  • Corteiz · streetwear · Discovery citation rate beats Evaluation by 40 percentage points
  • Kaiser Permanente · health insurers · Discovery citation rate beats Evaluation by 39 percentage points
  • New York Life · life insurers · Discovery citation rate beats Evaluation by 38 percentage points
  • Andersen Windows · paint and fixtures · Discovery citation rate beats Evaluation by 37 percentage points

What this means in practice: shoppers can name the brand, but AI doesn't recommend buying it. The revenue at risk is the considered-purchase basket, where AI now sits between awareness and checkout. The fix is usually evaluation-phase content: third-party reviews, comparison tables, and attribute-rich pages that surface in “which X should I buy” queries.
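The leak metric is the simple difference of two phase-level citation rates. A sketch with illustrative rates (the 48pp example mirrors the Four Seasons gap above):

```python
def demand_leak_pp(discovery_rate: float, evaluation_rate: float) -> float:
    """Leak in percentage points; positive = awareness-rich, conversion-poor."""
    return 100 * (discovery_rate - evaluation_rate)

# e.g. cited in 60% of Discovery prompts but only 12% of Evaluation prompts
print(demand_leak_pp(0.60, 0.12))  # 48.0 pp, a Four Seasons-sized gap
```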

Finding 04

AI knows them. AI doesn't recommend them.

Brands frequently named as context (“... unlike X, this product offers...”) but rarely surfaced as the answer. Recognition without recommendation.

Top mention-but-not-cited brands

  • Cars & Bids · used car retailers · mentioned in 13 · cited in 0 · 100% mention-only
  • Samsung Galaxy Book · laptops · mentioned in 5 · cited in 0 · 100% mention-only
  • Independence Blue Cross · health insurers · mentioned in 14 · cited in 0 · 100% mention-only
  • MetLife Auto & Home · P&C insurers · mentioned in 6 · cited in 0 · 100% mention-only
  • At Home · home-improvement retailers · mentioned in 7 · cited in 0 · 100% mention-only
  • Vestiaire Collective · luxury goods · mentioned in 13 · cited in 0 · 100% mention-only
  • A.P.C. · streetwear · mentioned in 7 · cited in 0 · 100% mention-only
  • Going · online travel agencies · mentioned in 6 · cited in 0 · 100% mention-only

In human terms: AI knows the brand and uses it as a reference point, but rarely gives it the click. Mention without recommendation is half-credit visibility. The fix usually means denser third-party citations, clearer review consensus, and a recommendable proof point: “best for X” framing rather than just “a competitor of Y.”

Method: for each brand, share of total visibility (cited + mentioned) that came from mentions only. High values mean the brand has positioning in AI's conceptual map but isn't earning the recommendation slot.
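That share is straightforward to compute per brand. A sketch, using Cars & Bids-style counts from the list above:

```python
def mention_only_share(cited: int, mentioned: int) -> float:
    """Share of total visibility (cited + mentioned) that is mentions only."""
    visible = cited + mentioned
    return mentioned / visible if visible else 0.0

# mentioned in 13 answers, cited in 0 -> 100% mention-only
print(f"{mention_only_share(cited=0, mentioned=13):.0%}")
```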

Finding 05

Different engines decide different stages of the funnel.

Across 36 segments, here's which engine is the kingmaker (largest spread between top and bottom brands) at each buyer-intent phase. Win that engine, win that phase.

Discovery

e.g., "best skincare brands"

  • ChatGPT · 25 segments
  • Claude · 9 segments
  • Gemini · 2 segments

Filtered Discovery

e.g., "best skincare for sensitive skin" / "under $30"

  • ChatGPT · 24 segments
  • Claude · 9 segments
  • Perplexity · 2 segments
  • Grok · 1 segment

Comparison

e.g., "CeraVe vs The Ordinary"

  • ChatGPT · 23 segments
  • Gemini · 4 segments
  • Grok · 3 segments
  • Perplexity · 3 segments
  • Claude · 3 segments

Evaluation

e.g., "which moisturizer should I buy"

  • ChatGPT · 27 segments
  • Claude · 4 segments
  • Gemini · 3 segments
  • Grok · 2 segments

Win the kingmaker engine, win the phase. If Claude is the kingmaker for Evaluation in your category, a ChatGPT-only AI strategy will move discovery metrics but lose the conversion moment.

Method: for each segment and funnel phase, we look at each engine's gap between its most-cited and least-cited brand. The engine with the biggest gap is the kingmaker for that phase: it's where small differences in brand strength produce large differences in AI visibility. Counts show how many of the 36 segments each engine wins as kingmaker for that phase.
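A sketch of the kingmaker selection under those definitions; the brand labels and rates are toy stand-ins:

```python
# citation rate per brand for one (segment, phase), keyed by engine; toy data
phase_rates = {
    "ChatGPT":    {"Brand A": 0.95, "Brand B": 0.40, "Brand C": 0.05},
    "Claude":     {"Brand A": 0.70, "Brand B": 0.55, "Brand C": 0.40},
    "Perplexity": {"Brand A": 0.50, "Brand B": 0.45, "Brand C": 0.35},
}

def kingmaker(rates_by_engine: dict[str, dict[str, float]]) -> str:
    """Engine with the widest gap between its most- and least-cited brand."""
    def spread(rates: dict[str, float]) -> float:
        return max(rates.values()) - min(rates.values())
    return max(rates_by_engine, key=lambda e: spread(rates_by_engine[e]))

print(kingmaker(phase_rates))  # ChatGPT: 0.95 - 0.05 = 0.90 spread
```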

Finding 06

Same MVI score can hide engine-dependent realities.

19% of tracked brands have a cross-engine citation-rate spread of 50 percentage points or more. The spread is the gap between the brand's best and worst engine. Two brands with identical MVI can be on completely different visibility paths: one durable across engines, one engine-dependent. Averages alone hide this.

Distribution of cross-engine spread per brand

0-10pp · 200 (27%)
10-25pp · 203 (27%)
25-50pp · 194 (26%)
50-75pp · 97 (13%)
75pp+ · 46 (6%)

Engine-dependent (high spread)

Lenovo · laptops

MVI 72, spread 100pp (top Claude, bottom Gemini)

Samsung · smartphones

MVI 76, spread 100pp (top ChatGPT, bottom Perplexity)

LG · TV brands

MVI 76, spread 100pp (top ChatGPT, bottom Perplexity)

Durable across engines (low spread)

Benjamin Moore · paint and fixtures

MVI 84, spread 10pp (consistent across all five)

Louis Vuitton · luxury goods

MVI 73, spread 15pp (consistent across all five)

Northwestern Mutual · life insurers

MVI 56, spread 15pp (consistent across all five)

If your spread is 40pp or higher, target the weakest engine specifically; the lift is concentrated there. If your spread is under 15pp, you have a durable position: defend it. Two brands at the same MVI are not the same brand: one is exposed to a single-engine update, the other is not.

Method: per brand, citation-rate spread = max(per-engine citation rate) - min(per-engine citation rate) across the five tested engines. Computed only for brands with at least three engines reporting a citation rate. 740 brands total. 47% have spread under 15pp (durable), 19% have spread 50pp or more (engine-dependent).
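A sketch of the per-brand spread computation, including the three-engine reporting floor the method names; the rates are illustrative:

```python
def cross_engine_spread_pp(per_engine_rates: dict[str, float]) -> float | None:
    """Max minus min per-engine citation rate in pp; None below 3 engines."""
    if len(per_engine_rates) < 3:  # mirror the three-engine reporting floor
        return None
    rates = per_engine_rates.values()
    return 100 * (max(rates) - min(rates))

# an engine-dependent brand: strong on Claude, invisible on Gemini
print(cross_engine_spread_pp({"Claude": 1.0, "ChatGPT": 0.6, "Gemini": 0.0}))
```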

Finding 07

Most brands are invisible to most engines.

Across 14,800 (brand, prompt) cells where four or more engines responded, 56% are invisible to all five engines simultaneously, and only 3% are cited or mentioned by all five. Cross-engine consensus invisibility is the rule, not the exception. The recommendation set you actually compete for is much narrower than the brand list suggests.

How many engines miss the brand

0 / 5 (all see) · 504 (3%)
1 / 5 · 1,164 (8%)
2 / 5 · 1,095 (7%)
3 / 5 · 1,318 (9%)
4 / 5 · 2,477 (17%)
5 / 5 (none see) · 8,242 (56%)

Each row counts (brand, prompt) cells where N out of 5 engines failed to cite or mention the brand. 81% of cells are missed by 3 or more engines; at that point the brand isn't generally discoverable.

Each engine has a personality

Engines do not just disagree on rankings; they also cite differently. Some confidently recommend, others hedge with mentions, others cite few brands per answer. Recognizing each engine's personality tells you which engine to optimize for first if you can only target one.

Engine | Character | Cite rate | Mention rate | Mention share of visibility | Brands per answer
Claude | Broad recommender | 22% | 5% | 18% | 5.8
ChatGPT | Balanced | 21% | 3% | 12% | 5.5
Grok | Narrow scope | 16% | 3% | 17% | 4.4
Gemini | Tight gatekeeper | 16% | 3% | 18% | 4.9
Perplexity | Tight gatekeeper | 13% | 4% | 25% | 5.5

If a brand is invisible to 3 or more engines, the gap is foundational: fix the third-party citation infrastructure before chasing engine-specific tactics. If a brand is cited by ChatGPT but missed by Gemini, the gap is engine-specific: target Gemini's known retrieval signals. The mention-share column flags brands AI knows about but does not recommend; that is a positioning problem, not a visibility problem.

Method: invisibility bucket counts (brand × prompt) cells with at least 4 engines reporting and tallies how many missed the brand entirely. Engine personality table aggregates over every (brand × prompt × engine) result. Mean brands per answer is bounded above by the analyzer's 6-competitor extraction cap, so the column compares relative behavior across engines rather than the true ceiling.
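A sketch of the invisibility bucketing, assuming each (brand × prompt) cell already carries counts of engines reporting and engines missing; the cell tuples are toy data:

```python
from collections import Counter

# per (brand, prompt) cell: (engines_reporting, engines_missing); toy data
cells = [(5, 5), (5, 0), (4, 4), (5, 3), (3, 2), (5, 5)]

# keep cells with at least 4 engines reporting, tally how many missed
buckets = Counter(missing for reporting, missing in cells if reporting >= 4)
for n_missed in range(6):
    print(f"{n_missed}/5 engines miss the brand: {buckets[n_missed]} cells")
```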

Does this depend on who's asking?

Findings 01-07 are computed from stateless API calls: no account, no chat history, no Memory, no custom instructions, no browsing personalization. Same prompt and same model produce the same distribution of answers regardless of who runs them, within sampling error.

But personalization itself is measurable. We tested it. See Finding 08 below for how much rankings shift when a user-style signal enters the prompt.

Finding 08

Personalization changes who AI recommends.

We tested it. 36 segments × 6 personas via API system-prompt variants (no consumer-app scraping). The leader brand changed in 32 of 36 segments under at least one persona. Mean top-3 overlap with baseline: 68%. Skincare was the most volatile (20% overlap), smartphones the most stable (93%).

Leader flipped

32 of 36

segments

Mean top-3 overlap

68%

vs baseline

Most volatile

skincare

20% overlap

Most stable

smartphones

93% overlap

The leader brand by persona

Each cell is the brand cited most often across the 20 canonical prompts for that segment, when the AI engine is given that persona's system prompt. A leader change vs baseline can be spotted by comparing each persona column to the Baseline column. The full ranked leaderboard per (segment × persona) is in the underlying JSON.

Segment | Baseline | Budget | Premium | Pro | First-time | Values | Top-3 overlap
home services | Roto-Rooter | Roto-Rooter | Terminix | Roto-Rooter | Roto-Rooter | Roto-Rooter | 40%
fragrance | Chanel | Yves Saint Laurent | Creed | Chanel | Chanel | Le Labo | 47%
laptops | ASUS | Lenovo | Apple Mac | Apple Mac | HP | Dell | 53%
life insurers | Haven Life | Haven Life | Northwestern Mutual | MassMutual | Haven Life | MassMutual | 53%
makeup | Fenty Beauty | Maybelline | Dior | MAC Cosmetics | Maybelline | Ilia | 60%
skincare | CeraVe | The Ordinary | Tatcha | CeraVe | The Ordinary | The Ordinary | 60%
banking | Ally Bank | Ally Bank | Citi | Ally Bank | Ally Bank | Aspiration | 60%
luxury watches | Audemars Piguet | TAG Heuer | Audemars Piguet | Rolex | Omega | TAG Heuer | 60%
haircare | Olaplex | Olaplex | Kérastase | Olaplex | Olaplex | Briogeo | 67%
cruise lines | Royal Caribbean | Royal Caribbean | Disney Cruise Line | Royal Caribbean | Royal Caribbean | Disney Cruise Line | 67%
online travel agencies | Airbnb | Booking.com | Booking.com | Booking.com | Booking.com | Booking.com | 67%
athleisure | On Running | On Running | On Running | On Running | On Running | On Running | 67%
men's fashion | Ralph Lauren | Uniqlo | Banana Republic | Ralph Lauren | Uniqlo | Ralph Lauren | 67%
paint and fixtures | Sherwin-Williams | Behr | Benjamin Moore | Sherwin-Williams | Behr | Benjamin Moore | 73%
home-improvement retailers | Home Depot | Lowe's | Home Depot | Home Depot | Home Depot | Home Depot | 73%
hotels | Marriott | IHG | Ritz-Carlton | Hilton | Hilton | Marriott | 73%
luxury and EV brands | BMW | BMW | BMW | Mercedes-Benz | Tesla | Tesla | 73%
audio brands | Sony | Anker Soundcore | Sony | Sony | Apple AirPods | Sennheiser | 73%
streetwear | Off-White | Off-White | Off-White | Off-White | BAPE | A.P.C. | 73%
investing platforms | Fidelity | Robinhood | Charles Schwab | Fidelity | Fidelity | Betterment | 73%
airlines | Delta | Spirit | Delta | Delta | JetBlue | JetBlue | 73%
credit cards | Chase | Chase | American Express | Chase | Chase | Chase | 80%
kids' fashion | Tea Collection | Old Navy Kids | Bonpoint | Tea Collection | Old Navy Kids | Mini Rodini | 80%
health insurers | Blue Cross Blue Shield | Kaiser Permanente | Cigna | Blue Cross Blue Shield | Blue Cross Blue Shield | Kaiser Permanente | 80%
toys | LEGO | Melissa & Doug | LEGO | LEGO | LEGO | Melissa & Doug | 80%
fintech and neobanks | Chime | Chime | Chime | Chime | Chime | Chime | 87%
mortgages | Rocket Mortgage | Rocket Mortgage | Rocket Mortgage | Rocket Mortgage | Rocket Mortgage | Rocket Mortgage | 87%
used car retailers | Vroom | CarMax | Vroom | Vroom | AutoTrader | Carvana | 87%
women's fashion | Everlane | Everlane | Everlane | Everlane | Everlane | Eileen Fisher | 87%
smartphones | Samsung | Samsung | Samsung | Samsung | Samsung | Samsung | 93%
TV brands | Samsung | TCL | LG | LG | TCL | LG | 93%
tool brands | DeWalt | Ryobi | Bosch | Makita | DeWalt | Bosch | 93%
mainstream auto brands | Honda | Toyota | Toyota | Honda | Toyota | Honda | 93%
P&C insurers | State Farm | Progressive | Chubb | State Farm | Allstate | Progressive | 93%
luxury goods | Gucci | Chanel | Gucci | Gucci | Chanel | Stella McCartney | 93%
smart home brands | Amazon Alexa | Wyze | Philips Hue | Philips Hue | Amazon Alexa | Philips Hue | 93%

Plan for the user-signal mix in your buyer base, not one ranking. Premium-skewed brands can be effectively invisible in baseline ChatGPT yet dominate when a premium-buyer signal enters the prompt. Budget brands flip the opposite way. If your buyer mix skews to one persona, the persona's leaderboard is the one that matters, not the baseline. The mapou Visibility Index is a baseline calibration; a persona-tuned MVI is a paid-engagement deliverable.

Method: 36 segments × 6 personas × ChatGPT × 20 canonical prompts (the same prompts used for the monthly MVI run). Personas are API system-prompt variants, fully visible and replicable. We do NOT scrape consumer chat apps; that violates ToS and breaks reproducibility. "Top-3 overlap" counts how many of the baseline top-3 brands stay in top-3 under each non-baseline persona, averaged across the 5 non-baseline personas. "Leader holds" is true only if the #1 brand is the same as baseline for every persona. Note: "baseline" uses the shopping-assistant system prompt, not zero-frame neutral. We separately measured the framing effect (shopping vs minimal prompt) and disclose it on the methodology page. Run id: 2026-05-07-1625. Methodology version 1.0.
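The overlap metric in the method reduces to a set intersection. A sketch with illustrative skincare rankings, not the panel's actual ranked lists:

```python
def top3_overlap(baseline: list[str], persona: list[str]) -> float:
    """Fraction of baseline top-3 brands still in the persona top-3."""
    return len(set(baseline[:3]) & set(persona[:3])) / 3

# skincare-style volatility: only CeraVe survives the persona shuffle
overlap = top3_overlap(["CeraVe", "La Roche-Posay", "The Ordinary"],
                       ["Tatcha", "CeraVe", "Drunk Elephant"])
print(f"{overlap:.2f}")  # 0.33
```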

Finding 09

Engines disagree. Then they personalize differently.

The largest first-party AI personalization panel published. 36 segments × 5 engines × 12 buyer personas × 20 canonical prompts = 74,000 verdicts. Two findings stack: engines disagree on the right answer even before any persona signal (7 of 36 categories have 3+ distinct baseline leaders), and each engine personalizes differently when a persona signal arrives. Mean top-3 overlap with baseline: 74% on the most stable engine (Claude), 66% on the most sensitive (Gemini). Same buyers, 8-point spread.

Most stable engine

Claude

74% top-3 overlap

Most sensitive engine

Gemini

66% top-3 overlap

Universal-agreement categories

3 of 36

all 5 engines pick same #1

Verdicts in panel

74.0k

brand × prompt × engine × persona

Per-engine personalization sensitivity

Engine | Mean top-3 overlap | 95% CI | Leader held* | Most volatile | Most stable
Claude | 74% | 70-79% | 2/22 | hotels (47%) | toys (93%)
Perplexity | 73% | 68-78% | 4/22 | makeup (33%) | home-improvement retailers (100%)
Grok | 69% | 65-74% | 2/22 | paint and fixtures (40%) | mainstream auto brands (100%)
ChatGPT | 68% | 63-71% | 4/36 | skincare (20%) | smartphones (93%)
Gemini | 66% | 62-71% | 1/22 | makeup (40%) | investing platforms (100%)

95% CI computed via 1,000-iteration bootstrap on the per-(segment × persona) overlap observations (n=270 per engine). The 8-percentage-point spread between the most stable engine (Claude) and the most sensitive (Gemini) clears every pairwise CI. *Leader held counts segments where the same brand was #1 across all 10 buyer personas tested. With n=10, this distinguishes "held in our panel" from "mathematically unmovable". See methodology caveats.
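A sketch of the percentile-bootstrap CI described above, with a synthetic stand-in for one engine's 270 overlap observations; the beta-distribution stand-in is ours, not panel data:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(overlaps: np.ndarray, iters: int = 1000) -> tuple[float, float]:
    """Percentile bootstrap 95% CI on the mean overlap."""
    means = [rng.choice(overlaps, size=len(overlaps), replace=True).mean()
             for _ in range(iters)]
    return tuple(np.percentile(means, [2.5, 97.5]))

# synthetic stand-in for one engine's 270 per-(segment x persona) overlaps,
# skewed toward high overlap the way Claude's observations are
obs = rng.beta(7, 3, size=270)
low, high = bootstrap_ci(obs)
print(f"mean {obs.mean():.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```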

Engines disagree even before personalization

Without any persona signal, the 5 engines do not converge on the same #1 brand in 7 of 36 categories. Three or more distinct baseline leaders across the panel.

used car retailers · 3 different #1 brands

ChatGPT: Vroom · Claude: Carvana · Gemini: CarMax · Grok: CarMax · Perplexity: Carvana

fragrance · 3 different #1 brands

ChatGPT: Chanel · Perplexity: Dior · Gemini: Chanel · Grok: Chanel · Claude: Tom Ford

makeup · 4 different #1 brands

ChatGPT: Fenty Beauty · Claude: Maybelline · Gemini: MAC Cosmetics · Grok: Maybelline · Perplexity: NARS

paint and fixtures · 3 different #1 brands

ChatGPT: Sherwin-Williams · Claude: Behr · Gemini: Sherwin-Williams · Grok: Benjamin Moore · Perplexity: Benjamin Moore

luxury goods · 4 different #1 brands

ChatGPT: Gucci · Grok: Louis Vuitton · Perplexity: Chanel · Claude: Hermès · Gemini: Hermès

hotels · 3 different #1 brands

Claude: Marriott · ChatGPT: Marriott · Perplexity: Four Seasons · Gemini: Ritz-Carlton · Grok: Marriott

online travel agencies · 3 different #1 brands

ChatGPT: Airbnb · Claude: Booking.com · Gemini: Booking.com · Grok: Booking.com · Perplexity: Expedia

The 3 categories where all 5 engines agree on baseline

haircare: Olaplex

skincare: CeraVe

home services: Roto-Rooter

Even in these "universal agreement" categories, persona signals fragment the leaderboard. Most extreme: home-services has all 5 engines agreeing on Roto-Rooter as the baseline leader, but produces 10 distinct leaders across the 60 (engine × persona) cells in our panel. Universal baseline agreement does not survive personalization.

Pick the engine your buyer uses, then optimize for the persona that matches your actual buyer mix. Optimizing for ChatGPT when your buyer uses Gemini is zero-impact work. Optimizing for the average buyer when your customer skews premium is a brand-equity miss. The combination of engine choice and persona signal compounds. Plan for both, not either-or.

Method: 36 segments × 5 engines × 12 personas × 20 canonical buyer-intent prompts × 264 tracked brands. Personas are API system-prompt variants, fully visible and replicable. We do NOT scrape consumer chat apps; that violates ToS and breaks reproducibility. Top-3 overlap measures how many of the baseline top-3 brands stay in the top-3 under each non-baseline buyer persona, averaged across the 10 buyer personas (excluding the bare framing-isolation persona). Note: "baseline" uses the shopping-assistant frame; the bare-vs-baseline framing-effect study is disclosed on /methodology. Explore any brand × persona × engine cell in the Persona Explorer.

Finding 10

Persona signals shift different funnel phases differently.

Finding 09 showed engines personalize at different intensities. Finding 10 looks inside the funnel: which buyer signals move which phases the most? 18,665 per-prompt verdicts across 4 funnel phases × 5 buyer personas. The most volatile cell is Premium × Evaluation (32% leader-held); the most stable is Professional × Discovery (68%). The pattern: aspirational personas (premium, values-driven) move rankings hardest, especially at Evaluation. Practical personas (professional, first-time) preserve them.

Leader-held rate per (persona × phase)

Each cell is the share of (segment × engine) combinations where the baseline #1 brand stayed #1 under that persona at that funnel phase. Higher = persona doesn't move that phase. Lower = persona disrupts that phase. Read ≥55% as resilient and <45% as volatile.

Persona | Discovery | Filtered Discovery | Comparison | Evaluation | Mean
Budget | 41% | 61% | 39% | 43% | 46%
Premium | 36% | 36% | 44% | 32% | 37%
Professional | 68% | 54% | 63% | 64% | 63%
First-time | 60% | 63% | 62% | 63% | 62%
Values-driven | 35% | 36% | 35% | 41% | 37%

What this tells you about each persona

Budget (46% mean leader-held)

Disrupts Comparison the most (39% held). Preserves Filtered Discovery the most (61% held).

Premium (37% mean leader-held)

Disrupts Evaluation the most (32% held). Preserves Comparison the most (44% held).

Professional (63% mean leader-held)

Disrupts Filtered Discovery the most (54% held). Preserves Discovery the most (68% held).

First-time (62% mean leader-held)

Disrupts Discovery the most (60% held). Preserves Evaluation the most (63% held).

Values-driven (37% mean leader-held)

Disrupts Discovery the most (35% held). Preserves Evaluation the most (41% held).

Different buyer signals matter at different stages of the funnel. Premium signals reshape who AI recommends at Evaluation, where buyers are committing. Professional signals barely move Comparison, where rankings are feature-driven. If your buyer skews premium, your Evaluation-phase content needs to be different from your Discovery content; the brands AI surfaces at the bottom of the funnel are not the ones it surfaces at the top. If your buyer skews professional, you can publish one consistent comparison page and trust AI to keep ranking it the same way across persona contexts.

Method: 13,035 per-prompt verdicts from the persona robustness panel (run 2026-05-05-1424 + 2026-05-05-1659 + 2026-05-05-2256 + extension runs). For each (segment, engine, persona, phase): rank brands by citation count within phase. Compare each persona's top brand to the baseline persona's top brand at the same (segment, engine, phase). Aggregate per (persona × phase) is mean across (segment × engine) of "leader held." The 4 phases (Discovery, Filtered Discovery, Comparison, Evaluation) come from the 20 canonical buyer-intent prompts per segment, distributed 6/6/4/4. The 5 buyer personas shown are the public-tier subset; full 10-persona analysis is in the paid panel via app.mapou.ai.
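A sketch of the leader-held aggregation, assuming per-(segment, engine, persona, phase) top brands are already extracted; the verdict dictionary is toy data:

```python
from collections import defaultdict

# top brand per (segment, engine, persona, phase); toy extract, not panel data
top_brand = {
    ("skincare", "ChatGPT", "baseline", "Evaluation"): "CeraVe",
    ("skincare", "ChatGPT", "premium",  "Evaluation"): "Tatcha",
    ("skincare", "Claude",  "baseline", "Evaluation"): "CeraVe",
    ("skincare", "Claude",  "premium",  "Evaluation"): "CeraVe",
}

# for each (persona, phase), did the baseline leader hold per (segment, engine)?
held = defaultdict(list)
for (seg, eng, persona, phase), leader in top_brand.items():
    if persona == "baseline":
        continue
    held[(persona, phase)].append(leader == top_brand[(seg, eng, "baseline", phase)])

for key, flags in held.items():
    print(key, f"{sum(flags) / len(flags):.0%} leader-held")
```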

Finding 11

The shopping frame is not neutral.

Every other finding in this report uses a "shopping assistant" system prompt, the frame buyers experience inside ChatGPT shopping mode, Perplexity, and AI-search retail integrations. We tested what happens with a minimal prompt instead ("answer concisely"). Across 22 segments × ChatGPT × 440 prompts, removing the shopping frame flipped the #1 brand in 8 of 22 segments (36%) and dropped mean cite rates by 8 percentage points. Top-3 overlap stays high (82%), the consideration set is roughly the same, but who AI calls #1 changes. The implication for methodology: an "AI search visibility" number is always frame-conditioned. Picking the right frame for the question matters.

Leaders flipped (bare vs shopping)

8 / 22

36% of segments

Mean top-3 overlap

82%

consideration set is sticky

Cite-rate inflation under shopping frame

+8 pp

bare 27% → shopping 35%

Method

Same prompts

only the system frame changes

Where AI's #1 brand changes when the shopping frame is removed

Segment | #1 under shopping frame | #1 under bare frame
luxury and EV brands | Tesla | BMW
used car retailers | Vroom | CarMax
makeup | Fenty Beauty | NARS
skincare | The Ordinary | CeraVe
investing platforms | Betterment | Charles Schwab
paint and fixtures | Sherwin-Williams | Behr
home-improvement retailers | Lowe's | Home Depot
tool brands | DeWalt | Makita

Practical: if you measure your AI visibility on a non-commercial frame (a research chatbot, a knowledge-graph evaluation, a model-card test), the leaderboard you see is materially different from the one your buyers see inside ChatGPT shopping mode. mapou's baseline is the shopping frame because that is the frame consumers buy through. Vendor benchmarks that show different leaders are usually using a different frame, not measuring something we got wrong.

Method. Run id 2026-05-06-2027 (2026-05-08). For each of 22 segments × ChatGPT, the same 20 canonical prompts were issued twice: once with the “shopping assistant” system prompt used everywhere else in MVI, once with a minimal prompt (“Answer the user's question concisely”). All other variables held constant. Leader = #1 brand by citation rate. Top-3 overlap = brand-name match. The 8 flipped segments are listed; the 14 unchanged segments kept the same #1 across both frames. The 7.8-percentage-point cite-rate inflation under the shopping frame is consistent across categories: shopping framing makes engines more confident citers, not just different ones.
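The frame A/B reduces to comparing two leader maps over the same segments. A sketch using rows from the table above (abbreviated; smartphones is added as an illustrative unchanged segment):

```python
# #1 brand per segment under each frame; rows abbreviated from the table above
shopping = {"makeup": "Fenty Beauty", "skincare": "The Ordinary",
            "tool brands": "DeWalt", "smartphones": "Samsung"}
bare = {"makeup": "NARS", "skincare": "CeraVe",
        "tool brands": "Makita", "smartphones": "Samsung"}

flipped = [seg for seg in shopping if shopping[seg] != bare[seg]]
print(f"{len(flipped)} / {len(shopping)} segments flip: {flipped}")
```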

Worked example

Skincare, end to end.

The findings above are abstract. Here's what they look like for a single segment, using the same May 2026 data. 20 brands tracked. Run the same lens on your category to find the equivalent picture.

Engine agreement

0.41

Divergence 0.59

Effective brands

11.4

Of 20 tracked · top 2 = 28%

Leader

CeraVe

MVI 68

Runner-up

La Roche-Posay

MVI 64

01. Engines disagree. Engine agreement of 0.41 across the five engines. ChatGPT's top brand is not Gemini's top brand. Treat each engine as its own surface.

02. Concentrated. AI is effectively recommending 11.4 brands out of 20 tracked. CeraVe and La Roche-Posay alone take 28% of citations. The other 18 brands fight over the rest.

03. Discovery → Evaluation leak. Drunk Elephant loses 33 percentage points between “best skincare” prompts and “which should I buy” prompts. AI surfaces the brand at discovery and drops it at the moment of decision.

04. Mentioned, not cited. La Mer is mentioned 44 times but cited only 6. AI uses it as a reference point in answers about other brands but rarely recommends it directly.

05. Kingmakers split. Claude decides Discovery, Grok decides Filtered Discovery, and ChatGPT decides both Comparison and Evaluation. A ChatGPT-only plan misses any phase where another engine is the kingmaker.

How to read this for your category: high divergence + few effective brands + a named leak brand is a market that's up for grabs at the moment of decision. Defending Evaluation surface (comparison content, head-to-head reviews, structured buying-advice answers) is where share moves.

The mapou playbook for a market like this: build evaluation-phase content tuned to ChatGPT (the kingmaker engine here), then densify third-party citations and structured buying-advice answers for the demand-leak brand so it stays in contention when AI moves from “best skincare” to “which one should I buy.” Track the lift monthly via the same prompt set.

Every segment, every metric.

Across all 740 tracked brands · 36 segments

69%

Not yet cited. AI does not cite these brands for buyer-intent queries.

18%

First encounter. Discovered and cited occasionally, not consistently.

10%

Repeat use. Cited regularly enough to feel reliably present.

3%

Default choice. The go-to recommendation in AI answers.

MVI tier distribution. 87% of tracked brands sit at first encounter or below; the recommendation set AI consistently surfaces is much narrower than the brand population.

One row per segment. Find your category. High divergence + few effective brands = a fragmented yet winner-take-most space, where retuning AI visibility can move real share quickly. Click any segment for the full leaderboard, per-engine breakdowns, and analyst note.

Segment | Brands | Engine divergence | Effective brands | Top-2 share | Top demand-leak | Top mention-only
Airlines | 20 | 0.55 | 10.0 | 33% | Singapore Airlines (31 pp) | Breeze Airways (63%)
Athleisure & active lifestyle | 20 | 0.32 | 6.6 | 43% | Vuori (40 pp) | Athleta (12%)
Audio (headphones, earbuds, speakers) | 20 | 0.49 | 6.0 | 44% | Apple AirPods (16 pp) | Apple AirPods (14%)
Banking | 20 | 0.25 | 8.5 | 34% | Axos Bank (13 pp) | Bank of America (50%)
Credit cards | 20 | 0.32 | 6.3 | 42% | Citi (22 pp) | Bank of America (46%)
Cruise lines | 20 | 0.57 | 9.9 | 32% | Viking (13 pp) | Celebrity Cruises (14%)
Fintech & neobanks | 20 | 0.69 | 10.6 | 33% | Cash App (25 pp) | Venmo (69%)
Fragrance | 20 | 0.63 | 12.1 | 27% | Dior (29 pp) | Maison Margiela Replica (53%)
Haircare | 20 | 0.63 | 9.8 | 32% | K18 (28 pp) | Briogeo (13%)
Health insurance | 20 | 0.18 | 8.4 | 36% | Kaiser Permanente (39 pp) | Independence Blue Cross (100%)
Home improvement retailers | 20 | 0.35 | 5.4 | 54% | Menards (37 pp) | At Home (100%)
Home services | 20 | 0.70 | 14.0 | 20% | TaskRabbit (27 pp) | Angi (74%)
Hotels | 20 | 0.62 | 10.3 | 31% | Four Seasons (48 pp) | Ritz-Carlton (29%)
Investing & wealth | 20 | 0.88 | 11.8 | 27% | Robinhood (27 pp) | Wealthsimple (94%)
Kids' fashion | 20 | 0.53 | 13.2 | 22% | Old Navy Kids (26 pp) | Mini Rodini (60%)
Laptops | 20 | 0.46 | 7.5 | 35% | Razer (21 pp) | Samsung Galaxy Book (100%)
Life insurance | 20 | 0.48 | 12.9 | 24% | New York Life (38 pp) | Lincoln Financial (54%)
Luxury & EV OEMs | 20 | 0.20 | 9.8 | 29% | Rivian (32 pp) | Volvo (91%)
Luxury goods | 40 | 0.26 | 13.3 | 26% | Cartier (33 pp) | Vestiaire Collective (100%)
Luxury watches | 20 | 0.45 | 11.6 | 27% | Cartier (37 pp) | Grand Seiko (43%)
Makeup | 20 | 0.58 | 13.2 | 23% | Charlotte Tilbury (33 pp) | —
Mass-market OEMs | 20 | 0.15 | 8.4 | 36% | Ram (29 pp) | Chrysler (56%)
Men's fashion | 20 | 0.33 | 12.2 | 24% | Allbirds (18 pp) | Uniqlo (11%)
Mortgages & home loans | 20 | 0.28 | 10.3 | 31% | loanDepot (45 pp) | NerdWallet (92%)
OTAs & travel booking | 20 | 0.65 | 10.6 | 27% | Vrbo (42 pp) | Going (100%)
Paint, fixtures & finishes | 20 | 0.24 | 13.5 | 26% | Andersen Windows (37 pp) | Behr (11%)
Property & casualty insurance | 20 | 0.21 | 9.6 | 31% | Travelers (30 pp) | MetLife Auto & Home (100%)
Skincare | 20 | 0.59 | 11.4 | 28% | Drunk Elephant (33 pp) | La Mer (88%)
Smart home & connected devices | 20 | 0.66 | 10.9 | 30% | Ecobee (10 pp) | Apple HomeKit (63%)
Smartphones | 20 | 0.46 | 7.3 | 38% | Motorola (21 pp) | OnePlus (21%)
Streetwear | 20 | 0.58 | 11.4 | 27% | Corteiz (40 pp) | A.P.C. (100%)
Tool & hardware brands | 20 | 0.20 | 8.2 | 35% | Ryobi (23 pp) | Stanley Black & Decker (95%)
Toys | 20 | 0.67 | 7.4 | 44% | Fisher-Price (18 pp) | LEGO (9%)
TVs | 20 | 0.49 | 5.5 | 46% | TCL (33 pp) | Roku TV (92%)
Used cars & auto retail | 20 | 0.27 | 8.3 | 38% | AutoNation (33 pp) | Cars & Bids (100%)
Women's fashion | 20 | 0.35 | 10.0 | 31% | Reformation (23 pp) | Zara (39%)

Implications

What the data implies for the next planning cycle.

Three implications drawn directly from the findings above. Each is observable in the panel data, not extrapolated from a single segment.

Persona-tuned visibility will replace the single-number AI visibility metric.

Categories where the leader brand changes under at least one buyer persona are the rule, not the exception, in our panel. The single-MVI-per-brand framing reads cleanly in a board deck and falls apart inside the data. The next planning cycle will bake persona-tuned visibility into the brief because that is the number tied to the actual buyer mix.

Engine selection becomes a phase-and-persona decision, not a market-share decision.

Mean cross-engine agreement sits below the threshold where one-engine optimization carries to the rest. Different engines decide different funnel phases, and the same buyer signal produces different stability across the 5 engines. Picking which engine to optimize for first comes from where in the funnel the category lives and which buyer mix the brand serves, not from which engine has the most users.

Frame conditioning will become standard methodology disclosure.

Removing the shopping-assistant system prompt flips the #1 brand in 8 of 22 segments and drops citation rates by 7.8 percentage points. Vendor benchmarks that publish different leaders are usually using a different frame, not measuring something the panel got wrong. Reports that do not disclose the frame they tested under will be discounted as the category matures.

On the record

Predictions for the next monthly run.

Falsifiable, dated, gradeable. Each prediction names a metric and a threshold. Next run we publish green or red against every claim, accumulating a track record we can be held to.


Mean engine agreement stays below 0.65 next run

On the next monthly run (target 2026-06-01), the mean Spearman engine agreement across all 22 tracked segments will remain below 0.65. Engines will continue to disagree on brand ranking more often than they agree.

Why we expect this: Cross-engine divergence is a structural property of how the five engines retrieve and synthesize: ChatGPT and Claude lean on trained knowledge, Perplexity and Grok are web-grounded, Gemini blends the two. This split does not resolve in 30 days. We expect the mean to drift but stay clearly under 0.65.

Baseline: 0.538 · Threshold: < 0.65 · Status: Pending

ChatGPT remains Evaluation kingmaker in 18 or more segments

On the next monthly run, ChatGPT will be the Evaluation-phase kingmaker engine (largest spread between most-cited and least-cited brand) in at least 18 of 22 segments. Decision-time visibility is structurally a ChatGPT problem.

Why we expect this: ChatGPT is the kingmaker for Evaluation in 20 of 22 segments today. This reflects ChatGPT's training-data breadth and high token volume on comparative queries. We expect 1 to 2 segments to flip month-over-month due to engine updates, but the overall pattern to hold.

Baseline: 20 · Threshold: ≥ 18 · Status: Pending

At least 18 of 22 segment leaders retain top-1 status

On the next monthly run, at least 18 of 22 current segment leaders will retain top-1 MVI position. The data is stable enough at the leader tier that month-over-month flips are exception, not rule.

Why we expect this: Default-choice tier brands (MVI 75+) tend to be entrenched: Chase in credit cards, Toyota in mainstream auto, Home Depot in home-improvement retailers, Booking in OTAs. Lower-MVI leaders in fragmented categories (haircare, cruise, fragrance) are more vulnerable to shuffle. We expect 2 to 4 flips, total leaders retained 18 to 22.

Baseline: 22 · Threshold: ≥ 18 · Status: Pending

Skincare stays concentrated, CeraVe retains top-1

On the next monthly run, the skincare segment will retain effective-brand count at or below 6.0 (currently 5.4 of 12 tracked) and CeraVe will retain top-1 MVI position. Concentration is structural in this category.

Why we expect this: Skincare is one of the most concentrated tracked segments (top 2 brands take 54 percent of MVI share). CeraVe leads with MVI 53 in a category where the runner-up sits at 47. Categories this concentrated rarely de-concentrate in 30 days; the citation infrastructure that produces the leader pattern is the same infrastructure that would have to change.

Segment: skincare

Baseline: 5.4 · Threshold: ≤ 6.0 · Status: Pending

Banking remains a high-agreement segment

On the next monthly run, banking will retain mean engine agreement at or above 0.75 (currently 0.84). Banking is one of the segments where the five engines materially agree on ranking, and we expect that pattern to hold.

Why we expect this: Banking has unusually consistent canonical sources (Chase, Bank of America, Wells Fargo, Capital One are referenced across nearly every comparative-finance article on the open web). This drives high cross-engine ρ. Categories with weak source consensus (fragrance ρ 0.36, cruise ρ 0.42) are the volatile ones, banking should be among the sticky ones.

Segment: banking

Baseline: 0.84 · Threshold: ≥ 0.75 · Status: Pending

Predictions are committed before the next run, not chosen post-hoc. Each is graded against the same fixed prompt set, with the result and methodology version stamped into the record. Disagree with any of these? Run your own check and bring data.

Failure modes · in progress

The fourth verdict we're building.

Every AI answer in this report is currently classified one of three ways: cited (recommended), mentioned (named without recommendation), or invisible (not surfaced). The implicit fourth category, which we are actively building toward v1.2, is misrepresented: the brand is named, but the AI's description of it is wrong.

The four failure buckets we screen for

  • Category drift. Brand placed in the wrong product space, e.g. a B2B tool framed as a consumer product, a paint brand framed as flooring.
  • Substitution errors. AI replaces the brand with a more canonical category leader, or attributes a competitor's specific product or feature to the subject brand by name.
  • Attribute hallucination. Specs, awards, partnerships, or statistics invented or misassigned to the brand.
  • Temporal drift. Discontinued products, defunct partnerships, or stale leadership presented as if current.

Why we are not publishing examples yet

Initial model-judged classifier passes (200 stratified verdicts, four-bucket prompt) flag candidates, but our manual verification rate on those flags is not high enough to publish. The classifier struggles to distinguish three things:

  • AI listing competitors alongside the subject brand in comparison answers, which is normal.
  • Subsidiary brands and co-brand cards that are correctly attributed (Tru by Hilton is Hilton, the Aeroplan card is a real Chase co-brand).
  • Subjective marketing language and possibly-true-but-unverifiable specifics, which look like fabrications without per-brand fact databases.

We will publish verified failures when the verification rate clears 80 percent on a sample. Until then, the methodology stays disclosed and the cell stays empty.

Track record matters more than headline claims. The classifier infrastructure (sample, prompt schema, persistence) is in place; what is not in place is the review pipeline that gets us to publishable confidence. We would rather ship a half-finished section that tells the truth about the gap than a polished section full of false positives.

Context

What this report is, and what it isn't.

AI answer engines are not the first surface to change how shoppers find brands, and they will not be the last. Reading these findings against prior search shifts and against the boundaries of what we measured keeps the takeaways honest.

Continuation, not reset

Same as before

  • Winner-takes-most surfaces, like featured snippets and knowledge panels before them.
  • Distribution still controlled by a small set of gatekeepers within their own ecosystems.
  • Structured, consistent entity identity wins over keyword density. Same lesson as Schema markup.
  • A one-time technical tax for brands slow to adapt. Same shape as mobile-first indexing in 2018.

Actually new

  • Multi-source synthesis instead of single-extraction. Citations are composed, not selected.
  • Probabilistic ranking, not deterministic positions. Identical query, different answer, same engine.
  • Brand presence without clicks. Influence happens upstream of the visit, not measured by it.

What this report does not measure

  • Calibrated to high-intent queries. All 20 canonical prompts target commerce decisions across Discovery, Filtered Discovery, Comparison, and Evaluation. Findings should not be generalized to navigational or broad informational search behavior, which still rewards traditional SEO.
  • Upstream influence, not downstream performance. MVI measures presence in AI-mediated discovery, analogous to impression share in early paid search. It is not a proxy for conversion lift, branded-search uplift, or assisted revenue; those require first-party clickstream and incrementality testing.
  • US English, this run. All API calls originate from US East with English prompts. Brand visibility almost certainly differs in other markets and languages.
  • Cited / mentioned / invisible only. We do not yet flag when an engine misattributes a product, confuses categories, or invents a brand. Failure-mode classification is on the v1.2 roadmap.
  • API surface, not personalized consumer chat. Findings 01-07 use stateless engine APIs. We do not directly capture how chatgpt.com or claude.ai behave for a logged-in user with Memory, custom instructions, or prior conversation context. Finding 08 measures the size of the personalization effect via API system-prompt variants: rankings shift meaningfully when a user-style signal enters the prompt.

Every measurement is a choice about what to count. These are ours, on the record. Read the framework →

Monthly research newsletter

Get the monthly MVI report. Eleven findings, fresh data, two minutes to read.

First Tuesday of each month. New canonical run, new emerging brands, what changed in AI search, and the MVI movers per category. Unsubscribe in one click.

We never share email addresses. One unsubscribe link in every send.

The MVIP

The 740 brands AI search is making decisions about right now. Either you're on it, or AI is recommending against you. Mapou builds custom Visibility Index readouts, integration-impact studies, and AI commerce strategy for retailers and consumer brands.

Run a free check → · Talk to mapou

All eleven findings are computed live from the same pre-registered prompt set we run every month; snapshots are date-stamped so you can compare today's AI behavior to future runs. Methodology v1.0, Wilson 95% confidence intervals on every score · read the framework · per-segment reports · integration tracker.