Frequently asked questions

Why 40 prompts and not 100?

Under 30 prompts, the variance swamps the signal; over 60, no small team will actually run the panel every week. Forty prompts (10 per persona across four personas: discovery, compare, buy, rescue) is the point where the weekly number is reliable and the run still fits in about four hours end to end.
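To put a rough number on "variance swamps the signal", here is a minimal sketch that treats each prompt as an independent cited/not-cited trial. That independence is a simplifying assumption of this example, not something the panel guarantees, and the function name is hypothetical.

```python
import math


def citation_share_standard_error(p: float, n: int) -> float:
    """Standard error of a share p observed over n prompts.

    Assumes each prompt is an independent yes/no trial (binomial model),
    which is a simplification of how a real panel behaves.
    """
    return math.sqrt(p * (1 - p) / n)


for n in (20, 30, 40, 60, 100):
    se = citation_share_standard_error(0.5, n)
    # Express the noise in "prompts out of n" so it is comparable
    # to the weekly citation-share number.
    print(f"n={n:>3}  standard error={se:.3f}  (about {se * n:.1f} prompts)")
```

The noise shrinks with the square root of the panel size, so going from 40 to 100 prompts costs 2.5 times the run time for a much smaller gain in precision.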

Why four engines and not just ChatGPT?

Citation behaviour diverges: ChatGPT cites one to three sources per answer, Perplexity five to seven, and Claude zero to two with aggressive synthesis, while Gemini pulls heavily from Google surfaces. Different engines reward different signals, so measuring only one gives you a biased view of your own GEO maturity.

Can't I just use GA4's AI channel group?

GA4's out-of-the-box channel grouping puts AI traffic into Direct or Referral and almost never labels it as AI. Build a custom channel group that includes chatgpt.com (and the older chat.openai.com), perplexity.ai, claude.ai, and gemini.google.com as referrers, then cross-check the total against server logs. Even then, server logs are the truth source; GA4 is the revenue surface.
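For the server-log side of that cross-check, a minimal sketch of a referrer classifier might look like the following. The hostnames are the referrers named above, with chatgpt.com included as ChatGPT's current domain; the engine labels and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

# Referrer hostname -> engine label (labels are illustrative).
AI_REFERRER_HOSTS = {
    "chat.openai.com": "chatgpt",
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}


def classify_referrer(referrer: str) -> Optional[str]:
    """Return the engine label for a full referrer URL, or None if not AI."""
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRER_HOSTS.get(host)


def count_ai_sessions(referrers: Iterable[str]) -> Counter:
    """Tally sessions per engine from an iterable of referrer URLs."""
    counts: Counter = Counter()
    for referrer in referrers:
        engine = classify_referrer(referrer)
        if engine is not None:
            counts[engine] += 1
    return counts
```

Referrers without a scheme (for example a bare `claude.ai/chat`) are not recognised by this sketch, so normalise them before classifying.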

What's a healthy AI-CTR?

Across the stores we track, AI-CTR (server-observed clicks to cited pages within 30 minutes of a citation, divided by total citations) sits between 2.4 percent and 5.1 percent. Under 2 percent usually means the cited page loads slowly or the PDP copy doesn't match the prompt. Over 6 percent usually means you're only being cited on very-high-intent queries: great for revenue, but a low ceiling for growth.

How do I make the scorecard actually land?

Two rules: (1) every dip and every jump gets a narrative reason annotated in the same sprint it happens. Without narrative, the chart is wallpaper. (2) Every week ends with exactly one 'next action' the team commits to — not three, not five. Commitment density is what separates a scorecard from a graveyard.

The weekly AI-citation measurement playbook

GEO without weekly measurement is vibes. The difference between “we think citations are up” and “citation share went from 22 to 27 of 40 this week because we shipped the standing-desks FAQ on Tuesday” is what separates a merchant who compounds from one who guesses. This is the exact weekly stack we run with Shopify merchants — what to measure, how to automate it, and what the Monday-morning Slack digest looks like.

The three layers that make measurement work

A measurement stack that holds up under a budget conversation needs three layers and only three. One synthetic layer (can we force a citation by asking the model directly?), one real-signal layer (what are bots actually fetching, and what are shoppers actually doing?), and one reporting layer (four numbers and a narrative). Skip one and the stack collapses — synthetic alone is theatre, real alone is reactive, reporting without both is vibes-as-a-service.

Figure 1. Three layers: synthetic prompt panel, real signal ingest, weekly scorecard. One Slack digest every Monday morning. The diagram shows layer one running 40 shopper prompts weekly across ChatGPT, Perplexity, Claude, and Gemini; layer two ingesting server logs, citation scrape results, and first-party revenue; and layer three publishing a weekly scorecard with four KPIs: citation share, cited pages, AI click-through rate, and AI revenue. A right-hand column lists the tool stack: Surfient panel, GoAccess for logs, Shopify analytics with an AI order tag, and a Monday Slack digest.

Layer 1 · The prompt panel

Forty shopper queries, split across four personas, run against four engines every Monday at 09:00 local time. Record whether you're cited, the exact cited URL, citation position, and who's cited instead of you when you aren't. The run takes about four hours end to end (mostly waiting for engines to respond); a growth lead reviews the results in roughly 45 minutes.

The 4 personas, 10 prompts each

Discovery (10) — 'best X for Y' and 'which X is best for Z'. The widest funnel queries where losing hurts most.
Compare (10) — 'X vs Y for Z' and 'is brand-X better than brand-Y'. Where brand entity recognition gets tested.
Buy (10) — 'where to buy X under $N' and 'cheapest X with feature Y'. Highest revenue-intent queries.
Rescue (10) — 'X broke, how do I fix it' and 'does X work with Y'. Post-purchase queries that drive repeat revenue.

Lock the 40 prompts for at least 90 days. You're measuring a moving target already (engine behaviour, competitor changes, index freshness) — if the prompt set shifts weekly you can't trust any trend line. Rotate prompts only at quarter boundaries.
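One way to keep the panel honest is to encode its shape in code, so a run fails loudly if someone quietly adds or drops a prompt. The sketch below is illustrative only: the class names, field names, and engine labels are assumptions, not a schema the panel tool prescribes.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Optional, Tuple

PERSONAS = ("discovery", "compare", "buy", "rescue")
ENGINES = ("chatgpt", "perplexity", "claude", "gemini")
PROMPTS_PER_PERSONA = 10


@dataclass(frozen=True)
class Prompt:
    prompt_id: str
    persona: str
    text: str


@dataclass
class PanelResult:
    """What gets recorded for one prompt on one engine."""
    prompt_id: str
    engine: str
    cited: bool
    cited_url: Optional[str] = None   # exact cited URL, if cited
    position: Optional[int] = None    # citation position in the answer
    cited_instead: List[str] = field(default_factory=list)  # who was cited when we were not


def validate_panel(prompts: List[Prompt]) -> None:
    """Raise ValueError unless there are exactly 10 prompts per persona."""
    counts = {persona: 0 for persona in PERSONAS}
    for prompt in prompts:
        if prompt.persona not in counts:
            raise ValueError(f"unknown persona: {prompt.persona}")
        counts[prompt.persona] += 1
    uneven = {k: v for k, v in counts.items() if v != PROMPTS_PER_PERSONA}
    if uneven:
        raise ValueError(f"expected {PROMPTS_PER_PERSONA} per persona, got {uneven}")


def weekly_jobs(prompts: List[Prompt]) -> List[Tuple[str, str]]:
    """Every (prompt_id, engine) pair the Monday run has to cover."""
    validate_panel(prompts)
    return [(p.prompt_id, e) for p, e in product(prompts, ENGINES)]
```

With 40 prompts and four engines, `weekly_jobs` yields 160 pairs per run, which is also the upper bound on the number of `PanelResult` rows per week.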

Layer 2 · Real signal ingest

Synthetic panels tell you “do the models know you.” Real signal tells you “did it matter.” Three streams feed Layer 2 — server logs, citation scrape, and first-party revenue. All three run continuously; the Monday job just rolls them up for the week.

Server logs — the truth source

Grep your access logs for these user-agents and count clean 200 responses: GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, Amazonbot, CCBot. Google-Extended and Applebot-Extended are robots.txt control tokens rather than request user-agents, so they will not show up in access logs; manage them in robots.txt instead. Flag any 4xx/5xx, because assistants that hit errors stop coming back quickly. Store the aggregate in a small BigQuery table or even a Sheet; the point is a longitudinal record, not real-time monitoring.
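As a rough sketch of that grep-and-count step, the snippet below parses access-log lines and tallies clean 200s per bot while collecting error hits. It assumes the common "combined" log format; the regex, function name, and return shape are illustrative, and a different log format would need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Crawler tokens that appear in request user-agent strings.
# Google-Extended and Applebot-Extended are robots.txt tokens and never
# appear as request user-agents, so they are omitted here.
AI_BOT_TOKENS = (
    "GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot", "Amazonbot", "CCBot",
)

# Assumes combined log format: "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def summarize_bot_hits(lines: Iterable[str]) -> Tuple[Counter, List[Tuple[str, int, str]]]:
    """Return (clean 200 counts per bot, list of (bot, status, path) errors)."""
    ok: Counter = Counter()
    errors: List[Tuple[str, int, str]] = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed format
        ua = match.group("ua").lower()
        bot = next((t for t in AI_BOT_TOKENS if t.lower() in ua), None)
        if bot is None:
            continue  # not one of the tracked crawlers
        status = int(match.group("status"))
        if status == 200:
            ok[bot] += 1
        elif status >= 400:
            errors.append((bot, status, match.group("path")))
    return ok, errors
```

Feeding it one week of log lines gives the two things the Monday roll-up needs: a per-bot count to append to the longitudinal table, and a list of broken paths to fix first.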

Citation scrape — who's cited when you are

The panel should capture not only whether you were cited but also the full list of URLs cited in the answer. This is where the competitive-intelligence layer lives. If Perplexity starts citing a competitor's blog post on every discovery query in your category, that's the signal to write a better one — not two weeks later, this Monday.
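A minimal sketch of that competitive tally, assuming each panel answer is available as a (persona, engine, cited URLs) tuple; the tuple shape and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lower-cased hostname without a leading 'www.'; empty string if unparseable."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def competitor_citations(
    answers: Iterable[Tuple[str, str, List[str]]], own_domain: str
) -> Counter:
    """Count answers in which each other domain was cited.

    Keys are (engine, persona, domain); a domain is counted once per answer
    even if several of its URLs appear in that answer.
    """
    tally: Counter = Counter()
    for persona, engine, urls in answers:
        domains = {domain_of(u) for u in urls} - {own_domain, ""}
        for domain in domains:
            tally[(engine, persona, domain)] += 1
    return tally
```

Sorting the result with `tally.most_common()` surfaces patterns such as one competitor domain showing up on most Perplexity discovery answers.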

First-party revenue — the only number finance cares about

Tag outbound links wherever you can with utm_medium=ai and a source per engine (utm_source=chatgpt, etc.). Most models strip referrers, so this won't catch every session, but it catches enough to sanity-check the direct-traffic proxy. Then tag the Shopify order itself (source_ai tag) when the landing path came from an AI source — revenue attribution at the order level is the scorecard's keystone.
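The tagging convention can be sketched with the standard library alone. This is an illustration under assumptions: the helper names are hypothetical, and actually writing the `source_ai` tag onto the Shopify order is left out because it depends on how your store integration is set up.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_ai_utm(url: str, engine: str) -> str:
    """Return url with utm_medium=ai and utm_source=<engine>, keeping other params."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({"utm_medium": "ai", "utm_source": engine})
    return urlunparse(parts._replace(query=urlencode(query)))


def order_tags_for_landing(landing_url: str) -> List[str]:
    """Decide which tags an order should get based on its landing URL."""
    query = dict(parse_qsl(urlparse(landing_url).query))
    if query.get("utm_medium") == "ai":
        return ["source_ai"]
    return []
```

For example, `add_ai_utm("https://example.com/products/desk?variant=1", "chatgpt")` keeps the existing `variant` parameter and appends the two UTM parameters after it.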

Layer 3 · The four-KPI scorecard

Four numbers. Always the same four. Monday 09:30 in Slack. Eight-week trailing chart on a single page the whole growth team sees. The four:

Citation share — number of the 40 prompts where your domain was cited by at least one engine. Primary KPI.
Cited pages — count of distinct URLs cited across the panel in the week. A diversity metric — 13 cited pages is healthier than 27 citations all going to one URL.
AI-CTR — server-observed clicks to cited pages within 30 minutes of a panel citation, divided by total citations. A retrieval-quality proxy.
AI revenue — Shopify orders tagged source_ai for the week. The finance-team number.
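The four definitions above reduce to a few lines of arithmetic. The sketch below assumes the week's citations, the count of clicks already matched to citations within the 30-minute window, and the totals of `source_ai` orders are supplied by the earlier layers; the `Citation` class and the dictionary keys are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Citation:
    prompt_id: str
    engine: str
    url: str


def scorecard(
    citations: List[Citation],
    clicks_within_30_min: int,
    ai_order_totals: Iterable[float],
    panel_size: int = 40,
) -> Dict[str, float]:
    """Compute the four weekly KPIs from already-collected inputs."""
    cited_prompts = {c.prompt_id for c in citations}  # cited by at least one engine
    cited_pages = {c.url for c in citations}          # distinct URLs cited this week
    total = len(citations)
    return {
        "citation_share": len(cited_prompts),         # out of panel_size
        "panel_size": panel_size,
        "cited_pages": len(cited_pages),
        "ai_ctr": clicks_within_30_min / total if total else 0.0,
        "ai_revenue": sum(ai_order_totals),
    }
```

Note that a prompt cited by three engines counts once toward citation share but three times in the AI-CTR denominator, which follows from the definitions as written.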

Figure 2. Eight weeks of the scorecard. Annotate every dip and every jump with a narrative reason or the scorecard is just wallpaper. The chart shows the four KPIs over 8 weeks: citation share from 9 to 27 of 40, cited pages from 3 to 13, AI-CTR from 1.1% to 3.8%, and weekly AI revenue from $1,900 to $12,480, annotated with a week-4 schema regression dip and a week-6 FAQ-ship jump. Below it is a Slack digest template with Wins, Losses, and Next action lines.

The Slack digest that actually lands

Keep the format surgical: three lines, one channel, same time every week. The digest is a tool for producing decisions, not a storytelling venue. Anything longer than three lines gets skimmed.

Wins: the specific prompt(s) we were newly cited for, and why (which ship caused it).
Losses: the specific prompt(s) we regressed on, and where the breakage likely is.
Next action: exactly one deliverable, one owner, one date. No more.
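The three-line rule is easy to enforce in code. This sketch only builds the message text; how it gets posted to Slack is left out, and the `NextAction` class and `build_digest` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NextAction:
    deliverable: str
    owner: str
    due: date


def build_digest(wins: str, losses: str, next_action: NextAction) -> str:
    """Return the weekly digest as exactly three lines of text."""
    for label, text in (("wins", wins), ("losses", losses)):
        if "\n" in text:
            # A multi-line entry would break the three-line format.
            raise ValueError(f"{label} must fit on one line")
    lines = [
        f"Wins: {wins}",
        f"Losses: {losses}",
        f"Next action: {next_action.deliverable} "
        f"(owner: {next_action.owner}, due: {next_action.due.isoformat()})",
    ]
    return "\n".join(lines)
```

Because `build_digest` accepts a single `NextAction` rather than a list, there is no way to sneak a second commitment into the message.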

Tags: Measurement, Attribution, Citation Share, Shopify, Playbook


