This article compiles the best publicly available benchmarks and offers a framework for interpreting your own AI visibility numbers. It is not original Far & Wide research. Where exact SMB-specific numbers do not exist in public data, it offers qualitative framing and decision-tree logic instead of invented specifics. The goal: give you a way to read your own score, not a table to compare yourself against.
If you are a 20-person SaaS, a regional e-commerce brand, or a professional services firm under €10M ARR, the Conductor enterprise reports and Fortune 500 citation benchmarks do not apply to you. Use the framework below to interpret what your numbers mean.
Headline takeaways
- A first measurement is a starting line, not a grade. Brands new to AEO typically start with very low single-digit visibility. The goal of year one is movement from zero to measurable — not a specific score.
- Citation rate and recommendation rate diverge. A brand mentioned in 40% of responses but recommended in only 5% has a positioning problem, not a mention problem. These two metrics must be measured separately.
- ChatGPT and Perplexity are not the same game. Perplexity's citation overlap with Google is high (~70%); ChatGPT's is low (~35%). Graphite's cross-dataset analysis confirms this. High Perplexity visibility correlates with your Google SEO. ChatGPT visibility does not.
- Community platforms dominate AI citations. Independent source-mix analyses covered by SearchEngineLand and other trade press consistently show Reddit and YouTube among the top-cited domains across all major AI platforms. If your SMB is invisible in community content, you are invisible in a large chunk of the citation pool by default.
- Sampling noise is real. Independent measurement research finds that roughly 10 responses per prompt are needed to stabilize a visibility measurement. A single run of a single prompt is not a stable measurement. Details in the volatility section.
Why SMBs need a different frame
AI visibility benchmarks published by Conductor, Digital Bloom, and most industry reports focus on enterprise brands — Fortune 500 companies with decade-old entity graphs, hundreds of press mentions, and dozens of directory listings. A 20-person SaaS reading “average Share of Voice in the project management category is 34%” is comparing itself to Asana and Monday. That comparison is not useful.
AI Share of Voice is the percentage of AI-generated responses that mention or recommend your brand out of all responses for queries relevant to your category (canonical definition). For enterprise brands with saturated entity signals, the ceiling is higher and the floor is higher. For an SMB under €10M ARR, the realistic band is narrower, the timeline is longer, and the key mechanism is different: you are building parametric knowledge from zero, not defending market share.
Good AI visibility for an SMB is not the same number as good AI visibility for an enterprise. Enterprise benchmarks tell you how Salesforce performs. SMBs need a different interpretation frame — one built around movement (zero to measurable) and diagnosis (which of the three visibility layers is weakest), not absolute thresholds.
Sources and scope
Sources. Public benchmarks from neutral publishers — Conductor (via Digiday), the Ahrefs YMYL study, OpenAI's own usage paper, Nectiv via SearchEngineLand, SparkToro, Princeton/Meta's GEO research (arXiv), and directional observations from operator practice. Where public SMB-specific figures exist, they are cited inline. Where they do not, this article offers qualitative framing and decision-tree logic.
This is not a statistical study. The tiered thresholds below (“Good / Great / Excellent”) are an editorial framework for interpreting your own score against realistic year-one expectations, not empirical percentiles derived from a sampled dataset. Treat them as guideposts, not measurements.
Platforms referenced: ChatGPT, Claude, Perplexity, Google AI Mode / AI Overviews, using public citation overlap and volume data from the sources above.
What this article will not give you: an exact score to compare yourself against. There is no credible “average SMB SaaS Share of Voice” number in public data. What it will give you is a way to interpret your own measured score — whether it signals a parametric knowledge gap, an entity consistency problem, a positioning gap, or simply a starting point.
Key metrics, defined
Four metrics matter for SMB AI visibility. Using canonical definitions:
AI Share of Voice. The percentage of AI-generated responses that mention or recommend your brand out of all responses for queries relevant to your category. Formula: (Number of AI responses mentioning your brand ÷ Total AI responses for your target queries) × 100.
AI Citation. When an AI assistant quotes, references, or links to your content as a source in its response. Different from a recommendation — a citation uses your content as evidence.
AI Recommendation. When an AI assistant names your brand as a solution to a user's problem (e.g., “Try [Brand] for this use case”). Stronger than a citation because it directly influences purchase decisions. AI typically recommends 3-5 brands per response — ChatGPT narrowed its set from 6-7 to 3-4 brands per answer with its October 2025 entity update (Profound, 2025).
Citation Frequency. The number of distinct sources that cite your brand across AI-generated answers. Breadth beats rank: a brand cited by 8 different sources outperforms a brand cited by only 1, even if that single source ranks #1 on Google.
Three-Layer Visibility Model. Far & Wide's framework for understanding AI visibility as three distinct layers: Layer 1 (parametric knowledge — what the model knows from training), Layer 2 (web search with user context), and Layer 3 (web search without context / fresh sessions). Each layer requires different optimization strategies and different timelines.
When interpreting your own data, track Share of Voice, Citation Rate (mentions ÷ total responses), and Recommendation Rate (named as solution ÷ total responses) separately. They are not interchangeable.
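As a minimal sketch of keeping the three rates separate: the snippet below assumes each response has already been tagged (by a reviewer or a matching script) for whether the brand was mentioned at all and whether it was named as a solution. The `Response` fields and function name are illustrative, not any tool's API.

```python
from dataclasses import dataclass

@dataclass
class Response:
    """One AI response to one target query, pre-tagged by a reviewer."""
    mentioned: bool     # brand name appears anywhere in the answer
    recommended: bool   # brand is named as a solution ("Try X for ...")

def visibility_metrics(responses: list[Response]) -> dict[str, float]:
    """Share of Voice, Citation Rate, and Recommendation Rate as
    percentages over the same response set, tracked separately."""
    total = len(responses)
    if total == 0:
        raise ValueError("need at least one response")
    mentioned = sum(r.mentioned for r in responses)
    recommended = sum(r.recommended for r in responses)
    either = sum(r.mentioned or r.recommended for r in responses)
    return {
        "share_of_voice_pct": 100 * either / total,       # mention or recommendation
        "citation_rate_pct": 100 * mentioned / total,     # mentions only
        "recommendation_rate_pct": 100 * recommended / total,
    }
```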
What “zero to measurable” means in year one
The most common mistake SMBs make is benchmarking against the wrong number. A brand that had zero parametric knowledge six months ago and now appears in 4% of category responses is not failing — it is on pace. A brand that was at 25% six months ago and is now at 20% is failing, even though its absolute number is higher.
The right year-one frame is movement, not score. The questions are:
- Did we go from unmeasurable to measurable on our priority queries?
- Is the trend line up quarter-over-quarter (with a minimum of 10 responses per prompt to control for sampling noise)?
- Did the gap between Citation Rate and Recommendation Rate narrow or widen?
Absolute thresholds matter more once you have 12+ months of data and a stable baseline. For a first measurement, diagnosis matters more than grade.
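Those three questions reduce to simple arithmetic once you have two quarterly snapshots. A hedged sketch reusing the `visibility_metrics()` output from the snippet above; the readout strings are editorial shorthand, not a standard format.

```python
def year_one_readout(q_prev: dict[str, float], q_curr: dict[str, float]) -> list[str]:
    """Answer the three year-one questions from two quarterly metric
    snapshots (each built from >= 10 responses per prompt)."""
    sov_delta = q_curr["share_of_voice_pct"] - q_prev["share_of_voice_pct"]
    gap_prev = q_prev["citation_rate_pct"] - q_prev["recommendation_rate_pct"]
    gap_curr = q_curr["citation_rate_pct"] - q_curr["recommendation_rate_pct"]
    return [
        "measurable" if q_curr["share_of_voice_pct"] > 0 else "still unmeasurable",
        f"quarter-over-quarter Share of Voice change: {sov_delta:+.1f} points",
        f"citation-recommendation gap: {gap_prev:.1f} -> {gap_curr:.1f} points",
    ]
```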
Industry context: what to expect by industry
Different industries produce different baselines because query patterns and YMYL scrutiny change how AI platforms respond. Healthcare and finance topics trigger AI Overviews more often: the Ahrefs YMYL study (November 2025) found medical YMYL queries trigger AI Overviews 44.1% of the time, the highest of any YMYL category. AI platforms also apply stricter authority filters to YMYL content — credentials shift recommendation rate more than citation rate.
Qualitative patterns by industry (drawn from operator practice, not a sampled study):
| Industry | Dynamic |
|---|---|
| SaaS | Category crowded; entity signals decisive. Listicle placement and schema are high-leverage. |
| E-commerce | Product-level visibility diverges from brand-level. A brand can be invisible while its hero product is cited. |
| Professional services | Local + entity mix; review-platform signals heavily weighted. |
| Healthcare / wellness | YMYL gate. Credential markup (Person schema with credentials) is required for recommendation. |
| Agencies | Case studies with outcome metrics (not testimonial quotes) shift recommendation rate more than citation rate. |
How to use this table. These are directional patterns, not benchmarks. Use them to ask the right diagnostic question for your industry. A healthcare SMB with 10% Share of Voice is in a different position than a SaaS SMB with the same number: the healthcare brand likely needs E-E-A-T work, the SaaS brand likely needs schema and listicle placement.
Platform context: how AI platforms differ
How much your existing SEO work carries into AI visibility depends heavily on platform. Public data shows the split clearly:
| Platform | Google citation overlap | Primary citation sources | SEO spillover |
|---|---|---|---|
| ChatGPT | ~35% (Graphite, thousands of queries) | Parametric-first; mixed: directories, listicles, community | Low |
| Claude | Lower than Perplexity (public data thin) | Similar to ChatGPT; narrower entity pool | Low-medium |
| Perplexity | ~70% (Graphite) | Retrieval-first; Reddit is the single largest citation source | High |
| Google AI Mode / AI Overviews | Very high (uses Google index) | Google index, Knowledge Graph | Very high |
Sources: Graphite citation overlap analysis (2026), SearchEngineLand AI referral coverage, and independent source-mix analyses in industry trade press on Reddit's dominance among Perplexity citations.
How to interpret: If your Perplexity visibility is strong and your ChatGPT visibility is weak, your SEO is working and your entity graph is not. If your ChatGPT visibility is strong and your Perplexity visibility is weak (rare), you are winning in community and non-SEO sources but missing the SEO foundation. These are two different remediation paths.
The gap between citation and recommendation is the most useful diagnostic
The most actionable piece of AEO data is not any single rate — it is the gap between your Citation Rate (how often you are mentioned) and your Recommendation Rate (how often you are named as a solution). They measure different things and fail for different reasons.
| Pattern | What it means |
|---|---|
| High citation, high recommendation | Entity is strong, positioning is clear |
| High citation, low recommendation | AI knows you exist but does not frame you as a solution |
| Low citation, occasional recommendation | Positioning is clear in a few sources; entity graph is thin |
| Low citation, low recommendation | Parametric knowledge gap — brand is effectively invisible |
How to read this. If you are mentioned in 40% of responses but recommended in only 5%, do not spend effort on more mentions. Fix positioning in third-party listicles — content that names solutions, not just topics. If you are at 3% citation and 0% recommendation, mentions are the priority. No positioning work will help until AI platforms know the brand exists.
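The four patterns reduce to a small classifier. A sketch with illustrative thresholds: the 20% and 10% cutoffs below are placeholders to replace with your own baseline, not benchmarks.

```python
def diagnose_gap(citation_pct: float, recommendation_pct: float,
                 cite_high: float = 20.0, rec_high: float = 10.0) -> str:
    """Map the citation/recommendation pattern to its likely cause."""
    cited = citation_pct >= cite_high
    recommended = recommendation_pct >= rec_high
    if cited and recommended:
        return "entity strong, positioning clear"
    if cited:
        return "positioning gap: pursue solution-framing placements"
    if recommended:
        return "thin entity graph: broaden source coverage"
    return "parametric knowledge gap: build mentions first"

print(diagnose_gap(40, 5))   # -> positioning gap, matching the example above
print(diagnose_gap(3, 0))    # -> parametric knowledge gap
```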
A tiered framework for interpreting your score
The thresholds below are an interpretive framework, not empirical percentiles. They describe what each tier of visibility tends to mean for an SMB, based on the mechanics of how AI systems select brands. Use them to read your own score, not to claim a percentile.
AI Share of Voice (your category)
- Starting tier. Single-digit Share of Voice is the normal starting band for a brand new to AEO or newer than recent model training cutoffs. The question at this tier is not “how do we grow?” — it is “which of the three visibility layers is weakest?”
- Building tier. Low double-digit Share of Voice means the brand is now consistently entering the consideration set for some category queries. Competitors are encountering you in AI-sourced research. Time to diversify: multi-platform coverage matters more than a bigger number on one platform.
- Category presence tier. Higher double-digit Share of Voice means AI platforms treat the brand as one of the defaults for the category. At this level, the work shifts from visibility to defense — volatility (see next section) becomes the primary risk.
Citation Rate (mentions ÷ responses). Moves roughly in step with Share of Voice but measures a different thing: how often your content or brand appears as a reference, regardless of whether you are recommended. A brand with high citation and low recommendation is cited as topical background, not as a solution.
AI Recommendation Rate (named as solution ÷ responses). Capped by the 3-5 brand ceiling per response (ChatGPT: 3-4 after October 2025). Every brand named as a solution consumes a slot; low Recommendation Rate with high Citation Rate means you are topical content, not a named option.
Cross-platform consistency. If the same brand appears across ChatGPT, Claude, and Perplexity on the same query, the entity graph is strong enough to survive platform differences. If you appear on one but not the others, entity signals or source mix differ between platforms. Research on AI visibility consistency (Far & Wide, 2025, 1,000+ AI sessions) found that brands appearing consistently across platforms share three characteristics: strong entity consistency, multi-platform presence, and recency across sources.
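One way to operationalize the cross-platform check: measure Share of Voice per platform on the same fixed prompt set and flag splits. The 5% "measurable" floor here is an illustrative assumption, not a benchmark.

```python
def consistency_check(per_platform: dict[str, float], floor: float = 5.0) -> str:
    """per_platform maps a platform name to its measured Share of Voice
    (in %) for one fixed prompt set."""
    visible = sorted(p for p, sov in per_platform.items() if sov >= floor)
    missing = sorted(p for p, sov in per_platform.items() if sov < floor)
    if visible and missing:
        return (f"split: visible on {visible}, weak on {missing}; "
                "entity signals or source mix differ by platform")
    return "consistent across platforms" if visible else "weak everywhere"

print(consistency_check({"ChatGPT": 2.0, "Claude": 1.0, "Perplexity": 12.0}))
```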
Use your own score as a decision tree
Most SMBs do not need more metrics; they need to know which problem to solve first. The patterns below work as a decision tree, and a code sketch of the same logic follows it.
Below 5% on all metrics across all platforms. Parametric knowledge problem. Either the brand is newer than the model's training cutoff, or the brand is not mentioned in the sources AI trains on (Wikipedia, Reddit, major press, industry reports). Remediation: Wikipedia entry where notability applies, Reddit/community presence, major press mentions. Timeline is long — Layer 1 updates happen on model retrain cycles (3-12 months).
High mention, low recommendation. Entity signals exist but positioning is weak. AI knows you exist but does not frame you as a solution for the category. Remediation: third-party listicle placement (“best X for Y” articles where the brand is named as a solution, not just a passing mention), case studies with outcome metrics, comparison content that positions the brand against specific alternatives.
High ChatGPT, low Perplexity. Your content mix does not match Perplexity's citation bias toward Reddit and extraction-friendly formats. Remediation: Reddit presence in your category, content structured for extraction (comparison tables, answer-first paragraphs, bold keyword + explanation patterns).
High Perplexity, low ChatGPT. SEO is working but entity graph is thin. Remediation: schema markup (Organization with sameAs), Wikipedia/Wikidata entry, consistent brand description across all third-party mentions, directory listings with exact-match NAP/entity data.
Inconsistent across sessions (brand appears in 3/10 responses, not the same 3 responses each time). Entity disambiguation problem. AI cannot reliably identify the brand as the same entity across retrievals. Remediation: sameAs schema links pointing to LinkedIn, Crunchbase, Wikipedia; consistent naming across all sources (“Far & Wide” not “Far and Wide” or “FarAndWide”).
Zero parametric knowledge (web-search-off returns nothing or hallucinations). No training-data presence. Remediation: wait for the next model training cycle and spend the waiting period feeding the sources AI trains on. This is a 3-12 month investment, not a 30-day fix.
Flat Share of Voice month-over-month. If you are publishing content and making structural changes but Share of Voice is not moving, the issue is entity signals, not content. AI has enough content about your category — it does not have enough confident entity signals about your brand. Schema, sameAs links, and Wikipedia take priority over more articles.
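The tree above, encoded as one ordered function. This is a sketch under the assumption that you have already measured the inputs it names; every key and threshold is illustrative, so adjust them to your own baseline.

```python
def next_fix(m: dict) -> str:
    """Walk the decision tree in order. Expected keys (all illustrative):
    parametric_known (bool, web-search-off test), sov_pct, citation_pct,
    recommendation_pct, chatgpt_sov, perplexity_sov,
    session_consistency (0-1, share of runs where the brand appears),
    sov_trend_flat (bool)."""
    if not m["parametric_known"]:
        return "feed training sources; 3-12 month horizon, not a 30-day fix"
    if max(m["sov_pct"], m["citation_pct"], m["recommendation_pct"]) < 5:
        return "parametric knowledge problem: Wikipedia, Reddit, major press"
    if m["citation_pct"] >= 4 * max(m["recommendation_pct"], 0.1):
        return "positioning gap: listicle placement, outcome-metric case studies"
    if m["chatgpt_sov"] > 2 * m["perplexity_sov"]:
        return "missing Perplexity-side signals: Reddit, extraction-friendly content"
    if m["perplexity_sov"] > 2 * m["chatgpt_sov"]:
        return "thin entity graph: schema, sameAs, Wikipedia/Wikidata, directories"
    if m["session_consistency"] < 0.5:
        return "entity disambiguation: sameAs links, one consistent brand name"
    if m["sov_trend_flat"]:
        return "entity signals over content: schema and Wikipedia before more articles"
    return "no dominant failure mode: keep measuring quarterly"
```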
Volatility warning: do not panic at month-over-month drops
AI citation rates drift substantially month-over-month across major AI models. A Share of Voice drop from 18% in January to 11% in February looks catastrophic — in most cases it is model noise, not a content problem.
Three patterns cause normal volatility:
- Model updates. New model versions re-rank entities. ChatGPT's October 2025 entity update reduced the number of brands named per answer from 6-7 to 3-4 (Profound, 2025). Every brand outside the top 3-4 for a query saw recommendation rate drop regardless of their content.
- Sampling variance. Because LLMs sample from a probability distribution, the same prompt run twice produces different outputs. Graphite's 2026 randomness study (200 prompts × 400 responses each) found that 10 responses per prompt are required for Mean Absolute Error ≤ 10% in 98.6% of prompts. Single-prompt single-run data is not a stable measurement.
- Index refresh cycles. ChatGPT and Perplexity index web content on different schedules. A page published on Monday may enter one index in 48 hours and another in three weeks.
How to measure without misreading noise. Measure quarterly for trend, not monthly. Compare year-over-year, not week-over-week. Use at least 10 responses per prompt per scenario. Keep the prompt set fixed — changing prompts is a bigger source of variance than any real change in visibility.
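A minimal measurement loop that bakes in those rules: a fixed prompt set, at least 10 responses per prompt, and per-prompt plus mean rates suitable for quarterly comparison. `ask_model(prompt) -> str` and `mentions_brand(text) -> bool` are stand-ins for your own platform client and brand matcher, not real APIs.

```python
import statistics
from typing import Callable

N_RESPONSES = 10  # per the Graphite randomness study: >= 10 for MAE <= 10%

def measure(prompts: list[str],
            ask_model: Callable[[str], str],
            mentions_brand: Callable[[str], bool]) -> dict:
    """Run a FIXED prompt set N_RESPONSES times each; changing the prompt
    set between runs invalidates any trend comparison."""
    per_prompt = {}
    for prompt in prompts:
        hits = sum(mentions_brand(ask_model(prompt)) for _ in range(N_RESPONSES))
        per_prompt[prompt] = 100 * hits / N_RESPONSES
    return {
        "per_prompt_pct": per_prompt,
        "mean_pct": statistics.mean(per_prompt.values()),  # compare quarterly
    }
```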
Action items, ranked by impact
Once you know where you sit on the decision tree above, the fix list narrows. Below, actions ranked by typical impact on SMBs, highest-leverage first.
If below-benchmark mention rate → add brand entity signals. Implement Organization schema with sameAs links pointing to LinkedIn, Crunchbase, official social profiles, and Wikipedia (if present); a minimal schema sketch follows this list. Register consistent NAP data across directories. This is the foundation: without it, other fixes compound weakly. The Princeton/Meta GEO study (arXiv, 2023) found that authority citations and statistics increased AI visibility by 30-40%; keyword stuffing produced a -6% change.
If mention without recommendation → fix positioning in third-party listicles. Pitch for inclusion in “best X for Y” listicles in your category. Build review-platform profiles with structured data (ratings, use-case tags). Publish case studies with outcome metrics (numbered results, not testimonial quotes). This is positioning work, not content work.
If below-benchmark on Perplexity → community presence. Perplexity's citation mix ranks Reddit as its single largest source in independent analyses. For SMBs, this means real participation in category-relevant subreddits over 3-6 months. Not promotional posts — answers to category questions, visible brand association.
If inconsistent across sessions → entity disambiguation. If the brand name is ambiguous (shares words with other entities), strengthen sameAs links, add LocalBusiness or Organization schema specifying industry and category explicitly, and ensure the first paragraph of your homepage contains a self-contained brand definition.
If zero parametric knowledge → wait and feed the training sources. Layer 1 visibility updates only when models retrain. Spend the waiting period getting mentioned in the sources that feed training data: Wikipedia (if notable), major industry publications, Reddit, high-authority press. This is a 3-12 month horizon. Do not expect 30-day results from parametric-level work.
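The schema sketch promised above: a minimal Organization block with sameAs disambiguation links, generated as JSON-LD from Python. All names and URLs are placeholders; point sameAs only at profiles that already describe your brand consistently.

```python
import json

# Illustrative Organization schema (schema.org) with sameAs links.
organization_jsonld = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Brand",                # exact, consistent spelling everywhere
    "url": "https://www.example.com",
    "description": "One-sentence, self-contained brand definition.",
    "sameAs": [
        "https://www.linkedin.com/company/example-brand",
        "https://www.crunchbase.com/organization/example-brand",
        "https://en.wikipedia.org/wiki/Example_Brand",
    ],
}

# Embed the output in a <script type="application/ld+json"> tag in the page head.
print(json.dumps(organization_jsonld, indent=2))
```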
What this framework does not tell you
Three things this framework cannot answer:
- Your exact score versus category competitors. Category-level Share of Voice is only meaningful against a specific prompt set and a specific competitor list. Generic “SaaS SMB range” numbers are a starting frame, not a substitute for testing your actual queries.
- Which queries matter most for your revenue. A brand with 40% Share of Voice on low-intent informational queries and 2% on buying-intent queries has a revenue problem, not an AEO win. Query selection matters more than the aggregate number.
- Whether AI mentions convert for your business. AI drives approximately 1.08% of total web traffic across 10 industries, with ChatGPT responsible for roughly 87.4% of AI referral traffic (Conductor, reported via Digiday, 2025; SearchEngineLand coverage). LLM traffic has been reported to convert materially better than Google search traffic in publisher-side analyses — SearchEngineLand (2025) reports ChatGPT e-commerce traffic converting 31% higher than non-branded organic. Conversion is measured per-brand and per-offer, not at a benchmark level.
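To put those percentages in context, a back-of-envelope scale check. The traffic volume and the 2% baseline conversion rate below are hypothetical; only the AI-share, ChatGPT-share, and conversion-lift figures come from the sources cited above.

```python
monthly_visits = 100_000               # hypothetical site traffic
ai_share = 0.0108                      # ~1.08% of web traffic is AI-driven
chatgpt_share_of_ai = 0.874            # ~87.4% of AI referral traffic

ai_visits = monthly_visits * ai_share              # ~1,080 visits/month
chatgpt_visits = ai_visits * chatgpt_share_of_ai   # ~944 visits/month

baseline_cr = 0.02                     # hypothetical organic conversion rate
chatgpt_cr = baseline_cr * 1.31        # reported 31% lift for ChatGPT e-commerce
print(f"{chatgpt_visits:.0f} ChatGPT visits -> "
      f"~{chatgpt_visits * chatgpt_cr:.0f} conversions/month")
```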
Get your own numbers
This framework tells you how to read a score. It does not tell you what your score is. For that, you need a measurement against your actual customer queries and competitors — not a generic benchmark.