Here is the gap that makes prompt research worth doing at all. ChatGPT reached 700 million weekly active users by July 2025 — roughly 10% of the global adult population, sending 18 billion messages a week. Non-work messages grew from 53% of all usage in June 2024 to 73% in June 2025, and nearly half of all messages now come from users under 26 (Chatterji et al., OpenAI/Duke/Harvard, How People Use ChatGPT, September 2025). Real-world use is personal, decision-seeking, and younger than the formal enterprise framing most marketers plan for. The prompts you imagine are not the prompts your customers type. You cannot optimize for prompts you do not know.
Why prompt research matters
AI Share of Voice depends on the prompts you optimize for. AI Share of Voice is the percentage of AI-generated responses that mention or recommend your brand out of all responses for queries relevant to your category (formula: number of AI responses mentioning your brand divided by total AI responses for your target queries, multiplied by 100). If your target query list is full of marketer-imagined prompts that nobody types, your measured AI Share of Voice is a fiction. You are scoring yourself on the wrong exam.
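The formula can be sketched in a few lines of Python. This is a toy illustration: `responses` is a hypothetical list of response texts for your target queries, and plain substring matching stands in for real brand-mention detection, which would need alias and word-boundary handling:

```python
def ai_share_of_voice(responses, brand):
    """Percentage of AI responses that mention the brand.

    Substring matching is a simplification; production matching
    should handle brand aliases and word boundaries.
    """
    if not responses:
        return 0.0
    mentions = sum(1 for r in responses if brand.lower() in r.lower())
    return mentions / len(responses) * 100

# 2 of 4 sampled responses mention the brand -> 50.0
print(ai_share_of_voice(
    ["Try Linear for this.", "Use Jira.", "Linear or Asana work.", "Asana is fine."],
    "Linear",
))
```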
Real prompts are longer, more personal, and more specific than marketers assume. OpenAI's own classification of ChatGPT conversations found that 49% of messages are “Asking” — seeking information to make a decision — and 40% are “Doing” — asking the model to produce output. The share of “Seeking Information” alone grew from 14% to 24% of all usage in a single year (Chatterji et al., OpenAI/Duke/Harvard, How People Use ChatGPT, September 2025). Independent industry analyses of real prompt corpora consistently show user prompts cluster at 10-20 words and heavily use first-person framing. Users write “I run a 12-person consulting firm in Amsterdam, what CRM should I pick under 200 euros a month” rather than “best CRM.” The AI then generates its own internal sub-queries to answer. AI systems increasingly issue longer internal sub-queries to fulfill single user prompts — short head terms miss both layers.
Two separate outcomes, two separate measurements. Prompt research feeds two distinct goals:
- AI Visibility (Brand Visibility in AI Responses): the percentage of AI responses in which your brand appears for a given prompt. A citation, a mention, or a named-in-passing reference all count as visibility.
- AI Recommendation: when the AI actively names your brand as a solution (“Try Linear for this use case”). A brand can be visible without being recommended, and recommended without being cited.
Prompts that trigger visibility are not always prompts that trigger recommendation. You need both lists, measured separately.
The 7 methods to discover real customer prompts
Methods ranked by effort-to-value ratio. Start with method 1. Layer additional methods as your prompt library grows.
1. Interview 10-15 customers with exact phrasing
Ask customers how they would search, not how they would describe what they want. The phrasing matters. “If you had to ask an AI assistant to find something like what we sell, what would you type?” gets you a prompt. “What do you like about our product?” gets you marketing copy. Record the exact wording, including typos and filler words. Do 10-15 interviews — you will stop seeing new patterns around interview 12.
Keep a short script:
- What were you trying to solve when you first looked for a tool like this?
- If you had to ask ChatGPT today, word-for-word, what would you type?
- What would you type if the first answer did not help?
- What did you almost type but decided was too long?
Interviews are the only method that captures long-form, first-person prompts because customers describe their real situation (“I/my/me”) rather than editing themselves into keyword-ese. This matches OpenAI's own finding that nearly half of ChatGPT messages are decision-seeking “Asking” prompts, not formal queries.
2. Mine sales call transcripts
Pull the verbatim questions from the first 5 minutes of every discovery call. Sales tools with automatic transcription (Gong, Fireflies, Chorus, Grain) give you searchable transcripts. Filter for question-shaped patterns: “how do you”, “what is the difference between”, “is it possible to”, “do you support”. Export the questions, de-duplicate, and rank by frequency.
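The filter-de-duplicate-rank step can be sketched in a few lines, assuming your transcripts are available as plain text strings (the sample calls below are invented):

```python
import re
from collections import Counter

# Question-shaped openers named above; extend the list for your category.
QUESTION_PATTERNS = re.compile(
    r"\b(?:how do you|what is the difference between"
    r"|is it possible to|do you support)\b[^.?!]*\?",
    re.IGNORECASE,
)

def mine_questions(transcripts):
    """Extract question-shaped sentences, de-duplicate, rank by frequency."""
    counts = Counter()
    for text in transcripts:
        for match in QUESTION_PATTERNS.finditer(text):
            counts[match.group(0).strip().lower()] += 1
    return counts.most_common()

calls = [
    "Quick one: how do you handle SSO? Also, do you support webhooks?",
    "Before we start, how do you handle SSO? Thanks.",
]
for question, freq in mine_questions(calls):
    print(freq, question)
# 2 how do you handle sso?
# 1 do you support webhooks?
```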
Sales call questions are commercial-intent gold. A prospect who asked “how does your pricing compare to Zendesk for a 40-person support team” is the exact user who would have typed that into ChatGPT a week earlier. Same person, same question, different channel.
3. Analyze support tickets and chat logs
Export the last 500-1,000 support tickets and cluster them by topic. Zendesk, Intercom, Freshdesk, HelpScout, and most help desks let you export ticket subject lines and first-message bodies. Pipe into a spreadsheet or feed 200 at a time into ChatGPT with “cluster these into 15 themes and show me 3 example phrasings per theme.”
Support tickets reveal the questions your customers ask after they buy, in the same phrasing new buyers use before they buy. “How do I connect Shopify to QuickBooks” from an existing customer is “how to connect Shopify to QuickBooks” from a prospect.
4. Mine Google Search Console for long-tail queries
Filter GSC queries to 10+ words and export everything with 5+ impressions. Google Search Console still shows the actual queries users type into Google. The long-tail (10- to 15-word) queries are the closest thing you have to ChatGPT prompts: same user, slightly different channel, same wording style. Filter:
- Queries with 10+ words
- Queries containing “how”, “what”, “vs”, “best”, “for”
- Queries with first-person pronouns (“my”, “I”, “me”)
Pair the GSC export with whichever pages are ranking. The mismatch between query and page tells you which prompts you are losing to competitors.
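Those filters are easy to automate. A sketch under stated assumptions: each exported row arrives as a dict with `query` and `impressions` keys, which are assumed names you should match to your actual GSC CSV columns:

```python
import csv  # used when reading the real export; see the usage note below

FIRST_PERSON = {"i", "my", "me"}
INTENT_WORDS = {"how", "what", "vs", "best", "for"}

def filter_gsc(rows):
    """Keep 10+ word queries with 5+ impressions; flag intent and first-person.

    `rows` is an iterable of dicts with `query` and `impressions` keys --
    those key names are assumptions, match them to your export.
    """
    keep = []
    for row in rows:
        words = row["query"].lower().split()
        if len(words) < 10 or int(row["impressions"]) < 5:
            continue
        keep.append({
            "query": row["query"],
            "first_person": bool(FIRST_PERSON & set(words)),
            "intent_word": bool(INTENT_WORDS & set(words)),
        })
    return keep
```

Against a real export, feed it a reader: `filter_gsc(csv.DictReader(open("gsc_export.csv", newline="")))`.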
5. Scrape Reddit and Quora by subreddit
Pull thread titles from the 3-5 subreddits where your customers hang out. Reddit is the most-cited single domain across AI search engines — analysis of 30 million citations found Reddit ranked first across ChatGPT, Perplexity, Google AI Mode, Gemini, and AI Overviews, followed by YouTube and LinkedIn (SearchEngineLand, March 2026). Reddit thread titles are the single best real-prompt corpus on the open web because Redditors ask full-sentence, context-loaded questions — and because those threads are already in the retrieval graph for your prompts.
Practical method:
- List 3-5 subreddits where customers post (r/smallbusiness, r/SaaS, r/marketing, plus your niche).
- Use Reddit's search or a tool like GummySearch to pull thread titles containing your category keyword.
- Filter titles to question-shaped ones (start with who/what/when/where/why/how/is/are/can/should).
- Add 20-40 per subreddit to your prompt matrix.
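The question-shaped filter in the third step reduces to a prefix check (the sample titles are invented):

```python
QUESTION_STARTERS = {
    "who", "what", "when", "where", "why",
    "how", "is", "are", "can", "should",
}

def question_shaped(titles):
    """Keep thread titles whose first word is a question word."""
    keep = []
    for title in titles:
        words = title.lower().split()
        if words and words[0].strip("?!.,") in QUESTION_STARTERS:
            keep.append(title)
    return keep

titles = [
    "What CRM do you use for a 3-person agency?",
    "Switched from HubSpot, AMA",
    "Is a niche CRM worth it for consulting?",
]
print(question_shaped(titles))
# ['What CRM do you use for a 3-person agency?', 'Is a niche CRM worth it for consulting?']
```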
6. Use AI platform autocomplete and related queries
Type partial prompts into ChatGPT and Perplexity and record the suggestions. ChatGPT's suggested follow-up prompts, Perplexity's related questions, and Google AI Mode's “people also ask” blocks all surface the queries the platforms have seen at scale. Start a fresh session (logged-out or incognito). Logged-in sessions bias toward your personal history. Note: an analysis of 8,500 prompts across nine industries found ChatGPT triggers a web search on only 31% of prompts on average, with prompt intent being the strongest driver — local-intent prompts trigger search 59% of the time, credit-card prompts just 18% (SearchEngineLand, October 2025). Logged-out autocomplete therefore reflects a different retrieval pattern than logged-in suggestions.
Record 20-30 suggestions across your top 5 seed prompts. Add them to the matrix with a source tag of “platform autocomplete” so you know how they were discovered.
7. Reverse-engineer competitor visibility
Find the prompts where your competitors appear, then find why. Pick 3-5 competitors. Run 20-30 seed prompts across ChatGPT, Claude, and Perplexity. Where competitors get mentioned and you do not, that prompt belongs in your matrix. This method finds prompts you would never have guessed because they live in the AI's answer pattern, not in your customer's mouth.
Pair this with a peek at what those competitors' sources are. If Perplexity cites a specific Reddit thread when recommending Competitor X, that Reddit thread is a prompt-discovery signal for you.
Build a prompt matrix
A prompt matrix is a spreadsheet with one row per prompt and one column per platform, plus columns for response analysis. This is the single artifact that turns prompt research from a scattered notes exercise into an optimization asset.
Minimum columns:
| Column | Purpose |
|---|---|
| Prompt (verbatim) | The exact wording, typos included |
| Source | Interview / sales call / GSC / Reddit / autocomplete / competitor |
| Intent category | Research / comparison / recommendation / evaluation / purchase |
| Word count | Filter for long-tail (15+) |
| First-person? | Flag if contains I/my/me |
| ChatGPT — brand appears? | Y/N |
| ChatGPT — position | 1, 2, 3... |
| Claude — brand appears? | Y/N |
| Perplexity — brand appears? | Y/N |
| Google AI Mode — brand appears? | Y/N |
| Competitors named | List |
| Sources cited | URLs |
| Commercial intent (1-5) | Scoring column |
| Estimated volume (1-5) | Scoring column |
| Priority score | Computed |
Run each prompt 10 times per platform. AI responses are stochastic — the same prompt returns different brand mixes across runs, and a single run is not a measurement, it is a sample. Ten runs per prompt is the widely used minimum to stabilize the mean brand-mention rate; fewer than that and the noise swamps the signal. Verify the ten-run signal yourself by running your top prompt ten times back-to-back in a fresh logged-out session.
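The statistics behind the ten-run minimum are ordinary sampling error: the standard error of a mention rate shrinks with the square root of the number of runs. A sketch with invented run results:

```python
import math

def mention_rate(runs):
    """Mean brand-mention rate and its standard error from repeated runs.

    `runs` is a non-empty list of booleans: did the brand appear in that
    run? A single run is one sample; the standard error shows how noisy
    the estimate still is at a given run count.
    """
    n = len(runs)
    p = sum(runs) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# 6 mentions in 10 runs: rate 0.6, standard error ~0.15
p, se = mention_rate([True] * 6 + [False] * 4)
print(f"mention rate {p:.0%} +/- {se:.2f}")
```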
Categorize prompts by intent
Assign one of five intent categories to every prompt. Different intents map to different content assets, different conversion actions, and different competitors.
- Research intent: “what is [category]”, “how does [thing] work”, “why do companies use [category]”. User is learning. Content asset: definition pages, category explainers, glossary entries.
- Comparison intent: “[X] vs [Y]”, “alternatives to [X]”, “X compared to Y for [use case]”. User is in a shortlist. Content asset: comparison pages, vs pages, alternatives pages.
- Recommendation intent: “best [category] for [audience]”, “which [tool] should I use if I [constraint]”, “recommend me a [category]”. User wants a named answer. Content asset: best-of pages, category directories, review aggregators.
- Evaluation intent: “is [brand] worth it”, “[brand] review”, “does [brand] support [feature]”, “pros and cons of [brand]”. User has your name and is checking. Content asset: feature pages, case studies, FAQ, review platform presence.
- Purchase intent: “where to buy [brand]”, “[brand] pricing”, “[brand] discount”, “how to get started with [brand]”. User is ready. Content asset: pricing page, signup flow, docs.
Most brands obsess over recommendation-intent prompts and under-invest in research and comparison. That imbalance is why they are visible to users who already know their name and invisible to users who do not.
Prioritize which prompts to optimize for
Score every prompt on three axes and compute a priority number. Priority = estimated volume (1-5) × commercial intent (1-5) × competitive gap (1-5). Optimize the top 20-30 prompts first.
Volume is directional — exact AI search volume is unknowable. Use GSC impressions for the closest equivalent query as a proxy. Commercial intent is highest for recommendation, comparison, and purchase intents; lowest for research. Competitive gap is 5 when your brand is absent in 10/10 runs, 1 when you already appear in 9/10.
Example scoring table:
| Prompt | Volume | Commercial | Gap | Priority |
|---|---|---|---|---|
| best CRM for 10-person consulting firm under $200 | 5 | 5 | 4 | 100 |
| HubSpot vs Pipedrive for consulting agencies | 3 | 5 | 5 | 75 |
| how do I pick a CRM if I only have 3 salespeople | 4 | 4 | 4 | 64 |
| what is a CRM | 5 | 1 | 1 | 5 |
The “what is a CRM” prompt has massive volume but near-zero commercial intent and near-zero competitive gap (you are already cited), so it scores 5. The 10-person consulting prompt has smaller volume but maximum commercial weight and a real gap, so it scores 100. That is where optimization time goes first.
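The scoring takes a few lines to compute; the rows below mirror two of the example prompts:

```python
def prioritize(prompts):
    """Priority = volume x commercial intent x competitive gap (each 1-5).
    Returns the prompt list sorted highest-priority first."""
    for p in prompts:
        p["priority"] = p["volume"] * p["commercial"] * p["gap"]
    return sorted(prompts, key=lambda p: p["priority"], reverse=True)

matrix = [
    {"prompt": "what is a CRM", "volume": 5, "commercial": 1, "gap": 1},
    {"prompt": "best CRM for 10-person consulting firm under $200",
     "volume": 5, "commercial": 5, "gap": 4},
]
for row in prioritize(matrix):
    print(row["priority"], row["prompt"])
# 100 best CRM for 10-person consulting firm under $200
# 5 what is a CRM
```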
Visibility vs recommendation: the key distinction
A prompt that makes AI mention your brand is not the same as a prompt that makes AI recommend your brand. Two separate behaviors, two separate optimization targets, two separate measurements.
- AI Visibility measures: did my brand appear anywhere in the response? A citation link, a passing mention, a comparison table row all count.
- AI Recommendation measures: did the AI actively name my brand as the suggested solution? “I recommend Linear” or “Try Notion for this” is a recommendation.
You can be cited 10 times in a prompt where the AI recommends three competitors. You can also be the sole recommendation on a prompt where your website is never cited — the AI pulled the recommendation from parametric knowledge or from third-party sources.
Track both. A prompt matrix should have one column for visibility (yes/no) and a separate column for recommendation (yes/no). The prompts where you are visible but not recommended tell you to strengthen recommendation signals: reviews, third-party listicles, comparison content. The prompts where you are neither tell you to strengthen base presence: AI crawler access, schema, entity consistency.
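A first-pass classifier for the two columns can be sketched with regular expressions. The recommendation trigger phrases here are illustrative assumptions, not an exhaustive list:

```python
import re

def classify(response, brand):
    """Return (visible, recommended) flags for one AI response.

    Visibility: the brand appears anywhere in the text. Recommendation:
    the brand is actively named as the suggested solution. The trigger
    phrases below are assumptions; tune them against real responses.
    """
    visible = bool(re.search(rf"\b{re.escape(brand)}\b", response, re.IGNORECASE))
    recommended = bool(re.search(
        rf"\b(?:try|use|i(?:'d)? recommend|go with)\s+{re.escape(brand)}\b",
        response,
        re.IGNORECASE,
    ))
    return visible, recommended

# Visible but not recommended: the named pick is a competitor.
print(classify("Linear and Jira both work, but I recommend Jira here.", "Linear"))
# (True, False)
```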
Build a repeatable prompt research workflow
Prompt research is a workflow, not a dashboard. The reason most teams stall is that they buy a tracking product before they know which prompts to track. Build the workflow first. Tooling follows.
A workflow you can run yourself, at roughly 5 hours in week 1, 2 hours in week 2, and 2 hours per monthly re-run:
- Schedule interviews (week 1, 2 hours). Book five 20-minute calls with recent buyers. Use the four-question script from Method 1. Record verbatim phrasings in a single notes document.
- Export query data (week 1, 30 minutes). Pull Google Search Console queries with 10+ words and 5+ impressions. Export the last 500 support tickets from your help desk. Export the last 30 days of sales call transcripts from your call-recording tool.
- Mine community threads (week 1, 1 hour). Pick three subreddits your customers post in. Pull the top 50 question-shaped thread titles per subreddit using Reddit's built-in search (title:"question keyword" subreddit:targetsub). Add 20-40 per subreddit to your notes document.
- Consolidate into a matrix (week 1, 1 hour). Open a spreadsheet. One row per prompt. Columns for the prompt, source, intent category, word count, first-person flag, ChatGPT/Claude/Perplexity/AI Mode visibility, recommendation, competitors named, sources cited, volume score, commercial score, gap score, priority.
- Score and prioritize (week 1, 30 minutes). Fill volume × commercial × gap. Sort descending. Top 20 prompts are your tracking set.
- Run the tracking set (week 2, 2 hours). Run each of the top 20 prompts 10 times in ChatGPT, then Claude, then Perplexity. Record visibility and recommendation in the matrix. Use a logged-out or incognito browser session.
- Re-run monthly (ongoing, 2 hours/month). Re-run the same 20 prompts in month 2. Diff the results. That diff is your AI Share of Voice delta.
Artifacts you need: interview script, CSV prompt matrix template, one logged-out browser profile per platform. That is the complete toolkit. No subscription required to execute this workflow end-to-end.
Why this works. Every prompt comes from a real customer signal (interview, ticket, call, thread, query). Every prompt is scored before tracking starts, so you do not waste runs on low-value queries. Every re-run compares apples to apples — same prompt, same session type, same sort order — so month-over-month movement is interpretable.
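The monthly diff in the last step reduces to a keyed comparison. A sketch where each month maps prompt to the mention rate from that month's ten-run sample (the numbers are invented):

```python
def month_over_month(prev, curr):
    """Per-prompt mention-rate deltas between two monthly runs.

    `prev` and `curr` map prompt -> mention rate (0.0-1.0). Only prompts
    tracked in both months are compared; results are sorted by biggest
    absolute movement first.
    """
    deltas = {
        prompt: round(curr[prompt] - prev[prompt], 2)
        for prompt in prev.keys() & curr.keys()
    }
    return dict(sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True))

january = {"best CRM for consultants": 0.2, "HubSpot vs Pipedrive": 0.5}
february = {"best CRM for consultants": 0.6, "HubSpot vs Pipedrive": 0.4}
print(month_over_month(january, february))
# {'best CRM for consultants': 0.4, 'HubSpot vs Pipedrive': -0.1}
```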
Common mistakes
Five patterns that waste prompt-research budgets.
Tracking vanity head terms instead of long-tail prompts. “best CRM” feels important. It is also 2-3 words, which looks nothing like the decision-seeking, first-person prompts that make up the bulk of real ChatGPT usage — OpenAI's own analysis classifies 49% of messages as “Asking” a question to make a decision (Chatterji et al., OpenAI/Duke/Harvard, September 2025). Track head terms for directional context, but optimize against 10-20-word phrasings that match how customers talk. Recovery: filter your matrix for prompts under 8 words and demote them unless they are intent-rich.
Treating logged-out data as logged-in reality. Every scraped prompt-tracking tool queries logged-out or API sessions. But ChatGPT web search only triggers on 31% of prompts on average, and that rate varies from 18% (credit-card queries) to 59% (local-intent queries) (SearchEngineLand, October 2025). What your tool measures and what your logged-in customer sees can diverge significantly, especially in categories where web search rarely fires. Recovery: supplement automated tracking with manual logged-in spot checks for your top 20 prompts.
Using marketer-estimated prompts and calling it research. Workshops where the team brainstorms “what would our customer ask ChatGPT” produce short, keyword-shaped prompts. Real customers ask longer, first-person, decision-seeking questions — OpenAI found the “Seeking Information” share of ChatGPT usage grew from 14% to 24% of all messages in a single year (Chatterji et al., September 2025). The brainstormed list and the real prompt list overlap by roughly zero. Recovery: throw out your brainstormed list and replace it with 30 customer-interview prompts and 50 Reddit/GSC prompts before running any tracking.
Not versioning prompts over time. AI systems update monthly. Retrieval patterns, source mix, and internal sub-query behavior all shift. A prompt that retrieved 8 sources in January may retrieve 14 in May, and the brand mix changes. Recovery: re-run your top 20 prompts monthly and tag each run with date + platform version so you can see drift.
Ignoring platform differences. ChatGPT, Claude, Perplexity, and Google AI Mode have different retrieval, different citation patterns, and different source preferences. A 30-million-citation cross-platform analysis showed ChatGPT over-indexes on Wikipedia, Reddit, and editorial sites like Forbes, while Perplexity leans on Reddit, LinkedIn, and G2 for B2B queries (SearchEngineLand, March 2026). A prompt optimized into ChatGPT visibility may do nothing for Perplexity visibility. Recovery: split your matrix into platform-specific views and optimize per-platform, not per-prompt.
Forgetting the Fan-Out Query layer. Fan-Out Queries are the sub-queries an AI model generates internally when processing a user prompt to search the web from multiple angles. ChatGPT generates 2.3-2.8 fan-out queries per user prompt, each averaging 12 words. If you optimize only for the user-facing prompt, you miss the 2-3 internal queries the AI actually searches with. Recovery: after tracking each user prompt, inspect the citations and back-derive the likely fan-out queries, then add those as sub-prompts to your matrix.
From prompts to content strategy
Every prompt category maps to a specific content asset. Prompt research stops being theoretical when you translate it into pages.
- Research prompts (“what is X”, “how does X work”): definition pages, topic hub pages, glossary entries. Optimize for extraction: definition-first paragraph, H2 as questions, named mechanism steps.
- Comparison prompts (“X vs Y”, “alternatives to X”): comparison pages, vs-pages, alternatives pages. Optimize with a comparison table at the top, a “choose X if / choose Y if” block, verdict per parameter.
- Recommendation prompts (“best X for Y”): category best-of pages, decision-guide pages. Optimize with labeled picks (“Best for [audience]”), prices, named downsides.
- Evaluation prompts (“is X worth it”, “X review”): feature pages, case studies, pricing transparency, review-platform presence (G2, Capterra, Trustpilot).
- Purchase prompts (“X pricing”, “get started with X”): pricing page, onboarding page, docs. Optimize for clarity: pricing must appear on the pricing page as text, not as an image or a “contact sales” form.
If your prompt matrix has 40 prompts in research intent and you have 2 definition pages, you have your content brief for the next quarter.
Quick-start checklist
Ten actions for week 1.
- Schedule 5 customer interviews this week. Ask for verbatim prompt phrasings.
- Export GSC queries, filter to 10+ words with 5+ impressions. Save as CSV.
- Pick 3 subreddits. Pull the top 50 question-shaped thread titles per subreddit.
- Export the last 500 support tickets. Cluster into 15 themes.
- List 5 direct competitors.
- Build the prompt matrix spreadsheet with the columns above.
- Add 100 prompts from the sources above — aim for 40 research, 20 comparison, 20 recommendation, 10 evaluation, 10 purchase.
- Run your top 20 prompts 10 times each in ChatGPT (logged-out session). Record visibility + recommendation separately.
- Score every prompt on volume × commercial × gap. Sort descending.
- Pick the top 5 prompts. Write or update one page per prompt this month.
After week 4, re-run the top 20 prompts. Tag the second run with date. You now have month-over-month delta tracking on prompts your customers actually use.
Next steps
Prompt research is the input to every other AEO decision. Without it, you are optimizing for imagined queries and measuring against fictional benchmarks.
Related reading: