Far & Wide's honest take
Neither Anthropic (Claude) nor OpenAI (ChatGPT) has publicly confirmed that their AI models use llms.txt for retrieval or ranking. Independent site-log observations so far show AI Crawlers fetching the file rarely, if at all. That said, setup takes about 15 minutes, costs nothing, and has zero downside. The asymmetric bet: if adoption lands, you're ready; if it doesn't, you've lost a few minutes of work. That's why we still recommend it.
This guide covers the configuration path for WordPress, Shopify, Next.js, and static sites, plus the robots.txt changes for AI Crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), testing methods, server-log monitoring, and the anti-patterns that silently block AI access.
What llms.txt is and why it matters
llms.txt is a Markdown file at /llms.txt that gives AI models a curated, machine-readable map of your website. Proposed by Jeremy Howard (Answer.AI) in September 2024, it mirrors the role robots.txt plays for search crawlers: one file, at the root, declaring intent. The difference is what's declared: robots.txt restricts access, while llms.txt offers a shortlist of your best content written in clean Markdown with descriptions.
The file matters because AI models operate under a token budget. A large language model processing a web page cannot ingest your full site architecture, CSS, JavaScript, and navigation chrome. A well-structured llms.txt compresses your site into a readable index: the model sees your value proposition, product lines, documentation, and canonical resources in one pass.
What llms.txt is not. It is not a ranking signal confirmed by any AI company. It is not a replacement for Schema Markup, the JSON-LD structured data that AI systems actually use to parse entity relationships. It is not a substitute for clean HTML. Adoption to date is modest: independent crawls of the Majestic Million dataset found fewer than 200 sites hosting the file in mid-2025, and a broader study of roughly 300,000 domains by SE Ranking found about 10% adoption. Documentation-heavy sites lead uptake, with several hundred developer-docs properties (Anthropic and Stripe among them) publishing the file. Common on docs sites is not the same as useful across the open web.
| File | Purpose | Who reads it | Status |
|---|---|---|---|
| robots.txt | Access rules for crawlers | All bots (AI + search) | Industry standard since 1994 |
| sitemap.xml | Machine-readable URL list | Search engines, some AI | Industry standard |
| llms.txt | Curated Markdown index for LLMs | Proposed: AI models | Unofficial, 2024 proposal |
| llms-full.txt | Full content, one file | Proposed: AI models | Optional companion |
What AI companies have publicly confirmed
No major AI company has publicly confirmed that their models use llms.txt for retrieval or ranking. The state of public commitments as of April 2026:
| AI company | Product | Official llms.txt support | Source position |
|---|---|---|---|
| OpenAI | ChatGPT, GPTBot | Not confirmed | GPTBot respects robots.txt; no public llms.txt statement |
| Anthropic | Claude, ClaudeBot | Not confirmed, but Anthropic publishes its own /llms.txt | Published llms.txt on anthropic.com as of 2025 |
| Google | Gemini, Google-Extended | Not supported | John Mueller (June 2025): “No AI system currently uses llms.txt.” Gary Illyes (Search Central Deep Dive, July 2025): “Google doesn't support LLMs.txt and isn't planning to.” |
| Perplexity | PerplexityBot | Not confirmed | No public position |
| Microsoft | Copilot, Bing | Not confirmed | No public position |
The contradiction worth noting: Anthropic publishes an /llms.txt file on its own domain but has not stated that ClaudeBot uses llms.txt files from other sites during retrieval. Google did the same: after Mueller and Illyes publicly dismissed the standard, Google quietly added an llms.txt file to its Search Central developer documentation in December 2025. Publishing one and consuming one are different acts. Both companies want their own docs easy to read for AI, without committing to read yours.
Why set up llms.txt anyway
The asymmetric-bet argument: 15 minutes of work, zero downside, non-trivial upside if adoption lands. The cost and benefit are wildly mismatched, and mismatched bets are the ones worth taking.
Setup is cheap. Creating /llms.txt, publishing it, and adjusting robots.txt takes under 30 minutes on any platform covered in this guide. No new tools, no subscriptions, no developer sprint.
The file forces content prioritization. Writing llms.txt makes you pick the 20 pages on your site that matter most. That exercise alone (deciding what belongs in a one-screen index of your business) surfaces the content gaps that hurt AI visibility whether or not any model ever reads the file.
Early-mover benefit if adoption happens. If OpenAI or Anthropic announces llms.txt support in 2027, sites with a clean, maintained file published since 2024 will have a longer history of curated signal. Sites scrambling to add one that week will look reactive.
The downside is zero. A malformed llms.txt does not hurt search rankings. An ignored llms.txt is just a file no one fetched. The only way to lose is to misconfigure robots.txt at the same time and accidentally block crawlers, which we cover in the anti-patterns section.
Write the llms.txt file
Use the proposed spec from llmstxt.org: H1 title, optional blockquote summary, H2 section headers, Markdown links with colon-separated descriptions. The format is intentionally readable by humans and parseable by LLMs in a single pass.
Working example for a fictional B2B SaaS called “Axoria” (project management for agencies):
# Axoria
> Axoria is a project management platform for creative agencies. We help 400+ agencies track retainers, forecast team capacity, and invoice clients without switching tools. Founded 2022, headquartered in Berlin, team of 34.
## Core product
- [Product overview](https://axoria.io/product): What Axoria does, who it's for, pricing tiers
- [Retainer tracking](https://axoria.io/features/retainers): How to track monthly retainers and burn rates
- [Capacity forecasting](https://axoria.io/features/capacity): Team utilization and hiring signals
- [Client invoicing](https://axoria.io/features/invoicing): Stripe and QuickBooks integrations
## Documentation
- [Getting started guide](https://axoria.io/docs/quickstart): Setup in 15 minutes
- [API reference](https://axoria.io/docs/api): REST API with code examples
- [Integrations list](https://axoria.io/docs/integrations): 40+ supported tools
## Customer evidence
- [Case study: Pentagram](https://axoria.io/customers/pentagram): 22% utilization increase in 6 months
- [Case study: Mother Design](https://axoria.io/customers/mother): Reduced invoicing time from 3 days to 4 hours
- [G2 reviews](https://www.g2.com/products/axoria): 4.7/5 across 320 reviews
## Company
- [About](https://axoria.io/about): Founding story, team, investors
- [Pricing](https://axoria.io/pricing): $49/$99/$249 per seat per month
- [Security](https://axoria.io/security): SOC 2 Type II, GDPR compliant
## Optional
- [Blog](https://axoria.io/blog): Weekly articles on agency operations
- [Changelog](https://axoria.io/changelog): Product updates

Rules for the file:
- H1 = one line, your brand name. Not a tagline.
- Blockquote = one paragraph summary. Include what you do, who you serve, scale indicators (customer count, funding, team size), location.
- H2 = section groupings. Core product, Documentation, Customer evidence, Company, Optional.
- Each link = [Title](URL): description. The colon-separated description is required. Keep descriptions under 100 characters.
- “Optional” section at the end. Use this for content you want available but not prioritized.
Total target: 15–30 links maximum. More than 30 and the file becomes a sitemap, which defeats the point.
For a full-content companion file (llms-full.txt), concatenate the Markdown body of every linked page into one file. This is optional and only useful for smaller sites — a 200-page site produces a file too large for most LLM context windows to ingest usefully.
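If you do want to generate llms-full.txt, a minimal Node/TypeScript sketch of the concatenation step looks like this. The content directory, output path, and one-Markdown-file-per-page layout are assumptions, not part of the proposal:

```ts
// build-llms-full.ts: minimal sketch, assuming each published page exists as a Markdown file
import { readFileSync, readdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

const contentDir = './content';   // assumption: your page bodies live here as .md files
const files = readdirSync(contentDir).filter((f) => f.endsWith('.md')).sort();

// Join every page body with a horizontal rule so section boundaries survive
const body = files
  .map((f) => readFileSync(join(contentDir, f), 'utf8').trim())
  .join('\n\n---\n\n');

writeFileSync('./public/llms-full.txt', body + '\n');
console.log(`Wrote llms-full.txt from ${files.length} pages`);
```

Run it as a build step and check the output size; if it grows past what a model can ingest in one pass, skip llms-full.txt entirely, as noted above.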
Publish llms.txt on WordPress
You have three options: a plugin, FTP upload, or a functions.php route. Plugin is fastest for non-developers; functions.php is cleanest if you want to generate the file dynamically from pages and posts.
Option 1: Yoast SEO plugin (v24.8+). Yoast added native llms.txt generation in late 2024. Go to Yoast SEO → Settings → Site features → AI optimization, enable “llms.txt file”, and Yoast will generate the file from your site structure. Review the output at yourdomain.com/llms.txt before relying on it — Yoast includes all published pages by default, which is usually too many.
Option 2: Upload via FTP or hosting file manager. Create a plain-text file named llms.txt locally, fill it with the Markdown from the section above, and upload it to your WordPress install's root directory (the same folder that contains wp-config.php and index.php). Verify it loads at yourdomain.com/llms.txt. This is the simplest option but you have to remember to update the file manually when content changes.
Option 3: Dynamic generation via functions.php. Add this to your active theme's functions.php or a custom plugin:
<?php
// Route /llms.txt to WordPress instead of a physical file
add_action('init', function() {
    add_rewrite_rule('^llms\.txt$', 'index.php?llms_txt=1', 'top');
});

// Register the query variable used by the rewrite rule
add_filter('query_vars', function($vars) {
    $vars[] = 'llms_txt';
    return $vars;
});

// Generate the file on the fly from published pages
add_action('template_redirect', function() {
    if (get_query_var('llms_txt')) {
        header('Content-Type: text/plain; charset=utf-8');
        echo "# " . get_bloginfo('name') . "\n\n";
        echo "> " . get_bloginfo('description') . "\n\n";
        echo "## Core pages\n\n";
        $pages = get_pages(['sort_column' => 'menu_order']);
        foreach ($pages as $page) {
            $excerpt = wp_trim_words(strip_tags($page->post_content), 15);
            echo "- [" . $page->post_title . "](" . get_permalink($page->ID) . "): " . $excerpt . "\n";
        }
        exit;
    }
});

After saving, visit Settings → Permalinks and click Save (no changes needed) to flush rewrite rules. The file now generates live at yourdomain.com/llms.txt.
Which to pick. Use Yoast if you already have it installed. Use FTP if you want a hand-curated 20-link file (recommended). Use functions.php only if your site structure changes weekly and you want automatic updates.
Publish llms.txt on Shopify
Shopify does not allow root-level file uploads through the admin UI, so you need one of three workarounds: a dedicated app, a custom template with redirect, or a subdomain. Shopify's theme system blocks direct file creation at yourstore.com/llms.txt.
Option 1: Use a Shopify app. Search “llms.txt” in the Shopify App Store. Several apps (for example, “AI SEO llms.txt Generator” and similar tools released in 2025) generate and serve the file for you with monthly pricing around $5–15. Fastest setup, ongoing cost.
Option 2: Custom template + redirect. Create a new page in Shopify admin called “llms” with handle llms. Create a custom template page.llms.liquid in your theme that outputs {{ page.content }} wrapped in {% layout none %} with Content-Type override. The URL will be yourstore.com/pages/llms, not yourstore.com/llms.txt. To fix the URL, add a redirect in Online Store → Navigation → URL Redirects: from /llms.txt to /pages/llms. AI Crawlers following the 301 will land on the correct content, but strict parsers expecting a direct file at /llms.txt may not follow the redirect.
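A minimal sketch of that template, assuming you paste the Markdown from the example file into the page body in Shopify admin (this is a sketch, not an official Shopify pattern):

```liquid
{% comment %}
  templates/page.llms.liquid: drop the theme layout and print only the page body
{% endcomment %}
{% layout none %}{{ page.content | strip_html }}
```

Shopify stores page content as HTML, so check the rendered output at yourstore.com/pages/llms before adding the /llms.txt redirect.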
Option 3: Host on a subdomain. Point llms.yourstore.com at a static host (Netlify, Vercel, GitHub Pages) and serve the file there. This breaks the convention of serving llms.txt from the root domain, so use it only as a last resort.
Which to pick. App for convenience. Custom template + redirect if you want to avoid recurring fees and accept the redirect caveat.
Publish llms.txt on Next.js
On App Router, create app/llms.txt/route.ts and return plain text. On Pages Router, use pages/api/llms.txt.ts. For static exports, put the file in public/llms.txt. All three serve the file at yourdomain.com/llms.txt.
App Router (Next.js 13+):
// app/llms.txt/route.ts
import { NextResponse } from 'next/server';
export async function GET() {
const content = `# Axoria
> Axoria is a project management platform for creative agencies. We help 400+ agencies track retainers, forecast team capacity, and invoice clients without switching tools.
## Core product
- [Product overview](https://axoria.io/product): What Axoria does, who it's for, pricing tiers
- [Retainer tracking](https://axoria.io/features/retainers): How to track monthly retainers and burn rates
## Documentation
- [Getting started](https://axoria.io/docs/quickstart): Setup in 15 minutes
- [API reference](https://axoria.io/docs/api): REST API with code examples
## Company
- [About](https://axoria.io/about): Founding story, team, investors
- [Pricing](https://axoria.io/pricing): $49/$99/$249 per seat per month
`;
return new NextResponse(content, {
status: 200,
headers: {
'Content-Type': 'text/plain; charset=utf-8',
'Cache-Control': 'public, max-age=3600',
},
});
}

Pages Router (Next.js 12 and earlier):
// pages/api/llms.txt.ts
import type { NextApiRequest, NextApiResponse } from 'next';
export default function handler(req: NextApiRequest, res: NextApiResponse) {
const content = `# Axoria\n\n> ...`; // same content as above
res.setHeader('Content-Type', 'text/plain; charset=utf-8');
res.setHeader('Cache-Control', 'public, max-age=3600');
res.status(200).send(content);
}

Add a rewrite in next.config.js so it serves at /llms.txt:
module.exports = {
async rewrites() {
return [
{ source: '/llms.txt', destination: '/api/llms.txt' },
];
},
};

Static export (output: 'export'). Put a pre-generated llms.txt file in the public/ directory. Next.js copies it to the build output unchanged. Update manually when content changes.
Dynamic generation from MDX. If your site uses MDX for docs, generate llms.txt at build time by reading the frontmatter of every MDX file and writing links to public/llms.txt in a prebuild script. This keeps the file in sync with your content, no runtime route needed.
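A minimal prebuild sketch under those assumptions (docs in content/docs, frontmatter with title and description fields, URLs mirroring file names; all of these are placeholders for your own setup):

```ts
// scripts/build-llms.ts: minimal sketch, run before `next build`
import { readFileSync, readdirSync, writeFileSync } from 'node:fs';
import { basename, join } from 'node:path';

const docsDir = 'content/docs';          // assumption: one .mdx file per docs page
const site = 'https://axoria.io';        // placeholder domain from the example above

const links = readdirSync(docsDir)
  .filter((f) => f.endsWith('.mdx'))
  .map((f) => {
    const raw = readFileSync(join(docsDir, f), 'utf8');
    // Naive frontmatter parse: pull title and description from between the --- fences
    const fm = /^---\n([\s\S]*?)\n---/.exec(raw)?.[1] ?? '';
    const title = /^title:\s*["']?(.+?)["']?\s*$/m.exec(fm)?.[1] ?? basename(f, '.mdx');
    const description = /^description:\s*["']?(.+?)["']?\s*$/m.exec(fm)?.[1] ?? '';
    return `- [${title}](${site}/docs/${basename(f, '.mdx')}): ${description}`;
  });

writeFileSync(
  'public/llms.txt',
  `# Axoria\n\n> Project management for creative agencies.\n\n## Documentation\n\n${links.join('\n')}\n`
);
```

Hook it into package.json as a prebuild script (for example "prebuild": "tsx scripts/build-llms.ts", or whatever TypeScript runner you already use) so every deploy regenerates the file.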
Publish llms.txt on static sites
On Netlify, Vercel, Cloudflare Pages, GitHub Pages, or any static host, drop llms.txt into the root of your publish directory. The file is served as-is at yourdomain.com/llms.txt.
Netlify. Place llms.txt at the root of your publish directory (usually public/, dist/, or the repo root depending on framework). Commit and push. Netlify serves it automatically. To force the correct Content-Type, add this to netlify.toml:
[[headers]]
for = "/llms.txt"
[headers.values]
Content-Type = "text/plain; charset=utf-8"
Cache-Control = "public, max-age=3600"

Vercel. Place llms.txt in the public/ directory. Vercel serves static files from public/ at the domain root. Add headers in vercel.json:
{
"headers": [
{
"source": "/llms.txt",
"headers": [
{ "key": "Content-Type", "value": "text/plain; charset=utf-8" },
{ "key": "Cache-Control", "value": "public, max-age=3600" }
]
}
]
}

Cloudflare Pages. Same as Vercel: put the file in the repo root or publish directory, and configure headers via a _headers file:
/llms.txt
Content-Type: text/plain; charset=utf-8
Cache-Control: public, max-age=3600

GitHub Pages. Commit llms.txt to the root of the branch serving Pages (usually main or gh-pages). GitHub Pages serves it directly with Content-Type: text/plain. No configuration needed.
Plain HTML site on a shared host. Upload llms.txt to the same directory as index.html via FTP or cPanel File Manager. Done.
Configure robots.txt for AI Crawlers
Your robots.txt file must not block AI Crawlers; a bot that is disallowed there will not fetch your content, including your llms.txt. AI Crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) are the bots AI companies use to collect web content, both for model training and for real-time retrieval when users search the web through an AI assistant. If robots.txt blocks these bots, AI platforms cannot retrieve your content during web search, and no amount of content optimization helps.
Working robots.txt for full AI access:
# Search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# OpenAI
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
# Anthropic
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: Claude-Web
Allow: /
# Perplexity
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
# Google AI
User-agent: Google-Extended
Allow: /
# Common Crawl (trains multiple LLMs)
User-agent: CCBot
Allow: /
# Apple
User-agent: Applebot-Extended
Allow: /
# Meta
User-agent: FacebookBot
Allow: /
User-agent: meta-externalagent
Allow: /
# ByteDance
User-agent: Bytespider
Allow: /
# Default
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml

What each crawler does:
| Crawler | Owner | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data collection |
| ChatGPT-User | OpenAI | Real-time retrieval when a ChatGPT user asks |
| OAI-SearchBot | OpenAI | Indexing for ChatGPT search |
| ClaudeBot | Anthropic | Training + retrieval |
| anthropic-ai | Anthropic | Legacy crawler name, still used |
| Claude-Web | Anthropic | Browsing tool for Claude |
| PerplexityBot | Perplexity | Retrieval for Perplexity answers |
| Perplexity-User | Perplexity | Real-time browsing on user query |
| Google-Extended | Google | Controls Gemini training use |
| CCBot | Common Crawl | Training data for many LLMs |
| Applebot-Extended | Apple | Apple Intelligence training |
| meta-externalagent | Meta | Llama training |
| Bytespider | ByteDance | Doubao/Cici training |
Blocking selectively. If you want to allow retrieval but block training, allow ChatGPT-User and Perplexity-User (real-time) while blocking GPTBot and CCBot (training). Trade-off: this reduces parametric knowledge (the information AI models recall without searching the web), so your brand becomes invisible to Layer 1 queries (users asking ChatGPT without web search enabled).
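A robots.txt sketch of that split; adapt the bot list to your own policy using the table above:

```
# Allow real-time retrieval for answers with citations
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Perplexity-User
Allow: /

# Block training-focused crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```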
Test whether AI Crawlers can reach your content
Use three tests: curl with a spoofed User-Agent, a robots.txt validator, and a live ChatGPT browsing query. Each one catches a different category of failure.
Test 1: curl with AI Crawler User-Agent. Simulate each bot and confirm it receives a 200 response with your actual content:
# GPTBot
curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)" \
-I https://yourdomain.com/llms.txt
# ClaudeBot
curl -A "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" \
-I https://yourdomain.com/llms.txt
# PerplexityBot
curl -A "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)" \
-I https://yourdomain.com/llms.txt

Expected output: HTTP/2 200 and content-type: text/plain. If you see 403 Forbidden, 503, or a challenge page, your WAF or CDN is blocking the crawler.
Repeat the same three commands against https://yourdomain.com/ (your homepage) to confirm the crawler can reach actual content, not just the llms.txt file.
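If you would rather script these checks, here is a minimal Node 18+ sketch; the domain is a placeholder and the User-Agent strings mirror the curl examples above. Run it with tsx or ts-node in an ESM project:

```ts
// check-ai-access.ts: minimal sketch of the curl tests above
const agents: Record<string, string> = {
  GPTBot: 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)',
  ClaudeBot: 'Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)',
  PerplexityBot: 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)',
};
const urls = ['https://yourdomain.com/llms.txt', 'https://yourdomain.com/'];

for (const url of urls) {
  for (const [bot, ua] of Object.entries(agents)) {
    // redirect: 'manual' so a 301/302 shows up instead of being silently followed
    const res = await fetch(url, { headers: { 'User-Agent': ua }, redirect: 'manual' });
    console.log(`${bot}  ${url}  ${res.status}  ${res.headers.get('content-type') ?? ''}`);
  }
}
```

Anything other than a 200 with text/plain on the llms.txt rows points to the same WAF or CDN interference the curl tests surface.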
Test 2: robots.txt validators. Paste your robots.txt into:
- Google Search Console → robots.txt Tester (validates against Googlebot and Google-Extended)
- TametheBot's robots.txt validator (validates against AI-specific User-Agents)
- Merkle's robots.txt tester
Confirm each AI bot User-Agent you care about returns “Allowed”.
Test 3: Ask ChatGPT with browsing. Open ChatGPT, enable web search, and ask: “Visit [yourdomain.com] and summarize what the company does.” If ChatGPT returns an accurate summary with citations from your pages, retrieval is working. If ChatGPT says “I cannot access that site” or returns content from third-party sources only (your LinkedIn, review sites), retrieval is blocked or your content is too thin for extraction.
Repeat in Perplexity and Claude with browsing enabled. Cross-engine validation is the only way to catch platform-specific blocks.
Monitor AI Crawler activity in server logs
Parse your access logs for AI Crawler User-Agents weekly. Track four metrics: visit count per bot, 2xx vs 4xx/5xx rate, paths fetched, time-to-first-crawl for new content. Server logs are the ground truth. Analytics platforms filter bots by default.
User-Agent strings to grep:
GPTBot
ChatGPT-User
OAI-SearchBot
ClaudeBot
anthropic-ai
Claude-Web
PerplexityBot
Perplexity-User
Google-Extended
CCBot
Applebot-Extended
Bytespider
meta-externalagent

Tools:
- Cloudflare Analytics → Bot Management. If your site is behind Cloudflare, the dashboard shows AI Crawler traffic broken out by bot. Cloudflare also flags “verified bots” vs “unverified” — unverified bots with AI User-Agents are usually scrapers pretending to be official crawlers.
- Screaming Frog Log File Analyzer. Desktop app. Import raw access logs (Apache, Nginx, AWS CloudFront). Filter by User-Agent. Good for one-time audits.
- GoAccess. Open-source, terminal-based log analyzer. Run goaccess access.log --log-format=COMBINED and filter the User-Agent panel. Free, fast, works on any Linux/macOS server.
- Elasticsearch + Kibana. For enterprise setups. Ingest logs, build a dashboard filtering on User-Agent patterns. Overkill for most businesses.
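For a quick weekly count before reaching for any of these tools, a minimal Node sketch works; the log path and combined log format are assumptions:

```ts
// count-ai-crawlers.ts: tally hits per AI crawler User-Agent in an access log
import { readFileSync } from 'node:fs';

const bots = [
  'GPTBot', 'ChatGPT-User', 'OAI-SearchBot', 'ClaudeBot', 'anthropic-ai', 'Claude-Web',
  'PerplexityBot', 'Perplexity-User', 'Google-Extended', 'CCBot',
  'Applebot-Extended', 'Bytespider', 'meta-externalagent',
];

const log = readFileSync('/var/log/nginx/access.log', 'utf8');  // assumption: adjust to your server
const counts: Record<string, number> = {};
for (const line of log.split('\n')) {
  for (const bot of bots) {
    if (line.includes(bot)) counts[bot] = (counts[bot] ?? 0) + 1;
  }
}
console.table(counts);
```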
What to track in your weekly review:
| Metric | Good signal | Bad signal |
|---|---|---|
| GPTBot visits/week | 50–500 on a small site, 1,000+ on a publication | 0 (blocked) or 10,000+ (crawl budget waste) |
| 2xx rate | ≥ 95% | < 90% (firewall or WAF interference) |
| Paths fetched | Homepage, pricing, product pages, blog posts | Only homepage (thin architecture) |
| New content crawl time | ≤ 48 hours from publish to first crawl | > 7 days (sitemap or internal linking issue) |
Red flags to act on: a sudden drop in GPTBot visits usually means your host rotated IPs and Cloudflare's bot management flagged the new range. A 100% 403 rate for one specific crawler means your WAF added a rule. A crawler fetching only /wp-admin/ or /api/ and receiving 403s is a scraper pretending to be an AI bot. Fine to keep blocking.
Common mistakes that make your site invisible to AI
These are the seven anti-patterns we see most often in Far & Wide audits. Each one silently blocks or weakens AI access while looking fine to humans visiting the site.
1. Cloudflare's “Block AI Scrapers and Crawlers” toggle enabled by default. In 2024, Cloudflare added a one-click setting under Security → Bots that blocks known AI Crawlers. The toggle was enabled by default on new accounts for a period in 2024–2025. Check it at Cloudflare Dashboard → Security → Bots → “Block AI bots” and turn it off if you want AI retrieval. Many sites discover they've been invisible to ChatGPT for months because of this single toggle.
2. Security plugins that block GPTBot by default. WordPress plugins like Wordfence, All In One WP Security, and iThemes Security ship with optional “block AI bots” rules. Check each plugin's firewall rules and whitelist GPTBot, ClaudeBot, PerplexityBot, and Google-Extended if you find them blocked.
3. Stale or missing sitemap. AI Crawlers use sitemap.xml for discovery. If your sitemap lists URLs that return 404, if it excludes pages added in the last month, or if your robots.txt doesn't reference the sitemap, crawlers miss content. Regenerate the sitemap on every deploy and verify the Sitemap: line in robots.txt matches the actual URL.
4. Cloaking or user-agent-based content switching. Serving different HTML to bots than to humans (“cloaking”) gets sites penalized by Google and is likely to be flagged by AI Crawlers over time. If your site returns a thin HTML shell to crawlers and lazy-loads content via JavaScript for humans, AI Crawlers see the shell. Use server-side rendering or static generation for any content you want AI to read.
5. Marketing-copy llms.txt. An llms.txt full of “industry-leading solutions” and “advanced innovation” is worse than no file. The point of llms.txt is to give AI models clean, factual, specific content.
Bad entry: - [About us](/about): We are the world's best provider of innovative solutions.
Useful entry: - [About us](/about): Axoria, 34-person team, founded 2022, Berlin, €8M Series A from Point Nine.
6. Missing Schema Markup. llms.txt is a file for LLMs to read. Schema Markup (structured data in JSON-LD format) is machine-readable code that tells AI systems your brand name, description, industry, founding date, and relationships. The two are complementary. A site with llms.txt but no Schema Markup is using one signal and skipping the more established one; a minimal JSON-LD sketch follows this list. See our schema markup for AEO guide.
7. Forgotten noindex headers on key pages. If your pricing page returns X-Robots-Tag: noindex (sometimes set at the CDN level for staging and forgotten on production), AI Crawlers may still fetch it, but will not include it in retrieval. Check every page in your llms.txt with curl -I and look for X-Robots-Tag in the response.
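For the Schema Markup gap in item 6, a minimal Organization snippet for the fictional Axoria example could sit in the site-wide <head>; the sameAs URLs are placeholders:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Axoria",
  "url": "https://axoria.io",
  "description": "Project management platform for creative agencies",
  "foundingDate": "2022",
  "numberOfEmployees": { "@type": "QuantitativeValue", "value": 34 },
  "address": { "@type": "PostalAddress", "addressLocality": "Berlin", "addressCountry": "DE" },
  "sameAs": [
    "https://www.linkedin.com/company/axoria",
    "https://www.g2.com/products/axoria"
  ]
}
</script>
```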
What the future of llms.txt looks like
Three plausible scenarios, each with different implications for how much effort to invest now.
Scenario 1: Official adoption (probability: low-to-medium). OpenAI or Anthropic announces formal llms.txt support in 2026–2027. Sites with a maintained file benefit. This is the scenario where the asymmetric bet pays off. Evidence in favor: the ease of implementation, visible uptake on documentation sites, and the fact that the major AI companies themselves publish llms.txt on their own dev docs. Evidence against: Google's public dismissal in 2025, and the broader pattern of AI companies relying on Retrieval-Augmented Generation from the open web rather than publisher-curated feeds.
Scenario 2: Partial retrieval (probability: medium-high). AI companies quietly use llms.txt as a secondary signal without announcing it. It influences which pages get priority in retrieval without being a public ranking factor. This matches how AI systems already treat Schema Markup: indirect effect through better content structure, not direct ranking. In this scenario, having a file helps at the margin and a low observed fetch rate is consistent with marginal use.
Scenario 3: Quiet fade (probability: medium). llms.txt becomes the .well-known/ai.txt of its era: a proposed standard that never reached critical mass. AI companies standardize on something else (a schema.org extension, an HTTP header, an expanded sitemap format). Sites that set up llms.txt lose 30 minutes; sites that didn't lose nothing.
What we recommend. Set up llms.txt once, keep it current when you add major pages (quarterly review), and don't invest in llms-full.txt or automated regeneration until an AI company publicly confirms they use the file. Spend the time saved on schema markup, brand entity optimization, and content structure for AI citation. All three have direct evidence of impact on AI visibility today.
Quick-start checklist
- Write llms.txt using the B2B SaaS example as a template (15 links minimum, 30 maximum)
- Publish at yourdomain.com/llms.txt using the platform-specific method above
- Verify Content-Type: text/plain; charset=utf-8 in the response headers
- Update robots.txt to explicitly allow GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, CCBot
- Add a Sitemap: line to robots.txt pointing to a valid sitemap
- Check Cloudflare → Security → Bots and disable “Block AI bots” if enabled
- Test with curl -A "GPTBot/1.2" -I https://yourdomain.com/llms.txt and expect a 200
- Test with a live ChatGPT browsing query on your domain
- Set up weekly log review for AI Crawler User-Agents (Cloudflare, GoAccess, or Screaming Frog)
- Quarterly: update llms.txt when major pages are added or removed
Next steps
llms.txt is one file in a longer list of technical and content moves that determine whether AI systems cite your brand. The other moves (Schema Markup, entity consistency across third-party sources, content structured for passage extraction, and monitoring whether AI recommends you for your actual customer queries) produce most of the results we see in client audits.
If you want the full picture for your site, Far & Wide's AEO Enterprise Audit (from €750, one-time) covers all three layers. You get:
- A full 10-step technical audit across up to 50 pages (including llms.txt, robots.txt, Schema Markup, JS rendering, sitemap, and headers)
- AI visibility testing across ChatGPT, Claude, and Perplexity in three scenarios (parametric knowledge, clean web search, and customer-profile web search)
- Per-product analysis with competitive share of voice
- A 1.5-hour strategy call with the full roadmap
- 15+ deliverable documents
For founders and marketers who want a faster snapshot first, the AI Visibility Report (€80 limited offer, regular €100) tests 10 real customer questions on ChatGPT in 2 scenarios and returns a PDF in ~20 minutes.
Further reading on related topics:
- Schema Markup for AEO: the JSON-LD types that AI systems actually use
- How to run an AEO audit: the three-layer framework we use on every client
- Brand entity optimization for AI: entity consistency across Wikipedia, Crunchbase, LinkedIn
- Best AEO tools: when prompt tracking tools help and when they mislead
- AI content optimization for answer engines: passage-level structure for citation