AI Crawler Checker
Most "AI crawler" tools only parse your robots.txt. We fetch your site as each bot and tell you what it really sees. That catches Cloudflare and other WAF blocks that robots.txt parsers miss.
§ what this tool checks
Rules applied to every scan.
Since 2024, Cloudflare has offered a one-click setting that blocks AI bots at the edge, and in 2025 it began blocking AI crawlers by default on new domains. A site with a permissive robots.txt file can therefore still be invisible to ChatGPT and Claude. Robots.txt parsers can't catch this. Live-fetch testing can. This tool fetches your page as 10 different AI crawlers and reports what they actually see.
GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI)
ClaudeBot, Claude-User, anthropic-ai (Anthropic)
PerplexityBot, Perplexity-User (Perplexity)
Google-Extended (Gemini)
Applebot-Extended (Apple Intelligence)
Silent block detection: robots.txt allows the bot, but the live fetch returns 403 or 429
JS-rendering heuristic: 200 OK with fewer than 200 characters of readable body text
robots.txt parse with per-bot rules and llms.txt check
§ faq
Questions, answered.
Why do robots.txt parsers miss most AI bot blocks today?
Since 2024, Cloudflare has offered a one-click setting that blocks AI bots at the edge, and in 2025 it began blocking AI crawlers by default on new domains. Your robots.txt might contain "User-agent: GPTBot" followed by "Allow: /", but Cloudflare can return 403 before the request ever reaches your origin. Parsers only read the file; they never test what the bot actually receives. This tool fetches your page with each bot's User-Agent and reports the real HTTP response.
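The gap between what robots.txt says and what the edge returns can be sketched in a few lines of Python. This is a minimal illustration, not this tool's actual implementation: the function names `robots_allows`, `fetch_status`, and `classify` are hypothetical, and the User-Agent string you pass to `fetch_status` is whatever string you choose to test with. Some firewalls also verify crawler IP ranges, so a fetch from your own machine may not reproduce exactly what the real bot receives.

```python
from urllib import error, request, robotparser

# Example robots.txt that explicitly allows GPTBot (illustrative only).
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /
"""


def robots_allows(robots_txt: str, bot_token: str, url: str) -> bool:
    """Return True if the robots.txt text permits bot_token to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_token, url)


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url with the given User-Agent header and return the HTTP status."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; the status code is still useful.
        return exc.code


def classify(allowed_by_robots: bool, status: int) -> str:
    """Combine the robots.txt verdict with the live HTTP status."""
    if not allowed_by_robots:
        return "blocked by robots.txt"
    if status in (403, 429):
        return "silent block"  # robots.txt allows, but the edge refuses
    if 200 <= status < 300:
        return "reachable"
    return "other"
```

In use, you would call `fetch_status` against your own URL and pass the result to `classify` together with the `robots_allows` verdict. A "silent block" result is the case a parser-only tool cannot see.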
What is the difference between GPTBot and ChatGPT-User?
GPTBot is OpenAI's training crawler: it collects your content for use in future model training. Opting out is a legitimate choice. ChatGPT-User is the live browsing agent that fetches pages when a ChatGPT user asks a question. Blocking ChatGPT-User removes you from real-time AI answers, which is almost always unintentional. A similar split between background crawler and user-triggered fetcher applies to ClaudeBot and Claude-User, and to PerplexityBot and Perplexity-User.
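As a hedged illustration of that distinction, the sketch below uses a robots.txt that opts out of GPTBot while leaving ChatGPT-User allowed, and checks both verdicts with Python's standard `urllib.robotparser`. The robots.txt content, the helper name `allowed`, and the example.com URL are assumptions made for the example.

```python
from urllib import robotparser

# Illustrative robots.txt: opt out of training, keep live browsing.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""


def allowed(bot_token: str, url: str = "https://example.com/article") -> bool:
    """Return the robots.txt verdict for one bot token."""
    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(bot_token, url)


for bot in ("GPTBot", "ChatGPT-User"):
    print(f"{bot}: {'allowed' if allowed(bot) else 'disallowed'}")
```

A check like this only confirms what the file says. Whether the live request succeeds still depends on what your CDN or firewall does with that User-Agent.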
Is opting out of Google-Extended the same as blocking Googlebot?
No. Google-Extended is a separate robots.txt token that controls whether Google can use your content for Gemini model training and grounding. Disallowing Google-Extended has no effect on regular Googlebot or your search rankings, but it does remove your content from those Gemini uses. According to Google's documentation, the token does not govern Search features such as AI Overviews, which follow Googlebot rules. Confirm the opt-out is intentional before adding it to robots.txt.
My page returned 200 but the tool says "empty content" — what happened?
Your page returned HTML successfully, but it contained fewer than 200 characters of readable text. This usually means the content is rendered client-side (React, Vue, etc.). Most AI crawlers do not execute JavaScript; they see only the initial HTML payload. If that payload is a shell with no content, AI tools cannot read your page. Server-render the primary content or pre-render for known bot User-Agents.
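A rough version of this heuristic can be sketched with Python's standard `html.parser`. The 200-character threshold comes from the answer above. Everything else is an assumption for illustration, including which tags are skipped and the names `readable_text_length` and `looks_empty`; it is not necessarily how this tool measures readable text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring script, style, and template content."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def readable_text_length(html: str) -> int:
    """Count characters of text after collapsing runs of whitespace."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(" ".join(extractor.chunks).split())
    return len(text)


def looks_empty(status: int, html: str, min_chars: int = 200) -> bool:
    """Flag a 200 response whose initial HTML carries almost no text."""
    return status == 200 and readable_text_length(html) < min_chars
```

A client-side app shell such as `<div id="root"></div>` plus a large script bundle would be flagged, while a server-rendered article with a few paragraphs would not.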
§ run the full audit
Stop guessing. Scan everything in one click.
60 automated checks across meta tags, robots.txt, Open Graph, sitemaps, headings, AI visibility, and more — free, no signup.
run a full scan →
§ other tools
Check another thing.
Single-purpose inspectors for when you need to verify one thing.
01 Meta Tag Checker: Check your page title, meta description, viewport, charset, and robots tags.
02 Robots.txt Validator: Validate your robots.txt file for syntax errors and blocking rules.
03 Open Graph Tag Preview: Check your Open Graph and Twitter Card tags for social media sharing.
04 Sitemap Validator: Validate your sitemap.xml file format and URL count.
05 Heading Structure Checker: Analyze your H1-H6 heading hierarchy for SEO best practices.
06 SSL Certificate Checker: Check your SSL certificate: expiry date, issuer, and trust chain.
07 Redirect Checker: Trace the full redirect chain for any URL, including every 301, 302, and hop.
08 Structured Data Validator: Validate your JSON-LD schema markup for missing fields and errors.
09 Broken Link Checker: Scan a page for internal links that return 404s and server errors.
10 Core Web Vitals Checker: Check the lab signals behind LCP, CLS, and a fast first byte.
11 Security Headers Checker: Scan your HTTP response headers for CSP, HSTS, and other protections.
12 Canonical Tag Checker: Check your canonical tag and confirm it points where it should.