Question 1

Why do robots.txt parsers miss many AI bot blocks?

Accepted Answer

Since 2024, Cloudflare has offered a setting that blocks known AI bots at the edge, and in 2025 it began turning that block on by default for new domains. Your robots.txt might contain "User-agent: GPTBot" followed by "Allow: /", but Cloudflare can still return a 403 before the request ever reaches your origin. Parsers only read the file; they never test what the bot actually receives. This tool fetches your page with each bot's User-Agent and reports the real HTTP response.
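A minimal sketch of that comparison, assuming a plain urllib fetch. It asks Python's standard-library robots.txt parser what the file permits for each bot token, then requests the page with that bot's User-Agent and records the status code. The function names and the short User-Agent strings are hypothetical placeholders (real crawlers send longer strings), and this is not the tool's actual implementation.

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified User-Agent strings. Real crawlers send longer
# strings containing these tokens; check each vendor's docs for exact values.
BOT_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ChatGPT-User": "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
}


def fetch_status(url, user_agent, timeout=10.0):
    """Request url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is what the bot actually received.
        return err.code


def compare_robots_to_reality(robots_lines, url, agents, fetch=fetch_status):
    """Map each bot token to (allowed_by_robots_txt, real_http_status)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    report = {}
    for token, user_agent in agents.items():
        # Pass the bare token: robotparser matches on the text before the
        # first "/", so a full "Mozilla/5.0 ..." string would not match.
        allowed = parser.can_fetch(token, url)
        report[token] = (allowed, fetch(url, user_agent))
    return report


def blocked_despite_allow(report):
    """Tokens that robots.txt allows but the server refuses with 401/403."""
    return [
        token
        for token, (allowed, status) in report.items()
        if allowed and status in (401, 403)
    ]
```

Call compare_robots_to_reality with your robots.txt lines, a page URL and BOT_AGENTS; any token listed by blocked_despite_allow is being refused at the edge even though the file permits it. The fetch parameter is injectable so the comparison logic can be exercised without network access.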

Question 2

What is the difference between GPTBot and ChatGPT-User?

Accepted Answer

GPTBot is OpenAI's training crawler: it collects content that may be used to train future models, and opting out is a legitimate choice. ChatGPT-User is the agent that fetches pages live when a ChatGPT user asks a question. Blocking ChatGPT-User removes you from real-time ChatGPT answers, which is rarely what site owners intend. Other vendors draw a similar line between background crawlers and user-triggered fetchers: Anthropic separates ClaudeBot from Claude-User, and Perplexity separates PerplexityBot from Perplexity-User.
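To make the split concrete, here is a hypothetical robots.txt policy that opts out of two background crawlers while explicitly allowing the matching live agents, checked with Python's standard-library urllib.robotparser. The policy and the helper name are illustrative assumptions; confirm each token against the vendor's current crawler documentation before deploying.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block background crawlers, allow user-triggered agents.
# Consecutive User-agent lines form one group that shares the rules below them.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /
"""


def policy(robots_txt, tokens, url="https://example.com/"):
    """Return {token: allowed} for each bot token under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


print(policy(ROBOTS_TXT, ["GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-User"]))
# {'GPTBot': False, 'ChatGPT-User': True, 'ClaudeBot': False, 'Claude-User': True}
```

Python's parser matches user-agent tokens by substring, which is looser than some crawlers' own matching, so keep tokens distinct (for example, a bare "User-agent: GPT" group would also catch GPTBot here).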

Question 3

Is opting out of Google-Extended the same as blocking Googlebot?

Accepted Answer

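No. Google-Extended is a separate robots.txt token that controls whether content Google has crawled may be used to train Gemini models and to ground Gemini's answers. It has no User-Agent of its own (crawling is still done by Google's regular crawlers), and disallowing it has no effect on Googlebot or your search rankings. According to Google's documentation it also does not govern AI Overviews, which are part of Search and follow Googlebot's rules. It does keep your content out of Gemini training and grounding, so confirm this is intentional before adding it to robots.txt.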
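A small illustration, assuming a hypothetical robots.txt that opts out of Google-Extended only: a standard parser (here Python's urllib.robotparser) treats the two tokens as separate groups, so Googlebot falls through to the catch-all rules. This shows only how the file is grouped; it says nothing about how Google applies the token internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Gemini training/grounding, keep Search.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for token in ("Google-Extended", "Googlebot"):
    # Googlebot matches no named group, so the "*" group applies to it.
    print(token, parser.can_fetch(token, "https://example.com/article"))
# Google-Extended False
# Googlebot True
```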

Question 4

My page returned 200 but the tool says "empty content" — what happened?

Accepted Answer

Your server returned HTML successfully, but the page contained fewer than 200 characters of readable text. This usually means the content is rendered client-side (React, Vue, etc.). Most AI crawlers do not execute JavaScript; they see only the initial HTML payload. If that payload is an empty shell, AI tools cannot read your page. Server-render the primary content, or pre-render it for known bot User-Agents.
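A rough sketch of an empty-content check, assuming "readable text" means the text left after dropping script, style and template elements; the tool's actual extraction rules may differ. It uses Python's standard html.parser, the function and class names are hypothetical, and the 200-character threshold comes from the answer above.

```python
from html.parser import HTMLParser

MIN_TEXT_CHARS = 200  # threshold from the answer above


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring content a reader would never see."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def readable_text(html):
    """Return the visible text of raw HTML with whitespace collapsed."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(" ".join(extractor.chunks).split())


def is_empty_shell(html, min_chars=MIN_TEXT_CHARS):
    """True if the initial HTML has too little text for a non-JS crawler."""
    return len(readable_text(html)) < min_chars
```

Feed it the raw HTML your server sends (for example, what curl returns with a bot User-Agent), not the DOM your browser shows after JavaScript has run; a typical single-page-app shell with an empty root div yields no readable text at all.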

AI Crawler Checker
