Deploy first, test later is not a strategy
Your robots.txt is two lines of text that can make your entire site invisible to Google. Yet most teams treat it like a config file that doesn't need testing - write it once, deploy it, and forget it.
The problem is that robots.txt mistakes are silent. There's no build error, no console warning, no failing test. You only find out something's wrong when your traffic drops weeks later. Here's how to catch those mistakes before they reach production.
Test locally before deploying
Serve your robots.txt locally
If you're using a static robots.txt file in your public/ directory, you can inspect it directly. But if you're generating it dynamically (like with Next.js robots.ts), you need to actually serve it:
# Start your dev server
pnpm dev
# Fetch the generated robots.txt
curl http://localhost:3000/robots.txt
Compare the output against what you expect. The most critical thing to verify: your production robots.txt does NOT contain Disallow: /.
Check environment-specific logic
Many frameworks generate different robots.txt files for staging and production. This is the #1 source of robots.txt disasters - the staging config leaks into production. If your robots.txt is dynamic, test both environments:
// Common pattern in Next.js robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  const isProduction = process.env.NODE_ENV === 'production';

  return {
    rules: [
      {
        userAgent: '*',
        allow: '/',
        // This is where mistakes happen
        disallow: isProduction ? ['/api/', '/auth/'] : ['/'],
      },
    ],
    sitemap: `${process.env.NEXT_PUBLIC_SITE_URL}/sitemap.xml`,
  };
}
Test with NODE_ENV=production to verify the production output is correct.
Common syntax pitfalls
Robots.txt syntax is deceptively simple, but small mistakes have big consequences.
Typos in directives
Disallow has one "s." Write Dissallow and the rule is silently ignored - crawlers treat unrecognized directives as comments. Directive names like User-agent are case-insensitive for major crawlers, but path values are case-sensitive: Disallow: /Admin/ does not block /admin/.
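One way to catch these typos automatically is to lint directive names against the set of directives crawlers actually recognize. A minimal Python sketch (the directive list here is an assumption covering the common ones):

```python
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def unknown_directives(robots_txt: str) -> list[str]:
    """Return directive names that crawlers would silently ignore."""
    unknown = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        name = line.split(":", 1)[0].strip().lower()
        if name and name not in KNOWN_DIRECTIVES:
            unknown.append(name)
    return unknown

assert unknown_directives("User-agent: *\nDissallow: /admin/") == ["dissallow"]
```

A check like this would have flagged the Dissallow typo above instead of letting it pass as a comment.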
Missing trailing slashes
Disallow: /api matches /api, /api/scan, and also /api-docs. If you only meant to block the API directory, use Disallow: /api/ with a trailing slash.
# Blocks /api AND /api-docs (probably not what you want)
Disallow: /api
# Blocks only /api/ and its children
Disallow: /api/
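The difference is easy to demonstrate: basic robots.txt matching is prefix matching on the URL path, so a pattern without a trailing slash matches any path that merely starts with those characters. A minimal Python sketch of that behavior:

```python
def rule_matches(pattern: str, path: str) -> bool:
    """A robots.txt rule (without wildcards) matches any path it is a prefix of."""
    return path.startswith(pattern)

assert rule_matches("/api", "/api")            # the directory itself
assert rule_matches("/api", "/api/scan")       # children
assert rule_matches("/api", "/api-docs")       # ...and unrelated siblings
assert not rule_matches("/api/", "/api-docs")  # the trailing slash fixes it
```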
Wildcard gotchas
Googlebot supports * wildcards, but not all crawlers do. And wildcards can be broader than you expect:
# This blocks any URL containing "admin" anywhere
Disallow: /*admin*
# This is probably what you meant
Disallow: /admin/
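You can see how broad the wildcard form is by translating patterns into regular expressions the way Googlebot-style matching treats them: * matches any run of characters, a trailing $ anchors the end, and everything is anchored at the start of the path. A Python sketch of that translation (an approximation, not Google's actual parser):

```python
import re

def wildcard_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt pattern against a URL path:
    '*' means any characters, a trailing '$' anchors the end."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.match(regex, path) is not None

assert wildcard_matches("/*admin*", "/blog/administration-tips")  # far too broad
assert wildcard_matches("/*admin*", "/admin/")
assert wildcard_matches("/admin/", "/admin/users")
assert not wildcard_matches("/admin/", "/blog/administration-tips")
```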
Conflicting rules
When Allow and Disallow conflict, the more specific rule wins. But "more specific" means the longer path, which isn't always intuitive:
User-agent: *
Disallow: /docs/
Allow: /docs/public/
# /docs/public/guide.html → ALLOWED (more specific rule wins)
# /docs/internal/spec.html → BLOCKED
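The precedence logic can be sketched in a few lines of Python: collect every rule whose pattern matches the path, pick the longest, and break ties in favor of Allow (Google's documented tie-break). This sketch ignores wildcards for simplicity:

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: ('allow' | 'disallow', pattern) pairs. The longest matching
    pattern wins; an equal-length Allow beats a Disallow; no match = allowed."""
    best_kind, best_len = "allow", -1
    for kind, pattern in rules:
        if path.startswith(pattern):
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_kind, best_len = kind, len(pattern)
    return best_kind == "allow"

rules = [("disallow", "/docs/"), ("allow", "/docs/public/")]
assert is_allowed("/docs/public/guide.html", rules)       # longer Allow wins
assert not is_allowed("/docs/internal/spec.html", rules)  # only Disallow matches
```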
Test with Google Search Console
Google provides a robots.txt report in Search Console under Settings > robots.txt. It shows you:
- Whether your robots.txt is accessible and when Google last fetched it
- Any syntax warnings from Google's parser
To check whether a specific URL is blocked or allowed, run it through the URL Inspection tool in the same property.
This is the authoritative test because it uses Google's actual robots.txt parser. If Search Console says a URL is blocked, that's what Googlebot will do.
The downside: it only works for sites you've verified in Search Console, and it only tests against the live production file. It can't test a file before you deploy it.
Validate programmatically
For CI/CD pipelines, you can validate robots.txt as part of your build process. A basic check:
# Build your site and serve the production output
pnpm build
pnpm start &
sleep 2
# Check that robots.txt exists and doesn't block everything
ROBOTS=$(curl -s http://localhost:3000/robots.txt)
if echo "$ROBOTS" | tr -d '\r' | grep -q '^Disallow: /$'; then
echo "ERROR: robots.txt blocks all crawling"
exit 1
fi
if ! echo "$ROBOTS" | grep -qi '^sitemap:'; then
echo "WARNING: robots.txt missing Sitemap directive"
fi
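If you want something sturdier than grep, Python's standard library ships a robots.txt parser, urllib.robotparser (note: it implements the original 1994 spec and ignores wildcards), which you can point at the generated file in CI. A sketch of such a check:

```python
from urllib.robotparser import RobotFileParser

def check_robots(text: str) -> list[str]:
    """Return a list of problems found in a robots.txt body."""
    problems = []
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # A blanket block: the homepage itself is disallowed for all agents
    if not parser.can_fetch("*", "https://example.com/"):
        problems.append("blocks all crawling (Disallow: /)")
    if "sitemap:" not in text.lower():
        problems.append("missing Sitemap directive")
    return problems

bad = "User-agent: *\nDisallow: /\n"
good = "User-agent: *\nDisallow: /api/\nSitemap: https://example.com/sitemap.xml\n"
assert check_robots(bad) == ["blocks all crawling (Disallow: /)", "missing Sitemap directive"]
assert check_robots(good) == []
```

In a real pipeline you would feed this the body fetched from your local build and exit nonzero when the list is non-empty.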
The five-point robots.txt checklist
Before every deploy, verify:
- No Disallow: / - unless you're intentionally blocking all crawling (staging environments only)
- CSS and JS are accessible - don't block /_next/, /static/, or /assets/
- Sitemap directive is present - Sitemap: https://yourdomain.com/sitemap.xml
- Paths use trailing slashes - /api/ not /api
- Environment logic is correct - production config doesn't inherit staging rules
The fastest way to check
If you want to skip the manual testing and validate your robots.txt in seconds, paste your URL into the LintPage Robots.txt Validator. It catches syntax errors, overly broad blocks, missing sitemaps, and conflicting rules automatically.