Deploy first, test later is not a strategy
Your robots.txt is two lines of text that can make your entire site invisible to Google. Yet most teams treat it like a config file that doesn't need testing - write it once, deploy it, and forget it.
The problem is that robots.txt mistakes are silent. There's no build error, no console warning, no failing test. You only find out something's wrong when your traffic drops weeks later. Here's how to catch those mistakes before they reach production.
Test locally before deploying
Serve your robots.txt locally
If you're using a static robots.txt file in your public/ directory, you can inspect it directly. But if you're generating it dynamically (like with Next.js robots.ts), you need to actually serve it:
# Start your dev server
pnpm dev
# Fetch the generated robots.txt
curl http://localhost:3000/robots.txt
Compare the output against what you expect. The most critical thing to verify: your production robots.txt does NOT contain Disallow: /.
Check environment-specific logic
Many frameworks generate different robots.txt files for staging and production. This is the #1 source of robots.txt disasters - the staging config leaks into production. If your robots.txt is dynamic, test both environments:
// Common pattern in Next.js robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  const isProduction = process.env.NODE_ENV === 'production';

  return {
    rules: [
      {
        userAgent: '*',
        allow: '/',
        // This is where mistakes happen
        disallow: isProduction ? ['/api/', '/auth/'] : ['/'],
      },
    ],
    sitemap: `${process.env.NEXT_PUBLIC_SITE_URL}/sitemap.xml`,
  };
}
Test with NODE_ENV=production to verify the production output is correct.
Common syntax pitfalls
Robots.txt syntax is deceptively simple, but small mistakes have big consequences.
Typos in directives
Disallow has one "s." Write Dissallow and the rule is silently ignored - crawlers treat unrecognized directives as comments. Directive names like User-agent are case-insensitive for major crawlers, but path values are case-sensitive: Disallow: /Admin/ does not block /admin/.
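One way to catch these typos automatically is to lint directive names against the set of directives crawlers actually recognize. A minimal Python sketch (the directive list here is an assumption covering the common ones):

```python
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def unknown_directives(robots_txt: str) -> list[str]:
    """Return directive names that crawlers would silently ignore."""
    unknown = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        name = line.split(":", 1)[0].strip().lower()
        if name and name not in KNOWN_DIRECTIVES:
            unknown.append(name)
    return unknown

assert unknown_directives("User-agent: *\nDissallow: /admin/") == ["dissallow"]
```

A check like this would have flagged the Dissallow typo above instead of letting it pass as a comment.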
Missing trailing slashes
Disallow: /api matches /api, /api/scan, and also /api-docs. If you only meant to block the API directory, use Disallow: /api/ with a trailing slash.
# Blocks /api AND /api-docs (probably not what you want)
Disallow: /api
# Blocks only /api/ and its children
Disallow: /api/
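The difference is easy to demonstrate: basic robots.txt matching is prefix matching on the URL path, so a pattern without a trailing slash matches any path that merely starts with those characters. A minimal Python sketch of that behavior:

```python
def rule_matches(pattern: str, path: str) -> bool:
    """A robots.txt rule (without wildcards) matches any path it is a prefix of."""
    return path.startswith(pattern)

assert rule_matches("/api", "/api")            # the directory itself
assert rule_matches("/api", "/api/scan")       # children
assert rule_matches("/api", "/api-docs")       # ...and unrelated siblings
assert not rule_matches("/api/", "/api-docs")  # the trailing slash fixes it
```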
Wildcard gotchas
Googlebot supports * wildcards, but not all crawlers do. And wildcards can be broader than you expect:
# This blocks any URL containing "admin" anywhere
Disallow: /*admin*
# This is probably what you meant
Disallow: /admin/
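You can see how broad the wildcard form is by translating patterns into regular expressions the way Googlebot-style matching treats them: * matches any run of characters, a trailing $ anchors the end, and everything is anchored at the start of the path. A Python sketch of that translation (an approximation, not Google's actual parser):

```python
import re

def wildcard_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt pattern against a URL path:
    '*' means any characters, a trailing '$' anchors the end."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.match(regex, path) is not None

assert wildcard_matches("/*admin*", "/blog/administration-tips")  # far too broad
assert wildcard_matches("/*admin*", "/admin/")
assert wildcard_matches("/admin/", "/admin/users")
assert not wildcard_matches("/admin/", "/blog/administration-tips")
```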
Conflicting rules
When Allow and Disallow conflict, the more specific rule wins. But "more specific" means the longer path, which isn't always intuitive:
User-agent: *
Disallow: /docs/
Allow: /docs/public/
# /docs/public/guide.html → ALLOWED (more specific rule wins)
# /docs/internal/spec.html → BLOCKED
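The precedence logic can be sketched in a few lines of Python: collect every rule whose pattern matches the path, pick the longest, and break ties in favor of Allow (Google's documented tie-break). This sketch ignores wildcards for simplicity:

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules: ('allow' | 'disallow', pattern) pairs. The longest matching
    pattern wins; an equal-length Allow beats a Disallow; no match = allowed."""
    best_kind, best_len = "allow", -1
    for kind, pattern in rules:
        if path.startswith(pattern):
            if len(pattern) > best_len or (len(pattern) == best_len and kind == "allow"):
                best_kind, best_len = kind, len(pattern)
    return best_kind == "allow"

rules = [("disallow", "/docs/"), ("allow", "/docs/public/")]
assert is_allowed("/docs/public/guide.html", rules)       # longer Allow wins
assert not is_allowed("/docs/internal/spec.html", rules)  # only Disallow matches
```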
Test with Google Search Console
Google provides a robots.txt report in Search Console under Settings > robots.txt. It shows you:
- Whether your robots.txt is accessible and when Google last fetched it
- Any syntax warnings from Google's parser
To check whether a specific URL is blocked or allowed, run it through the URL Inspection tool in the same property.
This is the authoritative test because it uses Google's actual robots.txt parser. If Search Console says a URL is blocked, that's what Googlebot will do.
The downside: it only works for sites you've verified in Search Console, and it only tests against the live production file. It can't test a file before you deploy it.
Validate programmatically
For CI/CD pipelines, you can validate robots.txt as part of your build process. A basic check:
# Build your site and serve the production output
pnpm build
pnpm start &
sleep 2
# Check that robots.txt exists and doesn't block everything
ROBOTS=$(curl -s http://localhost:3000/robots.txt)
if echo "$ROBOTS" | tr -d '\r' | grep -q '^Disallow: /$'; then
echo "ERROR: robots.txt blocks all crawling"
exit 1
fi
if ! echo "$ROBOTS" | grep -qi '^sitemap:'; then
echo "WARNING: robots.txt missing Sitemap directive"
fi
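If you want something sturdier than grep, Python's standard library ships a robots.txt parser, urllib.robotparser (note: it implements the original 1994 spec and ignores wildcards), which you can point at the generated file in CI. A sketch of such a check:

```python
from urllib.robotparser import RobotFileParser

def check_robots(text: str) -> list[str]:
    """Return a list of problems found in a robots.txt body."""
    problems = []
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # A blanket block: the homepage itself is disallowed for all agents
    if not parser.can_fetch("*", "https://example.com/"):
        problems.append("blocks all crawling (Disallow: /)")
    if "sitemap:" not in text.lower():
        problems.append("missing Sitemap directive")
    return problems

bad = "User-agent: *\nDisallow: /\n"
good = "User-agent: *\nDisallow: /api/\nSitemap: https://example.com/sitemap.xml\n"
assert check_robots(bad) == ["blocks all crawling (Disallow: /)", "missing Sitemap directive"]
assert check_robots(good) == []
```

In a real pipeline you would feed this the body fetched from your local build and exit nonzero when the list is non-empty.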
The five-point robots.txt checklist
Before every deploy, verify:
- No Disallow: / - unless you're intentionally blocking all crawling (staging environments only)
- CSS and JS are accessible - don't block /_next/, /static/, or /assets/
- Sitemap directive is present - Sitemap: https://yourdomain.com/sitemap.xml
- Paths use trailing slashes - /api/ not /api
- Environment logic is correct - production config doesn't inherit staging rules
The fastest way to check
If you want to skip the manual testing and validate your robots.txt in seconds, paste your URL into the LintPage Robots.txt Validator. It catches syntax errors, overly broad blocks, missing sitemaps, and conflicting rules automatically.