
robots.txt Testing Guide: How to Test Before You Deploy

Marius Orzaru · March 27, 2026 · 7 min read

Deploy first, test later is not a strategy

Your robots.txt is two lines of text that can make your entire site invisible to Google. Yet most teams treat it like a config file that doesn't need testing - write it once, deploy it, and forget it.

The problem is that robots.txt mistakes are silent. There's no build error, no console warning, no failing test. You only find out something's wrong when your traffic drops weeks later. Here's how to catch those mistakes before they reach production.

Test locally before deploying

Serve your robots.txt locally

If you're using a static robots.txt file in your public/ directory, you can inspect it directly. But if you're generating it dynamically (like with Next.js robots.ts), you need to actually serve it:

# Start your dev server
pnpm dev

# Fetch the generated robots.txt
curl http://localhost:3000/robots.txt

Compare the output against what you expect. The most critical thing to verify: your production robots.txt does NOT contain Disallow: /.

Check environment-specific logic

Many frameworks generate different robots.txt files for staging and production. This is one of the most common sources of robots.txt disasters: the staging config leaks into production. If your robots.txt is dynamic, test both environments:

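// Common pattern in Next.js robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  const isProduction = process.env.NODE_ENV === 'production';

  // This is where mistakes happen: if isProduction is wrong,
  // the block-everything branch ships to production.
  if (!isProduction) {
    // No allow rule here. Allow: / next to Disallow: / ties on length,
    // and Google resolves ties in favor of Allow, so nothing would be blocked.
    return { rules: [{ userAgent: '*', disallow: '/' }] };
  }

  return {
    rules: [
      {
        userAgent: '*',
        allow: '/',
        disallow: ['/api/', '/auth/'],
      },
    ],
    sitemap: `${process.env.NEXT_PUBLIC_SITE_URL}/sitemap.xml`,
  };
}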

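Test with NODE_ENV=production to verify the production output is correct. The dev server always runs with NODE_ENV=development, so that means running a production build (pnpm build, then pnpm start, assuming the standard Next.js scripts) and fetching /robots.txt again. Keep in mind that next build sets NODE_ENV to production for every build, including one deployed to staging, so a staging deployment needs its own signal (a dedicated environment variable, for example) to take the block-everything branch.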
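
One way to make both branches easy to test is to move the decision into a pure function that takes the environment as an argument. The sketch below is an illustration, not part of Next.js: buildRobotsRules, blocksEverything, and the RobotsRules type are hypothetical names, and the production paths are the ones from the example above.

```typescript
// Hypothetical shape, loosely modeled on a single robots.txt rule group.
type RobotsRules = {
  userAgent: string;
  allow?: string[];
  disallow: string[];
};

// Pure function: the environment is an argument, so both branches can be
// exercised in a unit test without touching process.env.
function buildRobotsRules(isProduction: boolean): RobotsRules {
  if (!isProduction) {
    // Staging and previews: block everything, with no competing Allow rule.
    return { userAgent: '*', disallow: ['/'] };
  }
  return { userAgent: '*', allow: ['/'], disallow: ['/api/', '/auth/'] };
}

// True when a rule group blocks the entire site.
function blocksEverything(rules: RobotsRules): boolean {
  return rules.disallow.some((path) => path === '/');
}

console.log(blocksEverything(buildRobotsRules(true)));  // false
console.log(blocksEverything(buildRobotsRules(false))); // true
```

A robots.ts file could then call buildRobotsRules with whatever environment signal your deployment uses, and a unit test can assert on both outputs.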

Common syntax pitfalls

Robots.txt syntax is deceptively simple, but small mistakes have big consequences.

Typos in directives

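Disallow has one "s." Write Dissallow and many parsers silently ignore the rule, because unrecognized lines are skipped rather than reported. Google's open-source parser tolerates a few common misspellings, but other crawlers may not, so don't rely on it. Capitalization of the directive name is a different matter: under RFC 9309 directive names are case-insensitive, so User-Agent and User-agent are equivalent to compliant crawlers. The paths in your rules are case-sensitive, though: Disallow: /Admin/ does not block /admin/.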
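
A misspelled directive is easy to catch mechanically. The sketch below flags any line whose directive name is not on a known list. The list itself is an assumption (extend it for directives your target crawlers support), findUnknownDirectives is a hypothetical helper, and names are compared case-insensitively, following RFC 9309.

```typescript
// Directive names this sketch accepts; an assumption, not an official list.
const KNOWN_DIRECTIVES = ['user-agent', 'allow', 'disallow', 'sitemap', 'crawl-delay'];

type DirectiveIssue = { line: number; directive: string };

// Returns every line whose directive name is not in KNOWN_DIRECTIVES.
function findUnknownDirectives(robotsTxt: string): DirectiveIssue[] {
  const issues: DirectiveIssue[] = [];
  robotsTxt.split(/\r?\n/).forEach((raw, index) => {
    // Drop comments and surrounding whitespace; skip blank lines.
    const text = raw.replace(/#.*$/, '').trim();
    if (text === '') return;
    const colon = text.indexOf(':');
    // A line with no colon has no directive name at all; report it whole.
    const directive = colon === -1 ? text : text.slice(0, colon).trim();
    if (KNOWN_DIRECTIVES.indexOf(directive.toLowerCase()) === -1) {
      issues.push({ line: index + 1, directive });
    }
  });
  return issues;
}

const sampleFile = ['User-Agent: *', 'Dissallow: /admin/', 'Disallow: /api/'].join('\n');
console.log(findUnknownDirectives(sampleFile));
// → [ { line: 2, directive: 'Dissallow' } ]
```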

Missing trailing slashes

Disallow: /api matches /api, /api/scan, and also /api-docs. If you only meant to block the API directory, use Disallow: /api/ with a trailing slash.

# Blocks /api AND /api-docs (probably not what you want)
Disallow: /api

# Blocks only /api/ and its children
Disallow: /api/
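
The difference comes down to plain prefix matching, which a few lines of TypeScript can demonstrate. matchesPrefix is a hypothetical helper that covers only rules without wildcards.

```typescript
// A rule without wildcards matches any URL path that begins with it.
function matchesPrefix(rulePath: string, urlPath: string): boolean {
  return urlPath.slice(0, rulePath.length) === rulePath;
}

const samplePaths = ['/api', '/api/scan', '/api-docs'];

// Disallow: /api matches all three paths.
console.log(samplePaths.filter((p) => matchesPrefix('/api', p)));
// → [ '/api', '/api/scan', '/api-docs' ]

// Disallow: /api/ matches only /api/scan.
console.log(samplePaths.filter((p) => matchesPrefix('/api/', p)));
// → [ '/api/scan' ]
```

Note that Disallow: /api/ no longer matches the bare /api path itself, which matters if that URL serves content.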

Wildcard gotchas

Googlebot supports * wildcards, but not all crawlers do. And wildcards can be broader than you expect:

# This blocks any URL containing "admin" anywhere
Disallow: /*admin*

# This is probably what you meant
Disallow: /admin/
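
To see how far a wildcard reaches, you can translate the pattern into a regular expression and try it against real URLs from your site. This is a minimal sketch assuming Google-style semantics (* matches any run of characters, a trailing $ anchors the end of the URL). patternToRegExp and patternMatches are hypothetical helpers, and the sample URLs are made up.

```typescript
// Translate a robots.txt path pattern into a RegExp.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.charAt(pattern.length - 1) === '$';
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split('*')
    // Escape regex metacharacters in the literal parts between wildcards.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  // Patterns always match from the start of the path.
  return new RegExp('^' + source + (anchored ? '$' : ''));
}

function patternMatches(pattern: string, urlPath: string): boolean {
  return patternToRegExp(pattern).test(urlPath);
}

// The broad pattern catches an unrelated blog post.
console.log(patternMatches('/*admin*', '/blog/administration-tips')); // true
// The directory rule does not.
console.log(patternMatches('/admin/', '/blog/administration-tips')); // false
console.log(patternMatches('/admin/', '/admin/users'));              // true
```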

Conflicting rules

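When Allow and Disallow conflict, the more specific rule wins. But "more specific" means the longer path, which isn't always intuitive. If two matching rules are the same length, Google applies the least restrictive one, so Allow wins the tie: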

User-agent: *
Disallow: /docs/
Allow: /docs/public/

# /docs/public/guide.html → ALLOWED (more specific rule wins)
# /docs/internal/spec.html → BLOCKED
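
The resolution logic can be sketched in a few lines. This version assumes plain prefix rules (no wildcards) with non-empty paths, all from the same user-agent group, and it breaks length ties in favor of Allow, as Google documents. The Rule type and isAllowed function are hypothetical names.

```typescript
type Rule = { type: 'allow' | 'disallow'; path: string };

// Decide whether a URL path may be crawled: the longest matching rule wins.
function isAllowed(rules: Rule[], urlPath: string): boolean {
  let winner: Rule | null = null;
  for (const rule of rules) {
    // Skip rules whose path is not a prefix of the URL path.
    if (urlPath.slice(0, rule.path.length) !== rule.path) continue;
    const longer = winner === null || rule.path.length > winner.path.length;
    const tieForAllow =
      winner !== null &&
      rule.path.length === winner.path.length &&
      rule.type === 'allow';
    if (longer || tieForAllow) winner = rule;
  }
  // No matching rule means the URL is allowed by default.
  return winner === null || winner.type === 'allow';
}

const docsRules: Rule[] = [
  { type: 'disallow', path: '/docs/' },
  { type: 'allow', path: '/docs/public/' },
];
console.log(isAllowed(docsRules, '/docs/public/guide.html'));  // true
console.log(isAllowed(docsRules, '/docs/internal/spec.html')); // false
```

The tie-break has a practical consequence: a group containing both Allow: / and Disallow: / does not block anything for crawlers that follow this rule.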

Test with Google Search Console

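Google provides a robots.txt report in Search Console under Settings > robots.txt (it replaced the older standalone robots.txt Tester). Between the report and the URL Inspection tool, you can see:

  • Whether your robots.txt is accessible, and when Google last fetched it
  • Any syntax warnings or errors
  • Whether a specific URL is blocked by robots.txt (through URL Inspection, since the report itself has no URL tester)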

This is the authoritative test because it uses Google's actual robots.txt parser. If Search Console says a URL is blocked, that's what Googlebot will do.

The downside: it only works for sites you've verified in Search Console, and it only tests against the live production file. It can't test a file before you deploy it.

Validate programmatically

For CI/CD pipelines, you can validate robots.txt as part of your build process. A basic check:

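# Build your site, then serve the production build in the background
# (assumes a "start" script that listens on port 3000)
pnpm build
pnpm start &
SERVER_PID=$!
trap 'kill "$SERVER_PID" 2>/dev/null' EXIT

# Fetch robots.txt, retrying while the server boots; fail if it never responds
ROBOTS=$(curl -sf --retry 10 --retry-delay 1 --retry-connrefused http://localhost:3000/robots.txt) || {
  echo "ERROR: could not fetch robots.txt"
  exit 1
}

# Fail if any group blocks the whole site (case-insensitive, tolerant of spacing)
if echo "$ROBOTS" | grep -qiE '^[[:space:]]*disallow:[[:space:]]*/[[:space:]]*$'; then
  echo "ERROR: robots.txt blocks all crawling"
  exit 1
fi

if ! echo "$ROBOTS" | grep -qi "sitemap:"; then
  echo "WARNING: robots.txt missing Sitemap directive"
fi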

The five-point robots.txt checklist

Before every deploy, verify:

  1. No Disallow: / - Unless you're intentionally blocking all crawling (staging environments only)
  2. CSS and JS are accessible - Don't block /_next/, /static/, or /assets/
  3. Sitemap directive is present - Sitemap: https://yourdomain.com/sitemap.xml
  4. Paths use trailing slashes - /api/ not /api
  5. Environment logic is correct - Production config doesn't inherit staging rules
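
Items 1 to 4 can be checked against the text of the file itself. The sketch below is one possible encoding of them, not a complete validator: checkRobotsTxt and the Finding type are hypothetical names, the asset prefixes are the ones listed in item 2, and rules are checked without regard to which user-agent group they belong to. Item 5 depends on your deployment setup and is not covered.

```typescript
type Finding = { level: 'error' | 'warning'; message: string };

// Asset prefixes named in the checklist; adjust for your framework.
const ASSET_PREFIXES = ['/_next/', '/static/', '/assets/'];

// True when `text` begins with `prefix`.
function beginsWith(text: string, prefix: string): boolean {
  return text.slice(0, prefix.length) === prefix;
}

// Applies checklist items 1 to 4 to the text of a robots.txt file.
function checkRobotsTxt(robotsTxt: string): Finding[] {
  const findings: Finding[] = [];
  let hasSitemap = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const text = raw.replace(/#.*$/, '').trim();
    const colon = text.indexOf(':');
    if (colon === -1) continue;
    const key = text.slice(0, colon).trim().toLowerCase();
    const value = text.slice(colon + 1).trim();

    if (key === 'sitemap') hasSitemap = true;
    if (key !== 'disallow' || value === '') continue;

    if (value === '/') {
      // Item 1: a blanket block.
      findings.push({ level: 'error', message: 'Disallow: / blocks all crawling' });
    } else if (ASSET_PREFIXES.some((p) => beginsWith(p, value) || beginsWith(value, p))) {
      // Item 2: the rule covers, or sits inside, an asset directory.
      findings.push({ level: 'error', message: `Disallow: ${value} blocks CSS or JS assets` });
    } else if (!/[*$.]/.test(value) && value.charAt(value.length - 1) !== '/') {
      // Item 4: a bare prefix such as /api also matches /api-docs.
      findings.push({ level: 'warning', message: `Disallow: ${value} has no trailing slash` });
    }
  }

  // Item 3: no Sitemap line was seen.
  if (!hasSitemap) {
    findings.push({ level: 'warning', message: 'Missing Sitemap directive' });
  }
  return findings;
}
```

In a CI job, you might fail the build when any finding has level 'error' and print the warnings for review.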

The fastest way to check

If you want to skip the manual testing and validate your robots.txt in seconds, paste your URL into the LintPage Robots.txt Validator. It catches syntax errors, overly broad blocks, missing sitemaps, and conflicting rules automatically.

§ about the author
Marius Orzaru, Founder, LintPage (BludeskSoft)

I built LintPage after a single stray noindex tag slipped into production and quietly cost us 47 days of organic traffic. It now runs the 60 automated checks I wish we had run before that deploy.
