The first file Google reads on your site
Before Google crawls a single page on your domain, it checks one file: /robots.txt. This plain text file tells search engine bots which parts of your site they're allowed to crawl - and which parts they should skip.
Get it right, and crawlers efficiently discover your content. Get it wrong, and you might block Google from indexing your entire site. Here's how to write a robots.txt that actually works.
What robots.txt does (and doesn't do)
A common misconception: robots.txt controls indexing. It doesn't. It controls crawling - whether a bot visits the URL at all. A page blocked by robots.txt can still appear in Google's index if other pages link to it (it just won't have any content in the snippet).
If you want to prevent indexing, use a noindex meta tag instead. Use robots.txt to manage crawl budget and keep bots out of areas that don't need crawling.
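The distinction matters in practice. To keep a page out of search results, put the noindex directive on the page itself, and make sure robots.txt doesn't block that page - Google has to crawl it to see the tag:

```html
<!-- In the page's <head>: allow crawling, forbid indexing -->
<meta name="robots" content="noindex">
```

For non-HTML resources like PDFs, the X-Robots-Tag: noindex HTTP response header does the same job.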
The basic syntax
A robots.txt file lives at the root of your domain (https://example.com/robots.txt) and follows a simple format:
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /auth/
Sitemap: https://example.com/sitemap.xml
- User-agent specifies which crawler the rules apply to. * means all crawlers.
- Allow explicitly permits crawling of a path.
- Disallow tells crawlers not to request URLs matching the path.
- Sitemap points crawlers to your XML sitemap.
Rules are matched by prefix. Disallow: /api/ blocks /api/scan, /api/users, and everything under /api/.
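The matching logic can be sketched in a few lines of TypeScript. This is a simplified model - real matchers also support wildcards like * and $, which this ignores - but it captures the core rule: the longest matching prefix wins, and on a tie, Allow wins.

```typescript
// Simplified robots.txt prefix matching: a path is blocked if its
// longest matching Disallow prefix is longer than its longest
// matching Allow prefix (ties go to Allow, per Google's rules).
function isBlocked(path: string, allow: string[], disallow: string[]): boolean {
  const longestMatch = (rules: string[]) =>
    rules
      .filter((rule) => path.startsWith(rule))
      .reduce((max, rule) => Math.max(max, rule.length), 0);
  return longestMatch(disallow) > longestMatch(allow);
}

console.log(isBlocked('/api/scan', ['/'], ['/api/']));  // → true
console.log(isBlocked('/blog/post', ['/'], ['/api/'])); // → false
```

Note that `isBlocked('/api-docs', ['/'], ['/api'])` also returns true - prefix matching is why a trailing slash on Disallow paths matters.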
Three mistakes that block Google
1. The staging Disallow: /
This is the most dangerous robots.txt rule. A single line that tells every crawler to stay away from every page:
User-agent: *
Disallow: /
It's standard practice on staging environments - you don't want Google indexing your test site. The problem is when this ships to production. It happens more often than you'd think, and the result is zero organic traffic until someone notices.
2. Blocking CSS and JavaScript
Some robots.txt files block /assets/, /static/, or /_next/ directories. This prevents Googlebot from rendering your pages, which means it can't properly evaluate your content:
# Don't do this
User-agent: *
Disallow: /_next/
Disallow: /static/
Google needs access to your CSS and JavaScript to render pages like a browser. Blocking these resources can hurt your rankings because Google can't see what your page actually looks like.
3. Missing Sitemap directive
The Sitemap line is technically optional, but omitting it is a missed opportunity. It's the easiest way to tell search engines where your sitemap lives without relying on Google Search Console:
Sitemap: https://example.com/sitemap.xml
Always include it. It costs nothing and helps crawlers discover your pages faster.
A robots.txt template for Next.js
Here's a solid starting point for most Next.js applications:
User-agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/
Disallow: /auth/
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
In Next.js 13+, you can generate this dynamically with a robots.ts file in your app directory:
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: '*',
        allow: '/',
        disallow: ['/api/', '/dashboard/', '/auth/', '/admin/'],
      },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
  };
}
This approach is better than a static file because it's type-safe, version-controlled with your code, and can be dynamically configured per environment.
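The per-environment configuration might look like this. It's a sketch: the MetadataRoute.Robots return type from 'next' is omitted so the snippet stands alone, and VERCEL_ENV is one example of an environment signal - substitute whatever your platform provides.

```typescript
// app/robots.ts - environment-aware rules, so a staging deploy
// can never ship a permissive robots.txt (or vice versa).
export default function robots() {
  const isProduction = process.env.VERCEL_ENV === 'production';
  return {
    rules: [
      isProduction
        ? // Production: crawl everything except private areas.
          { userAgent: '*', allow: '/', disallow: ['/api/', '/dashboard/', '/auth/'] }
        : // Staging/preview: keep every crawler out entirely.
          { userAgent: '*', disallow: '/' },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
  };
}
```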
Testing your robots.txt
After writing your robots.txt, validate it. Common issues include:
- Typos in directives - Dissallow instead of Disallow is silently ignored
- Wrong path prefixes - /api matches /api-docs too; use /api/ with a trailing slash
- Conflicting rules - multiple User-agent blocks with overlapping rules can be confusing
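The typo class of errors is easy to catch yourself. Here's a sketch that flags unrecognized directives - the whitelist below covers the common ones, though real parsers accept a few more (Crawl-delay, for instance):

```typescript
// Flag lines whose directive a crawler won't recognize -
// misspellings like "Dissallow" are otherwise silently ignored.
const KNOWN_DIRECTIVES = new Set(['user-agent', 'allow', 'disallow', 'sitemap']);

function findUnknownDirectives(robotsTxt: string): string[] {
  return robotsTxt
    .split('\n')
    .map((line) => line.split('#')[0].trim())  // strip comments
    .filter((line) => line.includes(':'))
    .map((line) => line.split(':')[0].trim().toLowerCase())
    .filter((directive) => !KNOWN_DIRECTIVES.has(directive));
}

console.log(findUnknownDirectives('User-agent: *\nDissallow: /api/'));
// → ['dissallow']
```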
The fastest way to check? Run your site through the LintPage Robots.txt Validator. It catches syntax errors, overly broad blocks, missing sitemaps, and conflicting rules - all in a few seconds.
The bottom line
Your robots.txt is a small file with outsized impact. Write it intentionally, validate it before deploying, and re-check it after every major deployment. One wrong line can make your site invisible to Google, and you might not notice until your traffic has already disappeared.