The first file Google reads on your site
Before Google crawls a single page on your domain, it checks one file: /robots.txt. This plain text file tells search engine bots which parts of your site they're allowed to crawl - and which parts they should skip.
Get it right, and crawlers efficiently discover your content. Get it wrong, and you might block Google from indexing your entire site. Here's how to write a robots.txt that actually works.
What robots.txt does (and doesn't do)
A common misconception: robots.txt controls indexing. It doesn't. It controls crawling - whether a bot visits the URL at all. A page blocked by robots.txt can still appear in Google's index if other pages link to it (it just won't have any content in the snippet).
If you want to prevent indexing, use a noindex meta tag instead. Use robots.txt to manage crawl budget and keep bots out of areas that don't need crawling.
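The distinction matters in practice. To keep a page out of search results, put the noindex directive on the page itself, and make sure robots.txt doesn't block that page - Google has to crawl it to see the tag:

```html
<!-- In the page's <head>: allow crawling, forbid indexing -->
<meta name="robots" content="noindex">
```

For non-HTML resources like PDFs, the X-Robots-Tag: noindex HTTP response header does the same job.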
The basic syntax
A robots.txt file lives at the root of your domain (https://example.com/robots.txt) and follows a simple format:
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /auth/
Sitemap: https://example.com/sitemap.xml
- User-agent specifies which crawler the rules apply to. * means all crawlers.
- Allow explicitly permits crawling of a path.
- Disallow tells crawlers not to request URLs matching the path.
- Sitemap points crawlers to your XML sitemap.
Rules are matched by prefix. Disallow: /api/ blocks /api/scan, /api/users, and everything under /api/.
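The matching logic can be sketched in a few lines of TypeScript. This is a simplified model - real matchers also support wildcards like * and $, which this ignores - but it captures the core rule: the longest matching prefix wins, and on a tie, Allow wins.

```typescript
// Simplified robots.txt prefix matching: a path is blocked if its
// longest matching Disallow prefix is longer than its longest
// matching Allow prefix (ties go to Allow, per Google's rules).
function isBlocked(path: string, allow: string[], disallow: string[]): boolean {
  const longestMatch = (rules: string[]) =>
    rules
      .filter((rule) => path.startsWith(rule))
      .reduce((max, rule) => Math.max(max, rule.length), 0);
  return longestMatch(disallow) > longestMatch(allow);
}

console.log(isBlocked('/api/scan', ['/'], ['/api/']));  // → true
console.log(isBlocked('/blog/post', ['/'], ['/api/'])); // → false
```

Note that `isBlocked('/api-docs', ['/'], ['/api'])` also returns true - prefix matching is why a trailing slash on Disallow paths matters.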
Three mistakes that block Google
1. The staging Disallow: /
This is the most dangerous robots.txt rule. A single line that tells every crawler to stay away from every page:
User-agent: *
Disallow: /
It's standard practice on staging environments - you don't want Google indexing your test site. The problem is when this ships to production. It happens more often than you'd think, and the result is zero organic traffic until someone notices.
2. Blocking CSS and JavaScript
Some robots.txt files block /assets/, /static/, or /_next/ directories. This prevents Googlebot from rendering your pages, which means it can't properly evaluate your content:
# Don't do this
User-agent: *
Disallow: /_next/
Disallow: /static/
Google needs access to your CSS and JavaScript to render pages like a browser. Blocking these resources can hurt your rankings because Google can't see what your page actually looks like.
3. Missing Sitemap directive
The Sitemap line is technically optional, but omitting it is a missed opportunity. It's the easiest way to tell search engines where your sitemap lives without relying on Google Search Console:
Sitemap: https://example.com/sitemap.xml
Always include it. It costs nothing and helps crawlers discover your pages faster.
A robots.txt template for Next.js
Here's a solid starting point for most Next.js applications:
User-agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/
Disallow: /auth/
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
In Next.js 13+, you can generate this dynamically with a robots.ts file in your app directory:
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: '*',
        allow: '/',
        disallow: ['/api/', '/dashboard/', '/auth/', '/admin/'],
      },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
  };
}
This approach is better than a static file because it's type-safe, version-controlled with your code, and can be dynamically configured per environment.
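The per-environment configuration might look like this. It's a sketch: the MetadataRoute.Robots return type from 'next' is omitted so the snippet stands alone, and VERCEL_ENV is one example of an environment signal - substitute whatever your platform provides.

```typescript
// app/robots.ts - environment-aware rules, so a staging deploy
// can never ship a permissive robots.txt (or vice versa).
export default function robots() {
  const isProduction = process.env.VERCEL_ENV === 'production';
  return {
    rules: [
      isProduction
        ? // Production: crawl everything except private areas.
          { userAgent: '*', allow: '/', disallow: ['/api/', '/dashboard/', '/auth/'] }
        : // Staging/preview: keep every crawler out entirely.
          { userAgent: '*', disallow: '/' },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
  };
}
```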
Testing your robots.txt
After writing your robots.txt, validate it. Common issues include:
- Typos in directives - Dissallow instead of Disallow is silently ignored
- Wrong path prefixes - /api matches /api-docs too; use /api/ with a trailing slash
- Conflicting rules - multiple User-agent blocks with overlapping rules can be confusing
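The typo class of errors is easy to catch yourself. Here's a sketch that flags unrecognized directives - the whitelist below covers the common ones, though real parsers accept a few more (Crawl-delay, for instance):

```typescript
// Flag lines whose directive a crawler won't recognize -
// misspellings like "Dissallow" are otherwise silently ignored.
const KNOWN_DIRECTIVES = new Set(['user-agent', 'allow', 'disallow', 'sitemap']);

function findUnknownDirectives(robotsTxt: string): string[] {
  return robotsTxt
    .split('\n')
    .map((line) => line.split('#')[0].trim())  // strip comments
    .filter((line) => line.includes(':'))
    .map((line) => line.split(':')[0].trim().toLowerCase())
    .filter((directive) => !KNOWN_DIRECTIVES.has(directive));
}

console.log(findUnknownDirectives('User-agent: *\nDissallow: /api/'));
// → ['dissallow']
```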
The fastest way to check? Run your site through the LintPage Robots.txt Validator. It catches syntax errors, overly broad blocks, missing sitemaps, and conflicting rules - all in a few seconds.
The bottom line
Your robots.txt is a small file with outsized impact. Write it intentionally, validate it before deploying, and re-check it after every major deployment. One wrong line can make your site invisible to Google, and you might not notice until your traffic has already disappeared.