The roadmap nobody checks
A sitemap.xml is a machine-readable list of every important URL on your site. Search engines use it to discover pages, understand site structure, and prioritize what to crawl. It's one of the most important SEO files on your site - and one of the most neglected.
Most developers set up a sitemap once and forget about it. The problem is that sitemaps break silently. Pages get removed but stay in the sitemap. New pages get added but never make it in. The XML gets malformed after a build change. And because nobody ever checks, the issues compound over time.
What a valid sitemap looks like
A basic sitemap.xml follows this structure:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>weekly</changefreq> <!-- optional; Google ignores this -->
    <priority>1.0</priority> <!-- optional; Google ignores this -->
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2025-01-10</lastmod>
    <changefreq>monthly</changefreq> <!-- optional; Google ignores this -->
    <priority>0.8</priority> <!-- optional; Google ignores this -->
  </url>
</urlset>
- loc (required) - The full URL of the page. Must be absolute and include the protocol.
- lastmod - When the page was last meaningfully modified. Helps crawlers decide what to re-crawl.
- changefreq - A hint about how often the content changes. Google ignores this.
- priority - A hint about relative importance (0.0 to 1.0). Google ignores this too, but it's still conventionally included.
Four ways your sitemap is probably broken
1. URLs that return non-200 status codes
The most common issue. Your sitemap lists URLs that return 404 errors, 301 redirects, or 500 errors. This wastes crawl budget and signals to Google that your sitemap isn't maintained:
<!-- This page was deleted three months ago -->
<url>
  <loc>https://example.com/old-feature</loc>
</url>
Audit your sitemap regularly. Every URL listed should return a 200 status code.
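One way to automate that audit is sketched below. It rests on a few assumptions: it runs on Node 18+ (for the global fetch), the helper names extractLocs, fetchStatus, and auditUrls are made up for illustration, and the regex-based extraction of loc values is a shortcut rather than real XML parsing.

```typescript
// Hypothetical helpers for illustration; only the global fetch API is real.
type StatusFetcher = (url: string) => Promise<number>;

interface BrokenEntry {
  url: string;
  status: number;
}

// Pull every <loc> value out of a sitemap with a simple regex.
// A real XML parser would be more robust; this is only a sketch.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    // Undo the XML escaping of ampersands so the URL can be requested.
    locs.push(match[1].replace(/&amp;/g, "&"));
  }
  return locs;
}

// Default status lookup. Under Node's fetch, redirect: "manual" reports the
// 3xx status itself instead of following the redirect to its destination.
const fetchStatus: StatusFetcher = async (url) => {
  const res = await fetch(url, { method: "HEAD", redirect: "manual" });
  return res.status;
};

// Return every URL whose status is anything other than 200.
async function auditUrls(
  urls: string[],
  getStatus: StatusFetcher = fetchStatus
): Promise<BrokenEntry[]> {
  const broken: BrokenEntry[] = [];
  for (const url of urls) {
    const status = await getStatus(url);
    if (status !== 200) {
      broken.push({ url, status });
    }
  }
  return broken;
}
```

The status lookup is injectable so the audit can be tested without network access. Some servers reject HEAD requests, in which case a GET-based lookup may be needed.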
2. Missing pages
The opposite problem: important pages that exist on your site but aren't in the sitemap. New pages, recently added features, or blog posts that never got included.
If a page is worth ranking, it should be in your sitemap. If it's not worth ranking (admin panels, auth pages, API routes), it shouldn't be.
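A rough way to catch both kinds of drift is to diff the sitemap against a list of pages you consider indexable. The sketch below assumes you can produce that list yourself (from your router, CMS, or a crawl). The names normalizeUrl and diffSitemap are hypothetical, and treating URLs with and without a trailing slash as the same page is an assumption that may not hold for every site.

```typescript
interface SitemapDiff {
  missing: string[]; // indexable pages that are absent from the sitemap
  unexpected: string[]; // sitemap URLs that are not in the indexable list
}

// Assumption: a trailing slash does not change which page a URL refers to.
function normalizeUrl(url: string): string {
  const parsed = new URL(url);
  const path =
    parsed.pathname.length > 1
      ? parsed.pathname.replace(/\/+$/, "")
      : parsed.pathname;
  return parsed.origin + path + parsed.search;
}

function diffSitemap(indexablePages: string[], sitemapUrls: string[]): SitemapDiff {
  const expected = new Set(indexablePages.map(normalizeUrl));
  const listed = new Set(sitemapUrls.map(normalizeUrl));
  return {
    missing: Array.from(expected).filter((url) => !listed.has(url)),
    unexpected: Array.from(listed).filter((url) => !expected.has(url)),
  };
}
```

An empty result in both arrays means the sitemap and the list of indexable pages agree.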
3. Invalid XML
Sitemaps must be valid XML. A single unclosed tag, an unescaped ampersand, or stray whitespace before the XML declaration can break the entire file:
<!-- Broken: unescaped ampersand -->
<loc>https://example.com/page?a=1&b=2</loc>
<!-- Fixed: properly escaped -->
<loc>https://example.com/page?a=1&amp;b=2</loc>
XML parsers are strict. If your sitemap has invalid XML, search engines will reject the entire file - not just the broken entry.
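If you generate sitemap entries with string templates, escaping every value before it goes into the XML avoids the ampersand problem. This is a minimal sketch; escapeXml and urlEntry are illustrative names, and the input is assumed to be a raw URL that has not been escaped already.

```typescript
// Escape the five characters that XML treats specially.
// The ampersand must be replaced first so the other entities are not double-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a single <url> entry from a raw, unescaped URL.
function urlEntry(loc: string): string {
  return `<url><loc>${escapeXml(loc)}</loc></url>`;
}
```

Passing an already-escaped URL through escapeXml would escape it twice, so it should be applied exactly once, at the point where the XML is written.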
4. Sitemap is too large
A single sitemap can contain at most 50,000 URLs and must not exceed 50MB uncompressed. For larger sites, use a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>
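For a site that outgrows one file, the split can be sketched as follows. The file naming scheme (sitemap-1.xml, sitemap-2.xml, ...) and the function names are assumptions made for this example. The sketch enforces only the 50,000-URL limit, not the 50MB size limit, and assumes the child sitemap URLs contain no characters that need XML escaping.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

interface SitemapPlan {
  files: { name: string; urls: string[] }[];
  index: string;
}

// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Render a sitemap index that points at each child sitemap.
function buildSitemapIndex(sitemapUrls: string[]): string {
  const entries = sitemapUrls
    .map((loc) => `  <sitemap>\n    <loc>${loc}</loc>\n  </sitemap>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</sitemapindex>",
  ].join("\n");
}

// Group URLs into child sitemaps and produce the index that lists them.
function planSitemaps(urls: string[], baseUrl: string): SitemapPlan {
  const files = chunk(urls, MAX_URLS_PER_SITEMAP).map((group, i) => ({
    name: `sitemap-${i + 1}.xml`, // hypothetical naming scheme
    urls: group,
  }));
  const index = buildSitemapIndex(files.map((file) => `${baseUrl}/${file.name}`));
  return { files, index };
}
```

Each entry in files would then be rendered as its own urlset document and published next to the index.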
Auto-generating sitemaps in Next.js
Don't maintain your sitemap manually. In Next.js 13.3 and later, create a sitemap.ts file in your app directory (this requires the App Router):
import type { MetadataRoute } from 'next';

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    {
      url: 'https://example.com',
      lastModified: new Date(), // build time; prefer the page's real modification date
      changeFrequency: 'weekly', // optional; Google ignores this
      priority: 1, // optional; Google ignores this
    },
    {
      url: 'https://example.com/pricing',
      lastModified: new Date(), // build time; prefer the page's real modification date
      changeFrequency: 'monthly', // optional; Google ignores this
      priority: 0.8, // optional; Google ignores this
    },
  ];
}
For dynamic routes, fetch your page slugs from your database or CMS and map them into the array. This way, your sitemap always reflects the actual state of your site.
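Here is a sketch of what that mapping might look like. To keep it self-contained, a local SitemapEntry type stands in for the entries of MetadataRoute.Sitemap, fetchPosts is a hypothetical stub for your database or CMS query, and the /blog/ path prefix is an assumption about the site's routing.

```typescript
// Local stand-in for an entry of Next.js's MetadataRoute.Sitemap.
type SitemapEntry = {
  url: string;
  lastModified?: string | Date;
};

interface Post {
  slug: string;
  updatedAt: string; // ISO date of the last meaningful edit
}

const BASE_URL = "https://example.com";

// Hypothetical data source; replace with a real database or CMS query.
async function fetchPosts(): Promise<Post[]> {
  return [
    { slug: "hello-world", updatedAt: "2025-01-10" },
    { slug: "sitemap-tips", updatedAt: "2025-01-15" },
  ];
}

// Map each post to a sitemap entry, using its real modification date.
function postsToEntries(posts: Post[]): SitemapEntry[] {
  return posts.map((post) => ({
    url: `${BASE_URL}/blog/${encodeURIComponent(post.slug)}`,
    lastModified: new Date(post.updatedAt),
  }));
}

// In app/sitemap.ts, this function would be the file's default export.
async function sitemap(): Promise<SitemapEntry[]> {
  const staticEntries: SitemapEntry[] = [
    { url: BASE_URL },
    { url: `${BASE_URL}/pricing` },
  ];
  const posts = await fetchPosts();
  return [...staticEntries, ...postsToEntries(posts)];
}
```

Taking lastModified from the content record rather than the build time keeps the value meaningful to crawlers.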
The pages you should NOT include
Not every URL belongs in your sitemap:
- Auth pages (/login, /signup, /reset-password) - no SEO value
- API routes (/api/*) - not meant for browsers
- Admin/dashboard pages - behind authentication
- Paginated pages - include the canonical version, not every ?page=2
- Redirecting URLs - only include the final destination
- noindex pages - if you don't want it indexed, don't put it in the sitemap
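These rules can be expressed as a filter that runs before the sitemap is generated. The sketch below is only illustrative: the PageInfo shape, the /admin and /dashboard paths, and the use of a page query parameter to detect pagination are assumptions about how a site might be organised.

```typescript
// Path prefixes that should never appear in the sitemap.
// "/admin" and "/dashboard" are hypothetical paths for this example.
const EXCLUDED_PREFIXES = [
  "/login",
  "/signup",
  "/reset-password",
  "/api",
  "/admin",
  "/dashboard",
];

interface PageInfo {
  url: string;
  noindex?: boolean; // page carries a noindex directive
  redirectsTo?: string; // set when the URL redirects elsewhere
}

function belongsInSitemap(page: PageInfo): boolean {
  const { pathname, searchParams } = new URL(page.url);

  // noindex pages and redirecting URLs are always left out.
  if (page.noindex || page.redirectsTo) return false;

  // Assumption: pagination is expressed as a ?page=N query parameter.
  if (searchParams.has("page")) return false;

  // Exclude an exact prefix match or anything nested beneath it,
  // so "/login" is excluded but "/login-help" is not.
  const excluded = EXCLUDED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/")
  );
  return !excluded;
}
```

Applying it is then a single call such as pages.filter(belongsInSitemap) before the entries are written out.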
Validate before you deploy
Your sitemap is a contract with search engines: "These are my important pages, and they all work." Breaking that contract - with dead URLs, invalid XML, or missing pages - tells Google your site isn't well-maintained.
Run your sitemap through the LintPage Sitemap Validator after every deployment. It checks XML validity, verifies URL accessibility, counts your pages, and flags common issues - all in seconds. If you want the full picture, a LintPage full scan checks your sitemap as part of 60+ automated SEO checks.