XML Sitemaps: How They Work & Validation
A sitemap is a roadmap for search engines. It lists your pages, tells crawlers when they were last updated, and hints at their relative importance.
You don’t strictly need a sitemap—search engines can find pages by following links. But sitemaps make discovery faster and give you more control over how your site gets crawled.
What’s in a Sitemap
An XML sitemap follows a standard format:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2026-02-08</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>https://example.com/about</loc>
<lastmod>2026-01-15</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
Sitemap Elements
| Element | Required | Purpose |
|---|---|---|
| <loc> | Yes | The full URL of the page |
| <lastmod> | No | When the page was last modified |
| <changefreq> | No | How often the page changes (hint only) |
| <priority> | No | Relative importance (0.0-1.0) |
Google has said it ignores changefreq and priority. It does use lastmod when the values are consistently accurate, because they tell the crawler which pages to re-check.
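To make the element roles concrete, here is a minimal sketch that pulls the loc and lastmod values out of each url entry. The function names are made up for this example, and the regex-based approach is a simplification: a real tool should use a proper XML parser.

```javascript
// Sketch only: extracts <loc> and <lastmod> from each <url> entry.
// Regex matching is a shortcut here, not a substitute for an XML parser.
function parseSitemapEntries(xml) {
  const entries = [];
  for (const match of xml.matchAll(/<url>([\s\S]*?)<\/url>/g)) {
    const body = match[1];
    const loc = /<loc>\s*([^<]*?)\s*<\/loc>/.exec(body);
    const lastmod = /<lastmod>\s*([^<]*?)\s*<\/lastmod>/.exec(body);
    if (!loc) continue; // <loc> is the only required child element
    entries.push({
      loc: decodeXmlEntities(loc[1]),
      lastmod: lastmod ? lastmod[1] : null,
    });
  }
  return entries;
}

// Reverses the five predefined XML entities.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;"
}
```

Entries without a lastmod come back with `lastmod: null`, which mirrors the fact that only loc is required.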
When Sitemaps Matter Most
Sitemaps are especially important for:
Large Sites
Sites with thousands of pages can’t rely on link-based discovery alone. A sitemap gives crawlers a path to every page, including ones that no internal link reaches.
New Sites
New domains don’t have inbound links yet. A sitemap submitted to Google Search Console jumpstarts indexing.
Sites with Poor Internal Linking
If pages are buried deep or not well-linked, sitemaps help crawlers find them.
Frequently Updated Content
News sites and blogs benefit from lastmod dates that signal fresh content.
Sitemap Best Practices
One Sitemap or Many?
Large sites split sitemaps using a sitemap index:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap-posts.xml</loc>
<lastmod>2026-02-08</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-products.xml</loc>
<lastmod>2026-02-07</lastmod>
</sitemap>
</sitemapindex>
Each individual sitemap can contain up to 50,000 URLs and must be under 50MB uncompressed.
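As a rough sketch of how the split might be done, the code below groups a flat URL list into chunks of at most 50,000 and builds the matching index. The `sitemap-N.xml` file naming and both function names are assumptions for this example. It only enforces the URL count, not the 50MB size limit, and it assumes `baseUrl` needs no XML escaping.

```javascript
const MAX_URLS_PER_SITEMAP = 50000;

// Split a flat list of URLs into groups that respect the per-file URL limit.
function chunkUrls(urls, size = MAX_URLS_PER_SITEMAP) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build the index document that points at each child sitemap.
// The "sitemap-N.xml" naming scheme is hypothetical.
function buildSitemapIndex(baseUrl, chunkCount, lastmod) {
  const entries = [];
  for (let n = 1; n <= chunkCount; n++) {
    entries.push(
      `  <sitemap>\n    <loc>${baseUrl}/sitemap-${n}.xml</loc>\n    <lastmod>${lastmod}</lastmod>\n  </sitemap>`
    );
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</sitemapindex>",
  ].join("\n");
}
```

A site with 120,001 URLs would end up with three child sitemaps: two full ones and a third holding the remaining 20,001.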
Reference in robots.txt
Tell crawlers where to find your sitemap:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
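To confirm that the directive is actually present, a small check along these lines can read a robots.txt body and list the sitemap URLs it declares. The function name is hypothetical, and the comment stripping is deliberately naive: it would cut off a URL that contains a `#`.

```javascript
// Returns every URL declared with a "Sitemap:" line in a robots.txt body.
function findSitemapDirectives(robotsTxt) {
  const sitemaps = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop trailing comments
    const match = /^sitemap\s*:\s*(\S+)/i.exec(line); // directive name is case-insensitive
    if (match) sitemaps.push(match[1]);
  }
  return sitemaps;
}
```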
Use Accurate lastmod Dates
Only update lastmod when content actually changes. If you regenerate the sitemap daily with today’s date on every page, the signal becomes meaningless.
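One way to keep the dates honest is to store a fingerprint of each page alongside its lastmod and only move the date when the fingerprint changes. The sketch below assumes such a stored record exists; the record shape, the `contentHash` field, and the function name are all hypothetical, and how the hash is computed is left to the caller.

```javascript
// `previous` is the stored record for a page ({ contentHash, lastmod }),
// or undefined for a page seen for the first time.
function resolveLastmod(previous, contentHash, today) {
  if (previous && previous.contentHash === contentHash) {
    return previous.lastmod; // unchanged content keeps its old date
  }
  return today; // new or modified content gets the current date
}
```

With this in place, regenerating the sitemap every day is harmless, because untouched pages keep the date of their last real change.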
Only Include Canonical URLs
Don’t include:
- URLs that redirect
- Non-canonical versions (duplicates that differ only by a trailing slash, query parameters, etc.)
- Pages blocked by robots.txt
- Pages with noindex meta tags
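The exclusion list above can be expressed as a single filter over your page records. This is a sketch under assumptions: the field names (`status`, `canonicalUrl`, `noindex`, `blockedByRobots`) are invented for the example and would need to map onto whatever your crawler or CMS actually stores.

```javascript
// A page qualifies only if it resolves directly, is its own canonical,
// and is neither noindexed nor blocked from crawling.
function isSitemapEligible(page) {
  return (
    page.status === 200 &&
    page.canonicalUrl === page.url &&
    !page.noindex &&
    !page.blockedByRobots
  );
}

function selectSitemapUrls(pages) {
  return pages.filter(isSitemapEligible).map((page) => page.url);
}
```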
Common Sitemap Problems
Invalid XML
XML parsing is strict. Common errors:
<!-- Missing closing tag -->
<url>
<loc>https://example.com/page
<!-- Unescaped ampersand -->
<loc>https://example.com/?a=1&b=2</loc>
<!-- Should be: -->
<loc>https://example.com/?a=1&amp;b=2</loc>
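Unescaped ampersands are easy to spot mechanically before a parser ever sees the file. The sketch below flags lines containing an `&` that does not begin an entity. It is an approximation: it accepts any entity-shaped name, even though XML only predefines five, and the function name is made up for this example.

```javascript
// Matches "&" that does not start a named or numeric entity (&amp;, &#38;, &#x26;).
const BARE_AMPERSAND = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/;

// Returns the 1-based line number and text of each offending line.
function findBareAmpersands(xml) {
  const problems = [];
  xml.split(/\r?\n/).forEach((line, index) => {
    if (BARE_AMPERSAND.test(line)) {
      problems.push({ line: index + 1, text: line.trim() });
    }
  });
  return problems;
}
```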
Wrong Namespace
The namespace declaration must be exact:
<!-- Correct -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- Wrong - will fail validation -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap">
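A quick way to catch a wrong namespace is to compare the root element's xmlns value against the expected string. This is a minimal regex-based sketch with a hypothetical function name; it inspects only the first `urlset` or `sitemapindex` tag and does no real XML parsing.

```javascript
const SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

// True only when the root element declares exactly the sitemap namespace.
function hasSitemapNamespace(xml) {
  const root = /<(urlset|sitemapindex)\b([^>]*)>/.exec(xml);
  if (!root) return false;
  // Match the default namespace only, not prefixed ones such as xmlns:xsi.
  const xmlns = /\sxmlns\s*=\s*(["'])(.*?)\1/.exec(root[2]);
  return xmlns !== null && xmlns[2] === SITEMAP_NS;
}
```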
URLs Not Matching Domain
Every URL must use the same protocol and host as the sitemap itself:
<!-- sitemap at https://example.com/sitemap.xml -->
<!-- Valid -->
<loc>https://example.com/page</loc>
<!-- Invalid - different domain -->
<loc>https://other-domain.com/page</loc>
<!-- Invalid - different protocol -->
<loc>http://example.com/page</loc>
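The rule can be checked by comparing origins (protocol, host, and port) with the standard `URL` class. The helper name below is an assumption for this sketch. Note that a `www` subdomain counts as a different host.

```javascript
// True when `loc` shares protocol, host, and port with the sitemap's own URL.
function matchesSitemapOrigin(sitemapUrl, loc) {
  try {
    return new URL(loc).origin === new URL(sitemapUrl).origin;
  } catch {
    return false; // relative or malformed values are not valid <loc> entries
  }
}
```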
Unreachable URLs
Every URL in your sitemap should return a 200 status. Including 404s or 500s wastes crawl budget and signals poor quality.
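A status check like this can be scripted. The sketch below assumes a runtime with a global `fetch` (Node 18 or later); the function name and the injectable `fetchFn` parameter are choices made for this example so the logic can be tested without network access. It sends requests one at a time, which is slow for large sitemaps but gentle on the server, and some servers answer HEAD differently from GET.

```javascript
// Returns the URLs that did not answer with a plain 200.
async function findUnreachableUrls(urls, fetchFn = fetch) {
  const failures = [];
  for (const url of urls) {
    try {
      // redirect: "manual" so a 301/302 is reported instead of silently followed
      const response = await fetchFn(url, { method: "HEAD", redirect: "manual" });
      if (response.status !== 200) {
        failures.push({ url, status: response.status });
      }
    } catch (error) {
      // Network-level failures (DNS, timeout) have no status code
      failures.push({ url, status: null, error: String(error) });
    }
  }
  return failures;
}
```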
Outdated Sitemap
Static sitemaps that aren’t updated become useless. If you add pages but never update the sitemap, new content won’t be discovered through it.
Dynamic Sitemap Generation
Most CMS platforms and frameworks generate sitemaps automatically:
WordPress
Plugins like Yoast SEO or Rank Math generate and update sitemaps automatically.
Static Site Generators
- Astro: @astrojs/sitemap integration
- Next.js: next-sitemap package
- Hugo: Built-in sitemap generation
- Gatsby: gatsby-plugin-sitemap
Custom Solutions
For dynamic sites, generate sitemaps from your database:
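// Sketch: `db` is a placeholder for your database client. It is assumed that
// db.query resolves to rows with `url` (string) and `updated_at` (Date).

// Escape the characters that are not allowed to appear raw in XML text
const escapeXml = (value) =>
  value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');

async function generateSitemap() {
  // Single quotes around 'published': in standard SQL, double quotes name identifiers
  const pages = await db.query("SELECT url, updated_at FROM pages WHERE status = 'published'");
  const entries = pages
    .map(
      (page) => `  <url>
    <loc>${escapeXml(page.url)}</loc>
    <lastmod>${page.updated_at.toISOString()}</lastmod>
  </url>`
    )
    .join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${entries}
</urlset>`;
}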
Validating Your Sitemap
Before submitting to search engines, validate that your sitemap:
- Is valid XML: Parses without errors
- Uses correct schema: Proper namespace declaration
- Contains valid URLs: All URLs are reachable
- Has reasonable lastmod dates: Not all in the future or distant past
- Doesn’t exceed limits: At most 50,000 URLs and 50MB uncompressed
Our Sitemap Validator checks all of this automatically:
- Fetches and parses your sitemap
- Validates XML structure
- Counts URLs and checks for the 50,000 limit
- Identifies malformed entries
- Checks for common issues
It’s free and gives you a clear picture of your sitemap’s health.
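For a sense of what the limit and date checks involve, here is a rough sketch of how they could be scripted. It is not how the validator above is implemented; the function name is made up, it counts `<loc>` tags with a regex rather than parsing the XML, and it relies on JavaScript's `Date` parsing for lastmod values.

```javascript
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 50MB, measured on the uncompressed file

// Returns a list of human-readable issues; an empty list means no problem found.
function checkSitemapLimits(xml, now = new Date()) {
  const issues = [];
  const urlCount = (xml.match(/<loc>/g) || []).length;
  const byteSize = new TextEncoder().encode(xml).length; // UTF-8 byte length
  if (urlCount > MAX_URLS) issues.push(`too many URLs: ${urlCount}`);
  if (byteSize > MAX_BYTES) issues.push(`file too large: ${byteSize} bytes`);

  for (const match of xml.matchAll(/<lastmod>\s*([^<]*?)\s*<\/lastmod>/g)) {
    const parsed = new Date(match[1]);
    if (Number.isNaN(parsed.getTime())) {
      issues.push(`unparseable lastmod: ${match[1]}`);
    } else if (parsed > now) {
      issues.push(`lastmod in the future: ${match[1]}`);
    }
  }
  return issues;
}
```

Passing `now` explicitly keeps the check deterministic, which helps when running it in tests or against archived sitemaps.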
Submitting to Search Engines
Google Search Console
- Go to Search Console
- Select your property
- Navigate to Sitemaps
- Enter your sitemap URL
- Click Submit
Google shows submission status, crawl results, and any errors discovered.
Bing Webmaster Tools
Similar process—submit in the Sitemaps section.
Direct Ping
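Older guides suggest pinging search engines directly with URLs like these:
https://www.google.com/ping?sitemap=https://example.com/sitemap.xml
https://www.bing.com/ping?sitemap=https://example.com/sitemap.xml
These endpoints are no longer reliable: Google retired its sitemap ping endpoint in 2023, and Bing has dropped anonymous sitemap pings as well. Submit through the webmaster tools above and reference the sitemap in robots.txt instead.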
Monitoring Sitemap Performance
In Google Search Console, the Sitemaps report shows:
- Last read date
- Status (success/error)
- Discovered URLs
- Indexed URLs (via the Pages report)
If “discovered” is much higher than “indexed,” investigate why pages aren’t being indexed.
Sitemaps vs. Internal Linking
Sitemaps don’t replace good internal linking. They complement it:
- Internal links: Tell crawlers (and users) how pages relate
- Sitemaps: Ensure comprehensive discovery
A page linked from your homepage will likely be found without a sitemap. An orphaned page deep in your archive needs the sitemap.
Take Action
- Check if your sitemap exists at /sitemap.xml
- Run it through our Sitemap Validator
- Fix any issues
- Submit to Google Search Console if you haven’t already
- Reference it in your robots.txt
For help with sitemap configuration or SEO strategy, reach out.