XML Sitemaps: How They Work & Validation

A sitemap is a roadmap for search engines. It lists your pages, tells crawlers when they were last updated, and hints at their relative importance.

You don’t strictly need a sitemap—search engines can find pages by following links. But sitemaps make discovery faster and give you more control over how your site gets crawled.

What’s in a Sitemap

An XML sitemap follows a standard format:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-02-08</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Sitemap Elements

Element      | Required | Purpose
<loc>        | Yes      | The full URL of the page
<lastmod>    | No       | When the page was last modified
<changefreq> | No       | How often the page changes (hint only)
<priority>   | No       | Relative importance (0.0-1.0)

Google has stated that it ignores changefreq and priority. lastmod is still useful, provided it is accurate: it helps crawlers decide which pages to re-check.
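
Since lastmod is the optional element that still carries weight, a generator can emit just loc and lastmod. The following is a minimal sketch, not a reference implementation: buildUrlEntry is a hypothetical helper name, and it assumes the URL passed in is already XML-escaped.

```javascript
// Hypothetical helper: formats one <url> entry with only <loc> and <lastmod>.
// Assumes `loc` is already XML-escaped and `lastModified` is a Date.
function buildUrlEntry(loc, lastModified) {
  // W3C date format (YYYY-MM-DD), taken from the UTC date of the ISO string.
  const lastmod = lastModified.toISOString().slice(0, 10);
  return [
    '  <url>',
    `    <loc>${loc}</loc>`,
    `    <lastmod>${lastmod}</lastmod>`,
    '  </url>',
  ].join('\n');
}

console.log(buildUrlEntry('https://example.com/about', new Date(Date.UTC(2026, 0, 15))));
```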

When Sitemaps Matter Most

Sitemaps are especially important for:

Large Sites

Sites with thousands of pages can’t rely on link-based discovery alone. A sitemap gives crawlers a path to pages that few or no internal links point to.

New Sites

New domains don’t have inbound links yet. Submitting a sitemap to Google Search Console gives crawlers a starting list of URLs, so pages can be discovered sooner.

Sites with Poor Internal Linking

If pages are buried deep or not well-linked, sitemaps help crawlers find them.

Frequently Updated Content

News sites and blogs benefit from lastmod dates that signal fresh content.

Sitemap Best Practices

One Sitemap or Many?

Large sites split sitemaps using a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-02-08</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
</sitemapindex>

Each individual sitemap can contain at most 50,000 URLs and must be no larger than 50 MB uncompressed.
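
One way to stay inside the 50,000-URL limit is to chunk the URL list and point a sitemap index at the resulting files. This is a rough sketch under stated assumptions: chunkUrls and buildSitemapIndex are hypothetical helpers, and the sitemap-N.xml naming scheme is invented for illustration.

```javascript
const MAX_URLS_PER_SITEMAP = 50000;

// Split a flat list of URLs into chunks no larger than the per-file limit.
function chunkUrls(urls, size = MAX_URLS_PER_SITEMAP) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build a sitemap index with one <sitemap> entry per chunk.
// The sitemap-N.xml file names are an assumption, not a required convention.
function buildSitemapIndex(baseUrl, chunkCount, lastmod) {
  const entries = [];
  for (let n = 1; n <= chunkCount; n++) {
    entries.push(
      `  <sitemap>\n    <loc>${baseUrl}/sitemap-${n}.xml</loc>\n    <lastmod>${lastmod}</lastmod>\n  </sitemap>`
    );
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}

const allUrls = Array.from({ length: 120000 }, (_, i) => `https://example.com/p/${i}`);
const urlChunks = chunkUrls(allUrls);
console.log(urlChunks.length); // 3
console.log(buildSitemapIndex('https://example.com', urlChunks.length, '2026-02-08'));
```

This only enforces the URL count. The 50 MB size limit would need a separate check on each generated file.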

Reference in robots.txt

Tell crawlers where to find your sitemap:

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
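
To confirm that the directive is picked up, you can read the Sitemap lines back out of a robots.txt file. A small sketch, assuming the file contents are already loaded into a string; extractSitemapUrls is a hypothetical helper name.

```javascript
// Return every URL declared with a "Sitemap:" directive in robots.txt text.
// The directive name is matched case-insensitively.
function extractSitemapUrls(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => /^sitemap\s*:/i.test(line))
    // Keep everything after the first colon (the one that follows "Sitemap").
    .map((line) => line.slice(line.indexOf(':') + 1).trim());
}

const robotsTxt = [
  'User-agent: *',
  'Allow: /',
  '',
  'Sitemap: https://example.com/sitemap.xml',
].join('\n');

console.log(extractSitemapUrls(robotsTxt)); // [ 'https://example.com/sitemap.xml' ]
```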

Use Accurate lastmod Dates

Only update lastmod when content actually changes. If you regenerate the sitemap daily with today’s date on every page, the signal becomes meaningless.
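
One possible way to keep lastmod honest is to store a content fingerprint next to each page and only move the date when the fingerprint changes. The sketch below assumes the caller computes the hash and keeps the previous record somewhere. resolveLastmod and the record shape are hypothetical.

```javascript
// Decide which lastmod to publish for a page.
// `previous` is the stored record ({ hash, lastmod }) or undefined for a new page.
// `currentHash` is a fingerprint of the page's current content (computed elsewhere).
function resolveLastmod(previous, currentHash, today) {
  if (previous && previous.hash === currentHash) {
    // Content is unchanged: keep the old date even if the sitemap is regenerated.
    return previous.lastmod;
  }
  // New page or changed content: the date moves forward.
  return today;
}

const stored = { hash: 'abc123', lastmod: '2026-01-15' };
console.log(resolveLastmod(stored, 'abc123', '2026-02-08')); // 2026-01-15
console.log(resolveLastmod(stored, 'def456', '2026-02-08')); // 2026-02-08
```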

Only Include Canonical URLs

Don’t include:

  • URLs that redirect
  • Non-canonical variants of a page (for example, trailing-slash or query-parameter duplicates of the canonical URL)
  • Pages blocked by robots.txt
  • Pages with noindex meta tags
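
The exclusion list above can be expressed as a single filter. This is an illustrative sketch: the page record fields (status, noindex, blockedByRobots, canonicalUrl) are assumptions about what your crawler or CMS exposes, not a standard shape.

```javascript
// True when a page record belongs in the sitemap.
// All field names are hypothetical; adapt them to your own data.
function isSitemapEligible(page) {
  return (
    page.status === 200 &&            // excludes redirects and error pages
    !page.noindex &&                  // excludes pages with a noindex meta tag
    !page.blockedByRobots &&          // excludes pages disallowed in robots.txt
    page.canonicalUrl === page.url    // excludes non-canonical variants
  );
}

const pages = [
  { url: 'https://example.com/a', canonicalUrl: 'https://example.com/a', status: 200, noindex: false, blockedByRobots: false },
  { url: 'https://example.com/a?ref=x', canonicalUrl: 'https://example.com/a', status: 200, noindex: false, blockedByRobots: false },
  { url: 'https://example.com/old', canonicalUrl: 'https://example.com/old', status: 301, noindex: false, blockedByRobots: false },
];

console.log(pages.filter(isSitemapEligible).map((page) => page.url)); // [ 'https://example.com/a' ]
```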

Common Sitemap Problems

Invalid XML

XML parsing is strict. Common errors:

<!-- Missing closing tag -->
<url>
  <loc>https://example.com/page

<!-- Unescaped ampersand -->
<loc>https://example.com/?a=1&b=2</loc>

<!-- Should be: -->
<loc>https://example.com/?a=1&amp;b=2</loc>
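
If you build sitemap XML by string concatenation, escaping each value before it goes into the document avoids the ampersand problem. A minimal sketch, with escapeXml as a hypothetical helper name. It assumes the input is raw text: a value that is already escaped would be escaped a second time.

```javascript
// Replace the five characters that XML reserves with their entity references.
// The ampersand must be handled first so later replacements are not re-escaped.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

console.log(escapeXml('https://example.com/?a=1&b=2'));
// https://example.com/?a=1&amp;b=2
```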

Wrong Namespace

The namespace declaration must be exact:

<!-- Correct -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

<!-- Wrong - will fail validation -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap">
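
A quick string-level check can catch a wrong namespace before a full validator does. The sketch below is deliberately simple and not a real XML parser: hasSitemapNamespace is a hypothetical helper, and it assumes the xmlns attribute uses double quotes.

```javascript
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Look at the root element (<urlset> or <sitemapindex>) and compare its
// default namespace with the one the sitemap protocol requires.
function hasSitemapNamespace(xml) {
  const match = xml.match(/<(urlset|sitemapindex)\b[^>]*\bxmlns="([^"]*)"/);
  return match !== null && match[2] === SITEMAP_NS;
}

console.log(hasSitemapNamespace('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')); // true
console.log(hasSitemapNamespace('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap">'));     // false
```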

URLs Not Matching Domain

Every URL must use the same protocol and host as the sitemap itself:

<!-- sitemap at https://example.com/sitemap.xml -->

<!-- Valid -->
<loc>https://example.com/page</loc>

<!-- Invalid - different domain -->
<loc>https://other-domain.com/page</loc>

<!-- Invalid - different protocol -->
<loc>http://example.com/page</loc>
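
The same comparison can be automated with the standard URL class, whose origin property combines protocol, host, and port. A sketch with a hypothetical helper name, findForeignUrls. It assumes every entry is an absolute URL, since new URL() throws on anything else.

```javascript
// Return the <loc> values whose origin differs from the sitemap's own origin.
function findForeignUrls(sitemapUrl, locs) {
  const expectedOrigin = new URL(sitemapUrl).origin;
  return locs.filter((loc) => new URL(loc).origin !== expectedOrigin);
}

const foreign = findForeignUrls('https://example.com/sitemap.xml', [
  'https://example.com/page',
  'https://other-domain.com/page',
  'http://example.com/page',
]);

console.log(foreign);
// [ 'https://other-domain.com/page', 'http://example.com/page' ]
```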

Unreachable URLs

Every URL in your sitemap should return a 200 status. Listing URLs that return 404 or 500 wastes crawler requests and makes the sitemap a less trustworthy signal.
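
A status check along these lines can run before each deploy. This is a sketch, not a production crawler: findBrokenUrls is a hypothetical helper, it assumes a runtime with a global fetch (Node 18 or later), and it sends HEAD requests, which some servers do not answer correctly. The fetch function is injectable so the example below can run against a stub instead of the network.

```javascript
// Request each URL and collect the ones that do not answer with 200.
// `fetchFn` defaults to the global fetch; pass a stub for offline testing.
async function findBrokenUrls(urls, fetchFn = fetch) {
  const broken = [];
  for (const url of urls) {
    try {
      // redirect: 'manual' keeps 3xx responses visible instead of following them.
      const response = await fetchFn(url, { method: 'HEAD', redirect: 'manual' });
      if (response.status !== 200) {
        broken.push({ url, status: response.status });
      }
    } catch (error) {
      // Network failure (DNS, timeout, refused connection): no status available.
      broken.push({ url, status: null });
    }
  }
  return broken;
}

// Stub that pretends /missing returns 404 and everything else returns 200.
const stubFetch = async (url) => ({ status: url.endsWith('/missing') ? 404 : 200 });

findBrokenUrls(['https://example.com/', 'https://example.com/missing'], stubFetch)
  .then((broken) => console.log(broken));
// [ { url: 'https://example.com/missing', status: 404 } ]
```

Requests run one at a time here to keep the sketch short and to avoid hammering the server. A large sitemap would likely want limited concurrency.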

Outdated Sitemap

Static sitemaps that aren’t updated become useless. If you add pages but never update the sitemap, new content won’t be discovered through it.

Dynamic Sitemap Generation

Most CMS platforms and frameworks generate sitemaps automatically:

WordPress

Plugins like Yoast SEO or Rank Math generate and update sitemaps automatically.

Static Site Generators

  • Astro: @astrojs/sitemap integration
  • Next.js: next-sitemap package
  • Hugo: Built-in sitemap generation
  • Gatsby: gatsby-plugin-sitemap

Custom Solutions

For dynamic sites, generate sitemaps from your database:

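// Sketch: `db` stands in for your database client and is assumed to return
// an array of rows whose updated_at field is a Date.

// Escape the characters that are not allowed raw in XML text.
const escapeXml = (s) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

async function generateSitemap() {
  // Single quotes: standard SQL reads "published" as an identifier, not a string.
  const pages = await db.query(
    "SELECT url, updated_at FROM pages WHERE status = 'published'"
  );

  const entries = pages
    .map(
      (page) => `  <url>
    <loc>${escapeXml(page.url)}</loc>
    <lastmod>${page.updated_at.toISOString()}</lastmod>
  </url>`
    )
    .join('\n');

  // The XML declaration must be the very first thing in the output.
  return `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${entries}
</urlset>`;
}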

Validating Your Sitemap

Before submitting to search engines, validate that your sitemap:

  1. Is valid XML: Parses without errors
  2. Uses correct schema: Proper namespace declaration
  3. Contains valid URLs: All URLs are reachable
  4. Has reasonable lastmod dates: Not all in the future or distant past
  5. Doesn’t exceed limits: At most 50,000 URLs and 50 MB uncompressed per file
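
Parts of this checklist are easy to script. The sketch below covers only the size limits and the lastmod sanity check, using regular expressions rather than a real XML parser, so it assumes the document is already well-formed. checkSitemapLimits is a hypothetical helper name.

```javascript
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 50 MB, uncompressed

// Return a list of human-readable problems; an empty list means no issue found.
function checkSitemapLimits(xml, now = new Date()) {
  const problems = [];

  // Count <url> entries (the pattern does not match <urlset>).
  const urlCount = (xml.match(/<url>/g) || []).length;
  if (urlCount > MAX_URLS) problems.push(`too many URLs: ${urlCount}`);

  // Measure the size in bytes as UTF-8, not in characters.
  const byteLength = new TextEncoder().encode(xml).length;
  if (byteLength > MAX_BYTES) problems.push(`file too large: ${byteLength} bytes`);

  // Flag lastmod values that cannot be parsed or that lie in the future.
  for (const match of xml.matchAll(/<lastmod>([^<]+)<\/lastmod>/g)) {
    const value = match[1].trim();
    const date = new Date(value);
    if (Number.isNaN(date.getTime())) {
      problems.push(`unparseable lastmod: ${value}`);
    } else if (date > now) {
      problems.push(`lastmod in the future: ${value}`);
    }
  }
  return problems;
}

const sample = '<urlset><url><loc>https://example.com/</loc><lastmod>2030-01-01</lastmod></url></urlset>';
console.log(checkSitemapLimits(sample, new Date('2026-02-08')));
// [ 'lastmod in the future: 2030-01-01' ]
```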

Our Sitemap Validator checks all of this automatically:

  • Fetches and parses your sitemap
  • Validates XML structure
  • Counts URLs and checks for the 50,000 limit
  • Identifies malformed entries
  • Checks for common issues

It’s free and gives you a clear picture of your sitemap’s health.

Submitting to Search Engines

Google Search Console

  1. Go to Search Console
  2. Select your property
  3. Navigate to Sitemaps
  4. Enter your sitemap URL
  5. Click Submit

Google shows submission status, crawl results, and any errors discovered.

Bing Webmaster Tools

The process is similar: submit your sitemap URL in the Sitemaps section.

Direct Ping

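Older guides recommend pinging search engines directly with URLs like these:

https://www.google.com/ping?sitemap=https://example.com/sitemap.xml
https://www.bing.com/ping?sitemap=https://example.com/sitemap.xml

Neither endpoint is supported any longer. Google deprecated its sitemap ping endpoint in 2023, and Bing retired anonymous sitemap submission in 2022. Use the robots.txt reference and the webmaster tool submissions described above instead.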

Monitoring Sitemap Performance

In Google Search Console, the Sitemaps report shows:

  • Last read date
  • Status (success/error)
  • Discovered URLs
  • Indexed URLs (via the Pages report)

If “discovered” is much higher than “indexed,” investigate why pages aren’t being indexed.

Sitemaps vs. Internal Linking

Sitemaps don’t replace good internal linking. They complement it:

  • Internal links: Tell crawlers (and users) how pages relate
  • Sitemaps: Ensure comprehensive discovery

A page linked from your homepage will likely be found without a sitemap. An orphaned page deep in your archive needs the sitemap.

Take Action

  1. Check if your sitemap exists at /sitemap.xml
  2. Run it through our Sitemap Validator
  3. Fix any issues
  4. Submit to Google Search Console if you haven’t already
  5. Reference it in your robots.txt

For help with sitemap configuration or SEO strategy, reach out.