Sitemap

The robots.txt file points to a sitemap file.

Impact

(How ScanGov measures tasklist priorities.)

Why it's important

Pointing to the sitemap from the robots.txt file helps search engines discover it without guesswork, improving crawl efficiency. It also helps ensure that important URLs make it into the search engine's index, enhancing visibility in search results.
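
In practice, this means adding a Sitemap: directive to robots.txt. A minimal sketch, assuming the sitemap lives at the conventional /sitemap.xml path (adjust the URL to your site's actual sitemap location):

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml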

User stories

As a search engine bot, I want the `robots.txt` file to point to the sitemap so that I can easily discover and crawl the full sitemap of the website, ensuring comprehensive indexing and better visibility in search results.

Error

(ScanGov messaging when a site fails a standard)

robots.txt missing sitemap reference.
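
A scanner can detect this failure by fetching /robots.txt and looking for a Sitemap: line. Below is a minimal sketch of such a check in Python; it is an illustration, not ScanGov's actual implementation:

import urllib.request

def robots_has_sitemap(base_url: str) -> bool:
    # Fetch the site's robots.txt; treat network errors or a
    # missing file as a failed check.
    robots_url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(robots_url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return False
    # The Sitemap directive is case-insensitive; match it at the
    # start of any line.
    return any(
        line.strip().lower().startswith("sitemap:")
        for line in body.splitlines()
    )

# Prints the ScanGov error message when the check fails.
if not robots_has_sitemap("https://www.example.com"):
    print("robots.txt missing sitemap reference.")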

About

A sitemap is a file that lists the pages on a website, along with information about each page, so that search engines can crawl the site more intelligently.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data, allowing crawlers that support them to pick up every Uniform Resource Locator (URL) listed in the sitemap and learn about it from the associated metadata. Using the Sitemap protocol doesn’t guarantee that web pages will be included in search engines, but it gives web crawlers hints to do a better job of crawling your site.

The sitemap is an Extensible Markup Language (XML) file in the website’s root directory that includes metadata for each URL, such as:

  • when it was last updated
  • how often it usually changes
  • how important it is, relative to other URLs in the site

Examples

Example government website sitemaps:

Code

Example sitemap code:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>
