Robots valid

The site has a valid robots policy.

Impact

(How ScanGov measures tasklist priorities.)

Why it's important

A valid robots.txt file tells search engines which pages or sections of the website should not be crawled, which preserves crawl budget and keeps sensitive or low-value content out of search indexes.

Error

(ScanGov messaging when a site fails a standard)

Robots.txt is missing or invalid.

Guidance

All government websites must allow search engine robots to index their pages and follow their links.

About

Robots valid means the robots.txt file is correctly written and follows the syntax that search engines understand. If the file is valid, search engines can read it and know which parts of the site they may or may not visit. A valid file helps control how your site shows up in search results and prevents crawl errors.
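
One way to sanity-check a file is Python's standard-library urllib.robotparser, which fetches and parses robots.txt much as a well-behaved crawler would. This is a minimal sketch; example.gov is a placeholder domain.

# Minimal sketch: fetch and parse a site's robots.txt.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.gov/robots.txt")  # placeholder domain
rp.read()  # downloads and parses the file

# A parseable file can answer allow/deny questions for any crawler.
print(rp.can_fetch("*", "https://example.gov/archive/"))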

Examples

Example government website robots.txt files:

Code

Example robots.txt code:

# Applies only to the search.gov scraper
User-agent: usasearch
# Wait 2 seconds between requests
Crawl-delay: 2
# Allow it to read /archive/
Allow: /archive/

# Applies to all other scrapers
User-agent: *
# Wait 10 seconds between requests
Crawl-delay: 10
# Don't let them read /archive/
Disallow: /archive/

# Point to a sitemap file (the sitemap protocol expects a full URL;
# example.gov is a placeholder)
Sitemap: https://example.gov/sitemap.xml
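
As a check that the rules above behave as intended, the sketch below feeds the same directives to Python's standard-library urllib.robotparser (the content is inlined here rather than fetched over HTTP):

from urllib import robotparser

rules = """\
User-agent: usasearch
Crawl-delay: 2
Allow: /archive/

User-agent: *
Crawl-delay: 10
Disallow: /archive/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("usasearch", "/archive/"))  # True: explicitly allowed
print(rp.can_fetch("*", "/archive/"))          # False: disallowed for others
print(rp.crawl_delay("usasearch"))             # 2
print(rp.crawl_delay("*"))                     # 10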

Example robots meta code:

<!-- This page can be indexed 
and links on it can be followed -->
<meta name="robots" content="index, follow">

Example X-Robots-Tag HTTP header:

# This URL can be indexed 
# and links on it can be followed
X-Robots-Tag: index, follow
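
Since X-Robots-Tag is sent as an HTTP response header rather than in the page body, it has to be read from the response itself. Here is a minimal sketch with Python's standard-library urllib; https://example.gov/ is a placeholder URL.

from urllib.request import urlopen

with urlopen("https://example.gov/") as resp:
    # .get() returns None when the server sends no such header.
    print(resp.headers.get("X-Robots-Tag"))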
