Robots allowed

The robots policy allows browsers and scrapers to access the site.

Impact

(How ScanGov measures tasklist priorities.)

Why it's important

Allowing browsers and scrapers to access the site ensures that search engines can crawl it effectively and index relevant content. This promotes better SEO, as search engines can gather and display up-to-date information from the site.

User stories

As a search engine crawler, I want to read the robots.txt file so that I know which parts of the website I am allowed or disallowed to index.
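
For illustration, this is roughly the check a crawler performs when it reads the file. The sketch below uses Python's standard urllib.robotparser module; the rules and URLs are placeholders, not taken from any real site.

from urllib.robotparser import RobotFileParser

# Placeholder rules, parsed in memory instead of fetched over the network
rules = """
User-agent: *
Crawl-delay: 10
Disallow: /archive/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a given user agent may fetch a given URL
print(parser.can_fetch("usasearch", "https://example.gov/archive/"))   # False
print(parser.can_fetch("usasearch", "https://example.gov/services/"))  # True

# Crawl-delay that applies to this agent (falls back to the * group)
print(parser.crawl_delay("usasearch"))  # 10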

Error

(ScanGov messaging when a site fails a standard)

Robots policy blocks browsers or scrapers.
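
For example, a robots.txt file like the one below blocks every scraper from the whole site; that is the kind of policy this error flags.

# Applies to all scrapers
User-agent: *
# Block access to the entire site
Disallow: /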

Guidance

All government websites must allow robots to index pages and follow links.
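
A minimal robots.txt that meets this requirement is sketched below; an empty Disallow rule means nothing is blocked.

# Applies to all scrapers
User-agent: *
# Nothing is disallowed, so everything may be crawled
Disallow: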

About

Robots allowed in a robots.txt file means search engines like Google are allowed to look at (or “crawl”) the page and list it in search results. Website owners use robots.txt to tell search engines what they can and can’t see. If robots are allowed, the site or page can show up in search engines, making it easier for people to find.

Examples

Example government website robots.txt files:

Code

Example robots.txt code:

# Only applies to search.gov scraping
User-agent: usasearch
# Limit requests to 1 every 2 seconds
Crawl-delay: 2
# Allow it to read /archive/
Allow: /archive/

# Applies to all other scrapers
User-agent: *
# Limit requests to 1 every 10 seconds
Crawl-delay: 10
# Don't let them read /archive/
Disallow: /archive/

# Point to a sitemap file (use the full URL)
Sitemap: https://example.gov/sitemap.xml

Example robots meta code:

<!-- This page can be indexed 
and links on it can be followed -->
<meta name="robots" content="index, follow">

Example X-Robots-Tag:

# This URL can be indexed 
# and links on it can be followed
X-Robots-Tag: index, follow
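
The header itself is usually added in the web server configuration. As a sketch, assuming an nginx server block you control, the equivalent directive would be:

# nginx: send the header with every response
add_header X-Robots-Tag "index, follow";

In Apache, the equivalent is Header set X-Robots-Tag "index, follow" (it requires mod_headers).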
