Robots valid (Botability)
The site has a valid robots policy.
About
Robots valid means the site's robots.txt file is correctly written and follows the syntax that search engine crawlers understand. When the file is valid, crawlers can read it and determine which parts of the site they may or may not visit. A valid file helps control how your site shows up in search results and avoids crawl errors.
Why it's important
A valid robots.txt file tells search engines which pages to crawl and which to avoid, so crawlers reach important content and stay away from irrelevant pages.
Code
Example robots.txt code:
# Applies only to the Search.gov crawler (user agent "usasearch")
User-agent: usasearch
# Limit requests to 1 every 2 seconds (not every crawler honors Crawl-delay)
Crawl-delay: 2
# Allow it to read /archive/
Allow: /archive/
# Applies to all other crawlers
User-agent: *
# Limit requests to 1 every 10 seconds
Crawl-delay: 10
# Don't let them read /archive/
Disallow: /archive/
# Point to the sitemap file with an absolute URL (example.gov is a placeholder host)
Sitemap: https://example.gov/sitemap.xml
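To see how a crawler might interpret rules like these, the sketch below feeds a copy of the example to Python's standard `urllib.robotparser` module. The host `example.gov`, the `OtherBot` user agent, and the `load_rules` helper are assumptions made for illustration, and the sitemap is written as an absolute URL. Real crawlers may resolve rules differently from this parser. `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Copy of the example rules; example.gov is a placeholder host (assumption).
ROBOTS_TXT = """\
User-agent: usasearch
Crawl-delay: 2
Allow: /archive/

User-agent: *
Crawl-delay: 10
Disallow: /archive/

Sitemap: https://example.gov/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
archive_url = "https://example.gov/archive/2020/"

print(rules.can_fetch("usasearch", archive_url))  # True: explicitly allowed
print(rules.can_fetch("OtherBot", archive_url))   # False: falls under "*"
print(rules.crawl_delay("usasearch"))             # 2
print(rules.crawl_delay("OtherBot"))              # 10
print(rules.site_maps())                          # ['https://example.gov/sitemap.xml']
```

Parsing from a string keeps the check offline. To test a live site, `set_url()` followed by `read()` on the same class fetches the file instead.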
Example robots meta code:
<!-- This page can be indexed
and links on it can be followed -->
<meta name="robots" content="index, follow">
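A robots meta tag like the one above can be read back out of a page with Python's standard `html.parser` module. This is a minimal sketch: the `RobotsMetaParser` class and `robots_meta_directives` function are hypothetical names, and the sketch only looks at `name="robots"`, not at crawler-specific names such as `googlebot`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when written without "=value".
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        self.directives += [
            part.strip().lower() for part in content.split(",") if part.strip()
        ]


def robots_meta_directives(html: str) -> list[str]:
    """Hypothetical helper: return the robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="index, follow"></head></html>'
print(robots_meta_directives(page))  # ['index', 'follow']
```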
Example X-Robots-Tag:
# This URL can be indexed
# and links on it can be followed
X-Robots-Tag: index, follow
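The header value uses the same comma-separated directives as the meta tag, so it can be checked in a similar way. The sketch below is a simplified illustration: `parse_x_robots_tag` and `is_indexable` are hypothetical names, and the code assumes the value carries no crawler-specific prefix (such as `googlebot: noindex`) and no `unavailable_after` date.

```python
def parse_x_robots_tag(value: str) -> set[str]:
    """Split an X-Robots-Tag header value into lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def is_indexable(value: str) -> bool:
    """Return False when the value blocks indexing.

    "none" is treated as shorthand for "noindex, nofollow".
    """
    return not ({"noindex", "none"} & parse_x_robots_tag(value))


print(is_indexable("index, follow"))      # True
print(is_indexable("noindex, nofollow"))  # False
```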
Error
(ScanGov messaging when a site fails a standard)
Robots.txt is missing or invalid.
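A "missing or invalid" condition could be approximated with a line-by-line check like the one below. This is only a sketch under stated assumptions, not ScanGov's actual test: the `find_robots_problems` function, the list of known fields, and the specific rules are all chosen for illustration, and the check is stricter than most crawlers, which typically skip lines they do not recognize.

```python
from urllib.parse import urlparse

# Fields this sketch accepts (an assumption; crawlers may support others).
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def find_robots_problems(text):
    """Return a list of problems; an empty list means no problem was found.

    Pass None when the robots.txt file could not be retrieved.
    """
    if text is None:
        return ["robots.txt is missing"]

    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, separator, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not separator:
            problems.append(f"line {number}: expected 'field: value'")
        elif field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            seen_user_agent = True
        elif field == "sitemap":
            # Sitemap locations should be absolute URLs.
            if urlparse(value).scheme not in ("http", "https"):
                problems.append(f"line {number}: sitemap is not an absolute URL")
        elif not seen_user_agent:
            problems.append(f"line {number}: '{field}' before any User-agent")
    return problems


print(find_robots_problems("User-agent: *\nDisallow: /archive/"))  # []
print(find_robots_problems(None))  # ['robots.txt is missing']
```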
Examples
Example government website robots.txt files:
Guidance
Links
- robotstxt.org
- robots.txt (Search.gov)
- robots.txt (Wikipedia)
- Robots Meta Tags Specifications (Google)
Indicators
Related
Botability
- Crawlable
- Content available in document
- GovernmentOrganization Schema.org Type
- Sitemap status
- Sitemap XML
- Robots allowed
- Sitemap in robots.txt
- Canonical
- Link text
- hreflang
On Project ScanGov