Robots.txt Viewer & Parser
Fetch and parse any site's robots.txt - User-agent groups, Allow and Disallow paths, sitemaps, and Crawl-delay. No signup, instant results.
What it checks
Every directive in your robots.txt, parsed and grouped.
One fetch, one structured view - User-agents, paths, sitemaps, and the raw file.
User-agent groups
Splits the file into per-crawler rule blocks - Googlebot, Bingbot, GPTBot, the catch-all *, and anything else the site has called out by name.
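The grouping step can be sketched in a few lines of Python. This is a minimal, hypothetical `parse_groups` helper, not the tool's actual parser. It assumes the common convention that consecutive `User-agent:` lines share one group and that a `User-agent:` line following rules starts a new group. It keeps Allow and Disallow rules in file order and skips every other field.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One per-crawler rule block: the agents it names and its rules in file order."""
    agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, path)


def parse_groups(robots_txt: str) -> list[Group]:
    """Split robots.txt into User-agent groups (hypothetical helper, simplified)."""
    groups: list[Group] = []
    current: Group | None = None
    previous_was_agent = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue  # blank or malformed line
        name, value = line.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent line
            # that follows rules starts a new group.
            if current is None or not previous_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value)
            previous_was_agent = True
        elif name in ("allow", "disallow"):
            if current is not None:  # rules before any User-agent are ignored
                current.rules.append((name, value))
            previous_was_agent = False
        # Other fields (Sitemap, Crawl-delay, ...) are not tracked in this sketch.
    return groups
```

Suppose `User-agent: Googlebot` and `User-agent: Bingbot` appear on consecutive lines. Both names then land in the same group and share its rules.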
Allow & Disallow paths
Lists every Allow and Disallow directive in each group, in the order it appears, so you can see exactly which URL prefixes are open and which are off-limits.
Sitemap discovery
Surfaces every Sitemap: line in the file. Search engines pick these up to find your XML sitemaps even without a Search Console submission.
Crawl-delay
Reports any Crawl-delay value advertised to bots. Bing and some other crawlers honor it (Google doesn't), which helps when an origin can't take aggressive crawling.
Syntax & status
Verifies the file returns 200 OK with a sensible Content-Type, and flags malformed lines or stray BOMs that quietly break crawler parsing.
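The checks below show what this kind of lint pass could look like. This is a minimal sketch, not the tool's real rule set. The function name `lint_robots` and the `KNOWN_FIELDS` list are hypothetical. The sketch also assumes that a "sensible" Content-Type means `text/plain`.

```python
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}
UTF8_BOM = b"\xef\xbb\xbf"


def lint_robots(body: bytes, status: int, content_type: str) -> list[str]:
    """Return human-readable problems found in a fetched robots.txt (sketch)."""
    problems: list[str] = []
    if status != 200:
        problems.append(f"expected 200 OK, got {status}")
    # Assumption: treat anything other than text/plain as suspicious.
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"unexpected Content-Type {content_type!r}")
    if body.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 BOM")
    text = body.decode("utf-8-sig", errors="replace")  # utf-8-sig drops a leading BOM
    for lineno, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        if ":" not in line:
            problems.append(f"line {lineno}: missing ':' separator")
            continue
        name = line.split(":", 1)[0].strip().lower()
        if name not in KNOWN_FIELDS:
            problems.append(f"line {lineno}: non-standard field {name!r}")
    return problems
```

A typo such as `Disalow /tmp` (no colon) is reported by line number. Vendor fields like Clean-param show up as non-standard rather than as errors.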
Raw view
Shows the original, unedited robots.txt alongside the parsed output - so you can diff what you intended against what's actually being served.
How it works
From domain to parsed rules in about a second.
No signup, no command line - just paste and read.
Enter a domain
Paste any URL or hostname - we'll normalize it and fetch the robots.txt from the site root.
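The normalization step can be sketched with Python's `urllib.parse`. The helper name `robots_url` is hypothetical, and the sketch assumes HTTPS whenever no scheme is pasted. Because robots.txt always lives at the site root, the sketch discards the path, query, and fragment and keeps only the scheme, host, and any explicit port.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(user_input: str) -> str:
    """Turn a pasted URL or bare hostname into its /robots.txt URL (sketch)."""
    text = user_input.strip()
    if "://" not in text:
        text = "https://" + text  # assumption: default to HTTPS when no scheme is given
    parts = urlsplit(text)
    host = parts.hostname  # lowercased, with any user:password@ removed
    if not host:
        raise ValueError(f"no hostname in {user_input!r}")
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    netloc = f"{host}:{parts.port}" if parts.port else host
    # robots.txt always lives at the root, so path, query, and fragment are dropped.
    return urlunsplit((parts.scheme, netloc, "/robots.txt", "", ""))
```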
Fetch from the edge
We request /robots.txt from our edge, follow up to one redirect, and capture the raw response.
Read the parsed rules
Per-User-agent groups, every Allow and Disallow, sitemap URLs, and the raw file - all in one view.
Why robots.txt matters
One file. Outsized SEO consequences.
Robots.txt sits at the root of your site and quietly shapes how every search engine sees it.
Crawl budget
Googlebot allocates a finite number of requests per site. Blocking faceted search, internal pagination, and other thin URLs in robots.txt keeps that budget focused on pages that actually matter for rankings.
Accidental deindexing
A single stray Disallow: / pushed in a deploy can silently de-rank your whole site. Robots.txt is one of the easiest ways to lose traffic overnight - and one of the easiest to monitor for.
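A monitor for this failure mode might run a coarse check like the one below after each deploy. This is an illustrative sketch, not how SiteTrak actually monitors robots.txt. It assumes the file has already been parsed into hypothetical `(user-agents, rules)` pairs. It flags any group that disallows `/` or `/*` with no Allow exception.

```python
Rule = tuple[str, str]                # (directive, path), e.g. ("disallow", "/")
Group = tuple[list[str], list[Rule]]  # (user-agents, rules)


def sitewide_blocks(groups: list[Group]) -> list[str]:
    """Return every user-agent whose group blocks the whole site (coarse check)."""
    flagged: list[str] = []
    for agents, rules in groups:
        blocks_root = any(kind == "disallow" and path in ("/", "/*") for kind, path in rules)
        # Any non-empty Allow could carve out an exception, so don't flag those groups.
        has_exception = any(kind == "allow" and path for kind, path in rules)
        if blocks_root and not has_exception:
            flagged.extend(agents)
    return flagged
```

Alerting only when the result contains `*` or a crawler you depend on, such as `Googlebot`, avoids noise from sites that deliberately block bots like GPTBot.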
Sitemap discovery
The Sitemap: directive lets crawlers find your XML sitemap without a Search Console submission. Missing or wrong sitemap URLs can slow the discovery and indexing of new pages.
Reference
Robots.txt directives, explained.
Each line in a robots.txt file is one of a handful of directives - here's what they mean.
User-agent
Names the crawler the following rules apply to. User-agent: * is the catch-all group that applies when no more specific block matches the bot.
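Here is a sketch of how a crawler picks its group. It follows RFC 9309's case-insensitive match on the crawler's product token, and groups naming the same crawler are merged. The catch-all `*` group applies only when no named group matches, so a crawler with its own group does not also inherit the `*` rules. The `rules_for` name and the `(user-agents, rules)` data shape are assumptions for illustration. Real crawlers may apply extra fallbacks that this exact-match sketch omits.

```python
Rule = tuple[str, str]                # (directive, path)
Group = tuple[list[str], list[Rule]]  # (user-agents, rules)


def rules_for(groups: list[Group], product_token: str) -> list[Rule]:
    """Pick the rules that apply to one crawler (simplified RFC 9309 matching)."""
    token = product_token.lower()
    # Product tokens are compared case-insensitively against each User-agent value.
    matched = [rules for agents, rules in groups if any(a.lower() == token for a in agents)]
    if not matched:
        # Fall back to the catch-all group only when no named group matches.
        matched = [rules for agents, rules in groups if "*" in agents]
    # Several groups for the same crawler are merged into one rule list.
    return [rule for rules in matched for rule in rules]
```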
Disallow
Tells the named crawler not to fetch URLs that start with the given path. Disallow: / blocks the whole site; Disallow: with no path means allow everything.
Allow
Carves an exception out of a Disallow. The most specific (longest) matching rule wins, so Allow: /public/ overrides a parent Disallow: /.
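The Disallow and Allow semantics above can be combined into one path check. This sketch follows RFC 9309. Patterns match from the start of the path, `*` matches any run of characters, and a trailing `$` anchors the end. The longest matching pattern decides the outcome, and on a tie the less restrictive Allow wins. The function names are hypothetical, and the sketch skips percent-encoding normalization.

```python
from __future__ import annotations

import re


def _pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored_end = pattern.endswith("$")
    body = pattern[:-1] if anchored_end else pattern
    # '*' matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored_end else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; on a tie, Allow wins; no match means allowed."""
    best_length, allowed = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" matches nothing
        if _pattern_to_regex(pattern).match(path):  # match() = prefix match
            length = len(pattern)
            if length > best_length or (length == best_length and kind == "allow"):
                best_length, allowed = length, kind == "allow"
    return allowed
```

With `Disallow: /` and `Allow: /public/`, the path `/public/page.html` is allowed. The longer Allow pattern (8 characters) beats the 1-character Disallow.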
Sitemap
Absolute URL of an XML sitemap. Multiple Sitemap: lines are allowed and are read by search engines as part of discovery, independent of any User-agent group.
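Because Sitemap: lines sit outside the group structure, collecting them only takes a scan of every line. This minimal sketch uses a hypothetical `sitemap_urls` helper. It matches the field name case-insensitively, strips trailing comments, drops duplicates, and skips values that aren't absolute http(s) URLs.

```python
from urllib.parse import urlsplit


def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect every absolute Sitemap: URL, regardless of User-agent group."""
    urls: list[str] = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        name, value = line.split(":", 1)  # maxsplit=1 keeps the URL's own colon intact
        if name.strip().lower() != "sitemap":
            continue
        value = value.strip()
        # Sitemap values must be absolute; relative paths are skipped.
        if urlsplit(value).scheme in ("http", "https") and value not in urls:
            urls.append(value)
    return urls
```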
Crawl-delay
Seconds the crawler should wait between requests. Honored by Bing and Seznam; ignored by Google, which sets its own adaptive crawl rate.
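A crawler that honors Crawl-delay has to parse the value and then space out its requests. This sketch shows both steps as pure functions, so a real fetch loop would pass in something like `time.monotonic()` for `now`. The names are hypothetical. The choice to ignore non-numeric, negative, or infinite values is an assumption, not a rule from any spec.

```python
from __future__ import annotations

import math


def parse_crawl_delay(value: str) -> float | None:
    """Return the delay in seconds, or None if the value isn't a usable number."""
    try:
        seconds = float(value)
    except ValueError:
        return None
    if not math.isfinite(seconds) or seconds < 0:
        return None  # reject nan, inf, and negative values
    return seconds


def seconds_until_next(last_request_at: float | None, delay: float, now: float) -> float:
    """How long a polite crawler should still wait before its next request."""
    if last_request_at is None:
        return 0.0  # first request to this host: no wait
    return max(0.0, last_request_at + delay - now)
```

With `Crawl-delay: 10` and a previous request at t=100s, a crawler checking again at t=104s would wait another 6 seconds.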
Comments (#)
Anything after a # on a line is a comment. Useful for documenting why a rule exists, but ignored by crawlers when matching paths.
FAQ
Frequently asked questions.
Quick answers about robots.txt and how to use this tool well.
Is this tool really free?
Yes - no signup and no email harvesting. We rate-limit per IP to keep it fast for everyone, but otherwise it's open. The paid product is the monitoring.
Does a missing robots.txt mean my site is blocked?
No - the opposite. When robots.txt is missing or returns 404, crawlers treat the whole site as allowed. A robots.txt is only needed when you want to restrict something.
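The rule above depends on the HTTP status the robots.txt fetch returns. This sketch maps that status to crawl behavior following RFC 9309. A 4xx response means no restrictions. A 5xx response or an unreachable host means the crawler must assume everything is disallowed. The 429 case follows Google's documented behavior of treating it like a server error. The `Access` enum and `access_for_status` name are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Access(Enum):
    PARSE = "parse the returned rules"
    ALLOW_ALL = "treat the whole site as allowed"
    DISALLOW_ALL = "treat the whole site as disallowed"


def access_for_status(status: int | None) -> Access:
    """Map the robots.txt fetch outcome to crawl behavior (None = host unreachable)."""
    if status is None:
        return Access.DISALLOW_ALL  # network error: RFC 9309 says assume complete disallow
    if 200 <= status <= 299:
        return Access.PARSE
    if status == 429:
        return Access.DISALLOW_ALL  # Google treats 429 like a server error
    if 400 <= status <= 499:
        return Access.ALLOW_ALL  # missing file (404, 410, ...): no restrictions
    if 500 <= status <= 599:
        return Access.DISALLOW_ALL  # server error: assume complete disallow
    # 3xx should already have been followed by the fetcher; anything else is unexpected.
    raise ValueError(f"unhandled status {status}")
```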
Can robots.txt deindex pages that are already in Google?
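No. Disallow prevents crawling, not indexing. A page that's already indexed (or linked from elsewhere) can stay in search results - just without a description. To remove pages, use a noindex meta tag or the Search Console removal tool, and keep those pages crawlable. A crawler blocked by robots.txt never fetches the page, so it never sees the noindex tag.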
Which directives do all crawlers respect?
User-agent, Disallow, and Allow are standardized in RFC 9309, and Sitemap is supported by Google, Bing, and other major search engines. Crawl-delay is honored by Bing but ignored by Google. Non-standard fields like Clean-param (Yandex) are vendor-specific.
Does this tool block based on the rules?
No - we fetch and display the file as-is and don't enforce any rules. Robots.txt is advisory; each crawler decides for itself whether to honor it.
How do I keep robots.txt from quietly breaking?
A single bad edit to robots.txt can derail crawling and indexing in seconds. SiteTrak monitors the file continuously and alerts you the moment it 404s, changes, or starts blocking paths it used to allow.
Keep going
Other free tools you'll like.
Run one once, or set up SiteTrak and never run them again.
Sitemap Viewer
Fetch and inspect any XML sitemap, including sitemap indexes, lastmod, and image extensions.
Meta Tag Checker
Inspect title, description, Open Graph, Twitter Card, and canonical tags for any URL.
HTTP Header Inspector
Inspect response headers, CDN, cache configuration, and the full redirect chain.
Redirect Checker
Trace every hop in a redirect chain, with status codes and final destination.
