Robots.txt Viewer & Parser

Fetch and parse any site's robots.txt - User-agent groups, Allow and Disallow paths, sitemaps, and Crawl-delay. No signup, instant results.

What it checks

Every directive in your robots.txt, parsed and grouped.

One fetch, one structured view - User-agents, paths, sitemaps, and the raw file.

User-agent groups

Splits the file into per-crawler rule blocks - Googlebot, Bingbot, GPTBot, the catch-all *, and anything else the site has called out by name.

Allow & Disallow paths

Lists every Allow and Disallow directive in each group, in the order it appears, so you can see exactly which URL prefixes are open and which are off-limits.
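
For the curious, here is a minimal Python sketch of how a file can be split into groups with their rules kept in order. It is not this tool's actual parser: Group and parse_groups are hypothetical names, and the sketch skips Sitemap, Crawl-delay, and wildcard handling.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    user_agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, path)


def parse_groups(text: str) -> list[Group]:
    """Split robots.txt text into User-agent groups, keeping rule order."""
    groups: list[Group] = []
    current: Optional[Group] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.user_agents.append(value)
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
    return groups


SAMPLE = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private/
Allow: /private/press/

User-agent: *
Disallow: /tmp/
"""

for group in parse_groups(SAMPLE):
    print(group.user_agents, group.rules)
```

The sample yields two groups: one shared by Googlebot and Bingbot, and the catch-all *.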

Sitemap discovery

Surfaces every Sitemap: line in the file. Search engines pick these up to find your XML sitemaps even without a Search Console submission.

Crawl-delay

Reports any Crawl-delay value advertised to bots, which is useful when an origin can't take aggressive crawling. Bing and some other crawlers honor it; Google does not.

Syntax & status

Verifies the file returns 200 OK with a sensible Content-Type, and flags malformed lines or stray BOMs that quietly break crawler parsing.
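
Checks like these can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's real validator: check_robots and KNOWN_FIELDS are hypothetical, "sensible Content-Type" is taken to mean text/plain, and any field outside the short known list is only reported, not rejected.

```python
import codecs

# Assumption: the fields this sketch treats as recognized.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def check_robots(status: int, content_type: str, body: bytes) -> list[str]:
    """Return human-readable warnings for a fetched robots.txt response."""
    warnings = []
    if status != 200:
        warnings.append(f"status is {status}, expected 200")
    if not content_type.lower().startswith("text/plain"):
        warnings.append(f"unexpected Content-Type: {content_type!r}")
    if body.startswith(codecs.BOM_UTF8):
        warnings.append("file starts with a UTF-8 byte order mark")
        body = body[len(codecs.BOM_UTF8):]
    text = body.decode("utf-8", errors="replace")
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if not line:
            continue
        name, sep, _ = line.partition(":")
        if not sep:
            warnings.append(f"line {number}: missing ':' separator")
        elif name.strip().lower() not in KNOWN_FIELDS:
            warnings.append(f"line {number}: unknown field {name.strip()!r}")
    return warnings


sample = codecs.BOM_UTF8 + b"User-agent: *\nDisalow: /tmp/\nnonsense\n"
for warning in check_robots(200, "text/html", sample):
    print(warning)
```

The sample triggers four warnings: the Content-Type, the BOM, the misspelled Disalow field, and the line with no colon.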

Raw view

Shows the original, unedited robots.txt alongside the parsed output - so you can diff what you intended against what's actually being served.

How it works

From domain to parsed rules in about a second.

No signup, no command line - just paste and read.

Enter a domain

Paste any URL or hostname - we'll normalize it and fetch the robots.txt from the site root.
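
As a rough sketch of what that normalization can look like in Python (robots_url is a hypothetical helper, and defaulting to HTTPS for bare hostnames is an assumption, not a documented behavior of this tool):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(user_input: str) -> str:
    """Turn a pasted URL or bare hostname into the site's robots.txt URL."""
    value = user_input.strip()
    if "://" not in value:
        value = "https://" + value  # assumption: default to HTTPS
    parts = urlsplit(value)
    if not parts.netloc:
        raise ValueError(f"no hostname found in {user_input!r}")
    # Keep scheme and host, discard the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), "/robots.txt", "", ""))


print(robots_url("example.com"))
print(robots_url("https://Example.com/blog/post?id=1#top"))
```

Both calls print https://example.com/robots.txt, because robots.txt always lives at the root of the host regardless of which page was pasted.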

Fetch from the edge

We request /robots.txt from our edge, follow up to one redirect, and capture the raw response.
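
The "up to one redirect" rule can be illustrated with a small Python sketch. Everything here is hypothetical: fetch_robots is not this tool's code, and the get callable stands in for any HTTP client that makes a single request without following redirects and returns the status, lowercase-keyed headers, and body.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

# (status, headers, body) for one request that does NOT follow redirects.
Response = Tuple[int, Dict[str, str], bytes]


def fetch_robots(url: str, get: Callable[[str], Response], max_redirects: int = 1) -> Response:
    """Fetch url, following at most max_redirects redirects."""
    for _ in range(max_redirects + 1):
        status, headers, body = get(url)
        location = headers.get("location")
        if status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return status, headers, body
    raise RuntimeError(f"more than {max_redirects} redirect(s)")


# A fake transport so the sketch runs without network access.
FAKE_SITE = {
    "http://example.com/robots.txt": (301, {"location": "https://example.com/robots.txt"}, b""),
    "https://example.com/robots.txt": (200, {"content-type": "text/plain"}, b"User-agent: *\nDisallow:\n"),
}

status, headers, body = fetch_robots("http://example.com/robots.txt", FAKE_SITE.__getitem__)
print(status, body)
```

Capping redirects keeps a misconfigured site from sending the fetcher around a loop; a second redirect raises an error instead of being followed.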

Read the parsed rules

Per-User-agent groups, every Allow and Disallow, sitemap URLs, and the raw file - all in one view.

Why robots.txt matters

One file. Outsized SEO consequences.

Robots.txt sits at the root of your site and quietly shapes how every search engine sees it.

Crawl budget

Googlebot allocates a finite number of requests per site. Blocking faceted search, internal pagination, and other thin URLs in robots.txt keeps that budget focused on pages that actually matter for rankings.

Accidental deindexing

A single stray Disallow: / pushed in a deploy can silently block crawling of your whole site and drag your rankings down with it. Robots.txt is one of the easiest ways to lose traffic overnight - and one of the easiest to monitor for.

Sitemap discovery

The Sitemap: directive lets crawlers find your XML sitemap without a Search Console submission. Missing or wrong sitemap URLs can delay indexing of new pages, sometimes by days or weeks.

Reference

Robots.txt directives, explained.

Each line in a robots.txt file is one of a handful of directives - here's what they mean.

User-agent

Names the crawler the following rules apply to. User-agent: * is the catch-all group that applies when no more specific block matches the bot.
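
Python's standard-library urllib.robotparser gives a quick way to see this fallback in action. The rules and bot names below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot has its own group, so only /drafts/ is off-limits to it.
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))     # True
# Any other bot falls back to the catch-all group, which blocks everything.
print(parser.can_fetch("SomeOtherBot", "https://example.com/blog/"))  # False
```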

Disallow

Tells the named crawler not to fetch URLs that start with the given path. Disallow: / blocks the whole site; Disallow: with no path means allow everything.

Allow

Carves an exception out of a Disallow. The most specific (longest) matching rule wins, so Allow: /public/ overrides a parent Disallow: /.
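
The longest-match rule is easy to express in code. The Python sketch below is a simplified illustration (is_allowed is a hypothetical helper): it handles plain path prefixes only and ignores the * and $ wildcards that some crawlers support. When an Allow and a Disallow match with equal length, it lets Allow win.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to (directive, prefix) rules."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty path (e.g. "Disallow:") matches nothing
        # Longer prefixes win; on a tie, Allow wins over Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
            best_len, allowed = len(prefix), directive == "allow"
    return allowed


rules = [("disallow", "/"), ("allow", "/public/")]
print(is_allowed(rules, "/public/page.html"))  # True: Allow: /public/ is longer
print(is_allowed(rules, "/admin/"))            # False: only Disallow: / matches
```

Note that Python's built-in urllib.robotparser applies rules in file order rather than by length, so it can disagree with this sketch when a broad Disallow appears before a narrower Allow.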

Sitemap

Absolute URL of an XML sitemap. Multiple Sitemap: lines are allowed and are read by search engines as part of discovery, independent of any User-agent group.
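
To pull Sitemap: lines out of a file yourself, Python's standard-library parser exposes them through site_maps() (available since Python 3.8). The file contents below are an invented example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow: /tmp/

Sitemap: https://example.com/news-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Sitemap lines are collected wherever they appear, outside any group.
print(parser.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/news-sitemap.xml']
```

site_maps() returns None rather than an empty list when the file has no Sitemap: lines.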

Crawl-delay

Seconds the crawler should wait between requests. Honored by Bing and Seznam, among others - ignored by Google and Yandex, which manage crawl rate on their own.
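
If you are writing your own crawler, Python's standard-library parser can read the value for you. This is a small sketch with made-up rules; how you pause between requests is up to your crawler.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Bingbot
Crawl-delay: 10
Disallow: /search/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.crawl_delay("Bingbot"))   # 10
print(parser.crawl_delay("OtherBot"))  # None: the * group sets no delay
```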

Comments (#)

Anything after a # on a line is a comment. Useful for documenting why a rule exists, but ignored by crawlers when matching paths.

FAQ

Frequently asked questions.

Quick answers about robots.txt and how to use this tool well.

Is this tool really free?

Yes - no signup, no email harvesting, no usage cap that forces you to register. We do rate-limit per IP to keep it fast for everyone, but otherwise it's open. The paid product is the monitoring side.

Does a missing robots.txt mean my site is blocked?

No - the opposite. When robots.txt is missing or returns 404, crawlers treat the whole site as allowed. A robots.txt is only needed when you want to restrict something.

Can robots.txt deindex pages that are already in Google?

No. Disallow prevents crawling, not indexing. A page that's already indexed (or linked from elsewhere) can stay in search results - just without a description. To remove pages, use a noindex meta tag or the Search Console removal tool, and leave the page crawlable: a crawler blocked by Disallow never fetches the page, so it never sees the noindex.

Which directives do all crawlers respect?

User-agent, Disallow, and Allow are standardized in RFC 9309, and Sitemap is supported by all major search engines. Crawl-delay is honored by Bing but ignored by Google. Non-standard fields like Clean-param are vendor-specific.

Does this tool block based on the rules?

No - we fetch and display the file as-is. Crawlers themselves choose whether to honor robots.txt. We don't enforce any rules, we just show you what's there.

How do I keep robots.txt from quietly breaking?

Robots.txt is a single file that can break crawl budget and indexing in seconds. SiteTrak monitors it continuously and alerts the moment it 404s, changes, or starts blocking paths it used to allow.
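
If you want a feel for what this kind of check involves, here is a deliberately simple Python sketch. It is not SiteTrak's implementation: disallowed and detect_change are hypothetical helpers that compare two saved copies of a file and list Disallow paths that were not there before, without any per-group or wildcard awareness.

```python
import hashlib


def disallowed(text: str) -> set[str]:
    """Collect non-empty Disallow paths from robots.txt text (all groups)."""
    paths = set()
    for raw in text.splitlines():
        key, sep, value = raw.split("#", 1)[0].partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.add(value.strip())
    return paths


def detect_change(old: str, new: str) -> dict:
    """Compare two snapshots of a robots.txt file."""
    changed = hashlib.sha256(old.encode()).digest() != hashlib.sha256(new.encode()).digest()
    return {"changed": changed, "newly_blocked": sorted(disallowed(new) - disallowed(old))}


yesterday = "User-agent: *\nDisallow: /tmp/\n"
today = "User-agent: *\nDisallow: /\n"
print(detect_change(yesterday, today))
# {'changed': True, 'newly_blocked': ['/']}
```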

Keep going

Other free tools you'll like.

Run one once, or set up SiteTrak and never run them again.

Sitemap Viewer

Meta Tag Checker

HTTP Header Inspector

Redirect Checker

Run it once. Or have SiteTrak watch it forever.