WebPixie

Indexability checker for search and AI engines

See whether search engines and AI tools can discover, crawl, and index your important pages, with robots.txt, llms.txt, meta robots, canonical, and hreflang checks.

Search and AI engines
Robots, canonical, hreflang
Free plan, no credit card
Key pages checked · search and AI engines
Page | Indexable | Canonical | Checked
/pricing | No (meta tag) | self | 1d
/blog/launch | Allowed | mismatch | 1d
/ | Allowed | self | 1d
/docs | Allowed | self | 1d
noindex on /pricing · canonical mismatch on /blog/launch
Indexability checks are included in the free plan. Compare all plans.

A per-page indexable / blocked verdict

Coming soon

We're building a consolidated view that turns the indexing signals we already read (robots.txt, llms.txt, meta robots, canonical, hreflang) into one clear per-page verdict. Here is what it will look like.

Page | Verdict | Deciding signal
/ | Indexable | No blocks · self-referencing canonical
/pricing | Indexable | Self-referencing canonical · no noindex
/staging/preview | Blocked | noindex meta tag
/admin | Blocked | robots.txt Disallow
/blog/draft | Blocked | X-Robots-Tag: noindex
/fr/ | Warning | hreflang alternate not reciprocated
/old-product | Warning | canonical points to /new-product
llms.txt | Warning | Unreachable · AI engines cannot read it

The underlying signal checks are live today in the TXT Files and Main Page views, and they already feed the composite WebPixie Site Score. Only the consolidated per-page verdict shown here is still on the way.
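
To make the idea of a "deciding signal" concrete, here is a minimal Python sketch of how per-page signals could be folded into one verdict. The `PageSignals` fields, the `verdict` function, and the precedence order are assumptions for illustration, not WebPixie's implementation. The one ordering choice worth noting is that robots.txt is checked first, because a crawler that may not fetch a page never sees a noindex on it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PageSignals:
    """Hypothetical container for the signals already read for one page."""
    url: str
    robots_allowed: bool = True       # robots.txt permits fetching this URL
    meta_noindex: bool = False        # <meta name="robots" content="noindex">
    header_noindex: bool = False      # X-Robots-Tag: noindex response header
    canonical: Optional[str] = None   # canonical target, if one is declared
    unreciprocated_hreflang: List[str] = field(default_factory=list)


def verdict(p: PageSignals) -> Tuple[str, str]:
    """Return (verdict, deciding signal); blocking signals win over warnings."""
    # robots.txt first: a crawler that may not fetch the page
    # never gets to see a noindex tag or header on it.
    if not p.robots_allowed:
        return "Blocked", "robots.txt Disallow"
    if p.meta_noindex:
        return "Blocked", "noindex meta tag"
    if p.header_noindex:
        return "Blocked", "X-Robots-Tag: noindex"
    if p.canonical and p.canonical != p.url:
        return "Warning", f"canonical points to {p.canonical}"
    if p.unreciprocated_hreflang:
        return "Warning", "hreflang alternate not reciprocated"
    return "Indexable", "no blocks"
```

For example, `verdict(PageSignals("/old-product", canonical="/new-product"))` yields the same Warning row as the /old-product line in the table.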

How WebPixie checks your indexability

WebPixie analyzes whether search engines and AI tools can find and index your important pages. Each check reads robots.txt and llms.txt rules, meta robots tags, X-Robots-Tag headers, canonical URLs, hreflang definitions, and the rel attributes on your links, then surfaces the indexing signals behind each page.

A single accidental noindex tag or one restrictive robots.txt line can quietly keep a page out of search results or stop crawlers from reaching it, with no error to warn you. WebPixie surfaces these signals so you can spot an accidental noindex, a Disallow rule, or a conflicting canonical or hreflang before it costs you traffic.

Coverage goes past classic SEO. The same check looks at llms.txt accessibility alongside robots.txt, so you can see whether AI engines can reach your content as AI-driven traffic grows.

01

Catch accidental noindex and robots blocks

One wrong directive can drop a page from search

A noindex meta tag left on after a redesign, or a Disallow line added to robots.txt during a deploy, can drop a page from search results or stop crawlers from fetching it, with no error to warn you. WebPixie reads meta robots tags, X-Robots-Tag headers, and robots.txt rules, then surfaces the blocking directive, so you catch it in the dashboard instead of in a traffic drop.
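
As a rough illustration of reading those three sources, the Python sketch below uses the standard library's `urllib.robotparser` and `html.parser`. The function names, the default `Googlebot` agent, and the token handling are assumptions for the example. Two simplifications are worth knowing: the stdlib robots parser applies the first matching rule rather than Google's longest-match rule, and bot-scoped values such as `googlebot: noindex` are treated as applying to every crawler.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set
from urllib.robotparser import RobotFileParser


def directive_tokens(value: str) -> Set[str]:
    """Split 'noindex, nofollow' or 'googlebot: noindex' into directive tokens."""
    out = set()
    for item in value.lower().split(","):
        item = item.strip()
        key, sep, rest = item.partition(":")
        # Keep valued directives like max-image-preview:none intact so their
        # value is not mistaken for the 'none' (noindex, nofollow) directive.
        if sep and not key.startswith(("max-", "unavailable_after")):
            item = rest.strip()  # drop a bot-name prefix (simplification)
        out.add(item)
    return out


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k: (v or "") for k, v in attrs}  # attribute names arrive lowercased
        if a.get("name", "").lower() == "robots":
            self.directives |= directive_tokens(a.get("content", ""))


def blocking_directives(url: str, html: str, headers: Dict[str, str],
                        robots_txt: str, agent: str = "Googlebot") -> List[str]:
    """Return the directives that would stop `agent` crawling or indexing `url`."""
    found = []
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(agent, url):
        found.append("robots.txt Disallow")

    meta = MetaRobotsParser()
    meta.feed(html)
    meta.close()
    if meta.directives & {"noindex", "none"}:
        found.append("noindex meta tag")

    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    if directive_tokens(lowered.get("x-robots-tag", "")) & {"noindex", "none"}:
        found.append("X-Robots-Tag: noindex")
    return found
```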

02

See conflicting canonical and hreflang signals

Canonical and hreflang tell engines which version wins

When a canonical tag points to the wrong URL, or hreflang alternates do not reciprocate, engines can index a version you did not intend or split signals across near-identical URLs. WebPixie extracts the canonical target and the hreflang map for each page and surfaces mismatches, so you can correct which URL engines treat as the original.
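
The two checks described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have each page's canonical href and its hreflang map as `{lang: absolute URL}`; the function names are hypothetical. URLs are compared as exact strings, so trailing-slash or case normalization would need to happen beforehand.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urljoin


def canonical_status(page_url: str, canonical: Optional[str]) -> str:
    """Classify a page's canonical as 'self', 'mismatch', or 'missing'."""
    if not canonical:
        return "missing"
    # Resolve relative hrefs such as "/blog/launch" against the page URL.
    return "self" if urljoin(page_url, canonical) == page_url else "mismatch"


def unreciprocated_hreflang(maps: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (page, alternate) pairs where the alternate does not link back.

    `maps` holds, for each crawled page, its hreflang map {lang: absolute URL}.
    """
    missing = []
    for page, alternates in maps.items():
        for alt in alternates.values():
            if alt == page:
                continue  # a self-reference is expected; nothing to check
            # The alternate must list `page` among its own hreflang targets.
            if page not in maps.get(alt, {}).values():
                missing.append((page, alt))
    return missing
```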

03

Check visibility for AI engines, not only search

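robots.txt and llms.txt shape what AI tools read

As AI assistants send more referral traffic, whether they can read your content matters alongside classic indexing. Your robots.txt rules decide which crawlers, AI crawlers included, may fetch your pages, while llms.txt is a proposed Markdown file that points AI tools to the content you most want them to read. WebPixie checks llms.txt accessibility next to robots.txt, so you can see whether AI engines can reach your content and adjust your rules if they cannot.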
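
A minimal Python sketch of both halves, assuming you have the robots.txt text and the site's base URL. `GPTBot` and `ClaudeBot` are real crawler user-agent tokens published by OpenAI and Anthropic; the function names and the choice of agents are illustrative. Only `ai_crawler_access` is offline; `llms_txt_reachable` makes a network request.

```python
from typing import Dict, Iterable
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


def ai_crawler_access(robots_txt: str, url: str, agents: Iterable[str]) -> Dict[str, bool]:
    """Map each crawler user-agent token to whether robots.txt lets it fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {agent: rp.can_fetch(agent, url) for agent in agents}


def llms_txt_reachable(site: str, timeout: float = 10.0) -> bool:
    """True if <site>/llms.txt answers with HTTP 200 (makes a network call)."""
    req = Request(urljoin(site, "/llms.txt"), headers={"User-Agent": "indexability-sketch"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False
```

With a robots.txt that blocks only `GPTBot`, `ai_crawler_access(robots, "https://example.com/docs", ["GPTBot", "ClaudeBot"])` reports `GPTBot` as blocked and `ClaudeBot` as allowed.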

04

Confirm your robots.txt and link rel signals

robots.txt sitemap reference and rel nofollow, ugc, sponsored

WebPixie checks whether a sitemap is referenced in your robots.txt and reads the rel attributes on your links, including nofollow, ugc, and sponsored. That shows whether you point engines to a sitemap and how link signals are set across your site, which pairs with the per-link detail from the Link Crawler. Full sitemap monitoring is coming soon.
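
Both checks are easy to approximate with the Python standard library, as in this sketch. `RobotFileParser.site_maps()` (Python 3.8+) returns the declared `Sitemap:` URLs, or `None` when there are none; the `rel_counts` helper and its counting approach are illustrative assumptions. Note that `rel` is a space-separated token list, so `rel="ugc nofollow"` counts toward both.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Dict, List
from urllib.robotparser import RobotFileParser


def sitemap_references(robots_txt: str) -> List[str]:
    """Return the Sitemap: URLs declared in robots.txt (empty if none)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.site_maps() or []  # site_maps() returns None when nothing is declared


class LinkRelCounter(HTMLParser):
    """Counts nofollow, ugc, and sponsored tokens in <a rel="..."> attributes."""

    TRACKED = {"nofollow", "ugc", "sponsored"}

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        rel = dict(attrs).get("rel") or ""
        for token in rel.lower().split():  # rel holds space-separated tokens
            if token in self.TRACKED:
                self.counts[token] += 1


def rel_counts(html: str) -> Dict[str, int]:
    parser = LinkRelCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)
```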

Check your indexability in 60 seconds

Free plan, no credit card. Included on every plan.


Why teams choose WebPixie for indexability

Set up in 60 seconds

No agent to install and no access to your server needed. Enter a domain and WebPixie checks indexability from its own servers.

Your whole site in one workspace

Indexability sits next to uptime, SSL, DNS, domain, and link health, with one dashboard across all of them.

Search and AI coverage together

One check covers classic search indexing and AI-engine access, so you review both in the same report.

Frequently Asked Questions

Common questions about the indexability checker.

The Indexability Checker analyzes the signals that affect whether search engines and AI tools can discover, crawl, and index your important pages. WebPixie checks robots.txt rules, llms.txt availability, robots.txt sitemap references, meta robots tags, X-Robots-Tag headers, canonical URLs, hreflang declarations, and link rel attributes like nofollow, ugc, and sponsored. The Indexability Checker surfaces blocking directives, canonical conflicts, missing multilingual signals, and related crawlability issues; a consolidated per-page indexable/blocked verdict is coming soon. This helps you catch accidental noindex rules, restrictive robots.txt directives, missing sitemap references, and other technical SEO issues during review. The results pair well with the Link Crawler, which finds crawlable URLs and link-level signals across your site. For homepage-specific headers, redirects, cookies, and meta checks, use the Main Page Analyzer.

Uptime affects SEO by keeping your pages reachable for search engine crawlers and real users. A brief outage may not cause an immediate ranking change, but repeated downtime can interrupt crawling, reduce trust in page availability, waste crawl opportunities, and create poor user signals when visitors land on errors. Uptime monitoring helps you detect access problems quickly, including timeouts, unexpected status codes, and missing expected content. For marketers, this matters most on high-value pages such as campaign landing pages, product pages, checkout flows, and content that receives frequent organic traffic. WebPixie pairs uptime checks with retry logic to reduce false positives before alerting your team. For search visibility, combine uptime checks with the Indexability Checker and Link Crawler to catch blocked pages, broken links, and crawlability issues.

The Indexability Checker validates and analyzes text files such as robots.txt and llms.txt line by line, giving each file a valid or invalid verdict and flagging syntax it cannot parse. This lives in the TXT Files view, where you can see how a directive is interpreted instead of guessing whether a rule is written correctly. It also checks whether a sitemap is referenced in robots.txt, so a missing reference shows up before it costs you crawl coverage. On the Main Page view it reads the canonical URL and its alternates, which together indicate how search engines and AI crawlers are likely to treat the page. Indexability is one of the five sub-scores in your composite WebPixie Site Score, alongside header, SSL, DNS, and domain analysis. For link-level crawl health, pair it with the Link Crawler, and compare plan limits on the pricing page.
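
Line-by-line validation of robots.txt can be sketched roughly as below in Python. The accepted directive set and the specific problem messages are assumptions: RFC 9309 defines `user-agent`, `allow`, and `disallow`, while `sitemap` and `crawl-delay` are widely used extensions. This sketch covers robots.txt only, not llms.txt, which is Markdown.

```python
from typing import List, Tuple

# Assumed directive set: RFC 9309 core directives plus common extensions.
KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def validate_robots(text: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Check robots.txt line by line; return ('valid' or 'invalid', problems)."""
    problems = []
    seen_agent = False
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, sep, value = line.partition(":")
        if not sep:
            problems.append((n, "missing ':' separator"))
            continue
        field = field.strip().lower()
        if field not in KNOWN:
            problems.append((n, f"unknown directive '{field}'"))
        elif field == "user-agent":
            seen_agent = True
        elif field in ("allow", "disallow"):
            path = value.strip()
            if not seen_agent:
                problems.append((n, "rule before any User-agent line"))
            elif path and not path.startswith(("/", "*")):
                problems.append((n, "path should start with '/' or '*'"))
    return ("valid" if not problems else "invalid"), problems
```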

WebPixie alerts you when a monitored page becomes unreachable or fails the conditions you configured. No monitoring tool can guarantee that a search engine has not attempted to crawl during an outage, but fast detection helps reduce the duration and SEO risk of availability problems. Uptime monitoring checks for timeouts, connection failures, unexpected status codes, redirects, and missing expected content, then uses retry logic to reduce false positives before opening an incident. Alerts are sent by email on every plan, with Slack and webhooks available on eligible plans, so the right team can act quickly. Confirmed outages can be tracked through incident management, including severity, affected resource, and resolution timing. For search-specific risk, combine outage alerts with the Indexability Checker to catch crawl-blocking configuration problems.
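
The retry-before-alert idea can be illustrated with a small Python sketch. The retry count, the delay, and the `confirmed_down` function are assumptions for the example; WebPixie's actual retry policy is not described here. The `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


def confirmed_down(check: Callable[[], bool], retries: int = 2,
                   delay_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if `check` fails on the first attempt and on every retry."""
    for attempt in range(retries + 1):
        if check():
            return False  # one success is enough: treat it as a transient blip
        if attempt < retries:
            sleep(delay_s)  # wait before re-checking
    return True  # every attempt failed: open an incident and alert
```

The design trade-off is detection speed versus noise: each extra retry delays a real alert by roughly `delay_s`, but filters out one-off timeouts.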

The Link Crawler and the Indexability Checker answer different questions. The Link Crawler is about link health: starting from your homepage, it follows internal and external links and reports each one's status code, response time, redirect status, and whether it loads successfully. The Indexability Checker is about crawl eligibility: it validates text files such as robots.txt and llms.txt, reads canonical URLs and alternates on the Main Page view, and contributes a sub-score to your composite WebPixie Site Score. A link can be healthy yet blocked from search engines, and a page can be indexable yet link to broken destinations, so neither check replaces the other. The crawler deliberately ignores robots.txt while following links, because it must fetch a page to confirm it works; the search-eligibility verdict belongs to the Indexability Checker. Many teams run both; compare plan limits on the pricing page.

Ready to see what engines can index?

Free plan, no credit card. Search and AI visibility in one check.