YouBot: You.com’s Web Crawler
YouBot is the web crawler that powers the You.com search engine. It automatically discovers and indexes web pages to provide real-time, accurate search results for You.com users.
Overview
YouBot is designed to crawl the web efficiently and respectfully, following industry standards and best practices. It respects robots.txt directives and crawl rate preferences to ensure minimal impact on web servers while maintaining comprehensive coverage of the web.
User Agent
YouBot identifies itself with the following user agent string:
Note: X.X.X.X represents the Google Chrome version number.
The user agent includes:
- Compatible identifier: YouBot/1.0
- Contact email: spider@you.com
- Environment: Production (env:prod)
Verifying YouBot
Since user agent strings can be spoofed, You.com supports three ways to verify that requests are genuinely from YouBot: cryptographic signatures (recommended), reverse DNS checks, and IP range validation.
Cloudflare Web Bot Auth
YouBot uses Cloudflare’s Web Bot Auth standard for authentication. This cryptographic verification ensures that requests claiming to be from YouBot are legitimate.
To verify YouBot requests:
- Check the HTTP Message Signatures in the request headers
- Retrieve the public keys from You.com’s well-known directory:
- Validate the signature using the public keys provided
The public keys are in JSON Web Key (JWK) format and use the Ed25519 cryptographic algorithm.
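As a minimal sketch of the key-handling step only, the Python snippet below extracts the raw 32-byte Ed25519 public key from a JWK. It assumes the key follows the usual JWK layout for Ed25519 (kty "OKP", crv "Ed25519", base64url-encoded "x"). The function name is hypothetical. It does not fetch You.com's key directory or verify a signature, which would need an Ed25519-capable library.

```python
import base64


def ed25519_public_key_bytes(jwk: dict) -> bytes:
    """Return the raw 32-byte Ed25519 public key stored in a JWK.

    Hypothetical helper: assumes an OKP/Ed25519 JWK whose "x" member is
    the base64url-encoded public key without padding.
    """
    if jwk.get("kty") != "OKP" or jwk.get("crv") != "Ed25519":
        raise ValueError("not an Ed25519 JWK")
    x = jwk["x"]
    # base64url values in JWKs omit '=' padding; restore it before decoding.
    padded = x + "=" * (-len(x) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 32:
        raise ValueError("Ed25519 public keys must be 32 bytes")
    return raw
```

The returned bytes can then be handed to whichever Ed25519 implementation you use to check the HTTP Message Signature.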
Reverse DNS Lookup
You can confirm crawler identity by resolving the client IP to a hostname and checking that it matches the expected pattern. For example, a reverse DNS lookup (dig -x) can return a crawler-specific hostname, and a forward lookup on that hostname should resolve back to the same IP.
For YouBot, hostnames use the form youbot-{octets-with-hyphens}.search.you.com, where {octets-with-hyphens} matches the connecting IP (for example, 68.67.112.106 becomes youbot-68-67-112-106).
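The reverse-then-forward check described above can be sketched in Python as follows. The function names are hypothetical, the sketch handles IPv4 only, and it assumes an exact hostname match is wanted. The resolver functions are passed in as parameters so the logic can be exercised without live DNS.

```python
import socket
from typing import Callable, List


def expected_youbot_hostname(ip: str) -> str:
    """Build the documented hostname for an IPv4 address."""
    return f"youbot-{ip.replace('.', '-')}.search.you.com"


def _reverse(ip: str) -> str:
    # PTR lookup; returns the primary hostname for the address.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    # A-record lookup; returns every IPv4 address for the hostname.
    return socket.gethostbyname_ex(hostname)[2]


def is_youbot_by_dns(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True if reverse and forward DNS both agree with the pattern."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if hostname != expected_youbot_hostname(ip):
            return False
        # The forward lookup must resolve back to the connecting IP.
        return ip in forward(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False
```

Calling `is_youbot_by_dns(client_ip)` with the default resolvers performs real DNS queries, so results are worth caching in a busy request path.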
IP Range Check
Legitimate YouBot requests originate from 68.67.112.0/24.
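A minimal Python sketch of this range check, using the standard ipaddress module. The function name is hypothetical, and the network constant is copied from the range stated above, so it would need updating if the published range changes.

```python
import ipaddress

# Range taken from this page; update it if the published range changes.
YOUBOT_NETWORK = ipaddress.ip_network("68.67.112.0/24")


def is_youbot_ip(ip: str) -> bool:
    """Return True if `ip` parses and falls inside the YouBot range."""
    try:
        return ipaddress.ip_address(ip) in YOUBOT_NETWORK
    except ValueError:
        # Malformed addresses are treated as not YouBot.
        return False
```

An IP range check is the cheapest of the three methods, so it can serve as a first filter before a DNS or signature check.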
Crawl Rate and Server Load
YouBot is designed to crawl efficiently without overwhelming web servers. The crawler:
- Respects robots.txt directives
- Implements adaptive crawl rate limiting
- Distributes requests across multiple IP addresses
- Honors crawl-delay directives
If you notice excessive crawl activity from YouBot, please contact us at spider@you.com.
Controlling YouBot Access
Using robots.txt
You can control YouBot’s access to your site using the standard robots.txt file:
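As an illustrative sketch rather than You.com's official example, the Python snippet below parses a hypothetical robots.txt with the standard library's parser to preview how such rules are commonly interpreted. The user-agent token "YouBot", the paths, and the delay value are assumptions, and Python's matching rules may differ in detail from YouBot's own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep YouBot out of /private/ and ask for a
# 10-second delay; all other crawlers are unrestricted.
ROBOTS_TXT = """\
User-agent: YouBot
Disallow: /private/
Crawl-delay: 10

User-agent: *
Disallow:
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a queryable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
```

With these rules, `rules.can_fetch("YouBot", "https://example.com/private/report")` is False, `rules.can_fetch("YouBot", "https://example.com/blog")` is True, and `rules.crawl_delay("YouBot")` returns 10.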
HTTP Status Codes
YouBot respects standard HTTP status codes:
- 200 OK: Page is crawled and indexed
- 301/302: Redirects are followed
- 404 Not Found: Page is removed from index
- 429 Too Many Requests: Crawl rate is reduced
- 503 Service Unavailable: Crawling is temporarily paused
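On the site operator's side, the 429 behavior above suggests one way to slow the crawler: answer with 429 once a request budget is exceeded. The following Python sketch is a simple sliding-window limiter under that assumption. The class name, limit, and window are hypothetical and do not describe how YouBot itself works.

```python
from collections import deque


class CrawlThrottle:
    """Sliding-window limiter: 429 once `limit` requests fall in `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._seen = deque()  # timestamps of accepted requests

    def status_for(self, now: float) -> int:
        """Return the HTTP status to send for a request arriving at `now`."""
        # Drop timestamps that have aged out of the window.
        while self._seen and now - self._seen[0] >= self.window:
            self._seen.popleft()
        if len(self._seen) >= self.limit:
            return 429  # over budget; rejected requests are not counted
        self._seen.append(now)
        return 200
```

In practice you would keep one `CrawlThrottle` per verified crawler and pass `time.monotonic()` as `now`.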
Technical Properties
Supported Protocols
- HTTP/1.1
- HTTP/2
- HTTPS (TLS 1.2 and above)
Supported Content Types
YouBot crawls and indexes various content types including:
- HTML pages
- PDF documents
- Plain text files
- XML and RSS feeds
- Structured data (JSON-LD, microdata, RDFa)
Content Encodings
YouBot supports standard content encodings:
- gzip
- deflate
- Brotli (br)
Contact and Support
For questions, concerns, or issues related to YouBot’s crawling activity:
Email: spider@you.com
Common reasons to contact us:
- Reporting excessive crawl rates
- Requesting crawl adjustments
- Reporting technical issues
- Discussing custom crawl requirements for large sites
Frequently Asked Questions
Why is YouBot crawling my site?
YouBot crawls publicly accessible web pages to provide comprehensive, real-time search results for You.com users. If your content is public and not blocked by robots.txt, it may be crawled and indexed.
How often does YouBot crawl my site?
Crawl frequency depends on factors like:
- How often your content changes
- Your site’s popularity and authority
- Your server’s response time
- Any crawl-delay directives in robots.txt
Can I request a recrawl of my content?
For immediate indexing needs or custom crawl requests, please reach out to spider@you.com.
Does YouBot respect robots.txt?
Yes, YouBot fully respects robots.txt directives, including user-agent specific rules and crawl-delay settings.
How do I report a problem with YouBot?
Contact us at spider@you.com with details about the issue, including:
- Your domain name
- Timestamps of problematic requests
- Description of the issue
- Server logs (if applicable)