---
title: 'YouBot: You.com''s Web Crawler'
'og:title': 'YouBot: You.com''s Web Crawler | Crawl Behavior & Controls'
'og:description': >-
  Learn how YouBot, You.com's web crawler, discovers and indexes the web. Covers the user agent string, robots.txt compliance, crawl rate controls, and how to verify requests.
---

YouBot is the web crawler that powers the You.com search engine. It automatically discovers and indexes web pages to provide real-time, accurate search results for You.com users.

## Overview

YouBot is designed to crawl the web efficiently and respectfully, following industry standards and best practices. It respects robots.txt directives and crawl rate preferences to keep its impact on web servers minimal while maintaining comprehensive coverage of the web.

## User Agent

YouBot identifies itself with the following user agent string:

```
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; YouBot/1.0; +https://docs.you.com/youbot; env:prod) Chrome/X.X.X.X Safari/537.36
```

**Note**: `X.X.X.X` represents the Google Chrome version number.

The user agent includes:

* **Compatible identifier**: `YouBot/1.0`
* **Documentation URL**: `https://docs.you.com/youbot`
* **Environment**: Production (`env:prod`)

## Verifying YouBot

Since user agent strings can be spoofed, You.com supports three ways to verify that requests are genuinely from YouBot: cryptographic signatures (recommended), reverse DNS checks, and IP range validation.

### Cloudflare Web Bot Auth

YouBot uses **Cloudflare's Web Bot Auth** standard for authentication. This cryptographic verification ensures that requests claiming to be from YouBot are legitimate.

To verify YouBot requests:

1. **Check the HTTP Message Signatures** in the request headers
2. **Retrieve the public keys** from You.com's well-known directory:

   ```
   https://you.com/.well-known/http-message-signatures-directory
   ```

3. **Validate the signature** using the public keys provided

The public keys are in JSON Web Key (JWK) format and use the Ed25519 cryptographic algorithm.

### Reverse DNS Lookup

You can confirm crawler identity by resolving the client IP to a hostname and checking that it matches the expected pattern. For example, a reverse DNS lookup (`dig -x`) can return a crawler-specific hostname, and a forward lookup on that hostname should resolve back to the same IP. In the sample below, the reverse lookup uses a Googlebot address to illustrate the general pattern, and the forward lookup uses a YouBot hostname.

```
$ dig -x 66.249.66.106
crawl-66-249-66-106.googlebot.com.

$ host youbot-68-67-112-106.search.you.com
68.67.112.106
```

For YouBot, hostnames use the form `youbot-{octets-with-hyphens}.search.you.com`, where `{octets-with-hyphens}` matches the connecting IP (for example, `68.67.112.106` becomes `youbot-68-67-112-106`).

### IP Range Check

Legitimate YouBot requests originate from **68.67.112.0/24**.

## Crawl Rate and Server Load

YouBot is designed to crawl efficiently without overwhelming web servers. The crawler:

* Respects robots.txt directives
* Implements adaptive crawl rate limiting
* Distributes requests across multiple IP addresses
* Honors crawl-delay directives

If you notice excessive crawl activity from YouBot, please contact us at [spider@you.com](mailto:spider@you.com).
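
Before reporting excessive crawl activity, it can help to confirm that the requests really come from YouBot rather than from a client spoofing its user agent. The following is a minimal Python sketch of the IP range check and the reverse DNS check described under Verifying YouBot. The function names and the injectable `reverse` and `forward` resolver parameters are assumptions made for illustration, not part of any You.com tooling, and the sketch does not cover Web Bot Auth signature validation.

```python
import ipaddress
import socket

# Range and hostname pattern as stated in this document.
YOUBOT_NETWORK = ipaddress.ip_network("68.67.112.0/24")
HOSTNAME_SUFFIX = ".search.you.com"


def in_youbot_range(ip: str) -> bool:
    """Return True if ip falls inside the documented YouBot range."""
    try:
        return ipaddress.ip_address(ip) in YOUBOT_NETWORK
    except ValueError:
        return False  # not a syntactically valid IP address


def expected_hostname(ip: str) -> str:
    """Build the hostname this document describes for an IPv4 address."""
    return "youbot-" + ip.replace(".", "-") + HOSTNAME_SUFFIX


def verify_reverse_dns(ip, reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex) -> bool:
    """Reverse-resolve ip, compare the name, then forward-confirm it.

    reverse and forward default to the standard socket resolvers; they are
    parameters (a hypothetical convenience) so the logic can be tested
    without network access.
    """
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
        if hostname != expected_hostname(ip):
            return False
        # The forward lookup must map the hostname back to the same IP.
        return ip in forward(hostname)[2]
    except OSError:
        return False  # lookup failed (socket.herror / socket.gaierror)
```

For example, `in_youbot_range("68.67.112.106")` returns `True`, and `expected_hostname("68.67.112.106")` returns `"youbot-68-67-112-106.search.you.com"`. Calling `verify_reverse_dns` with its defaults performs live DNS queries, so its result depends on your resolver.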
## Controlling YouBot Access

### Using robots.txt

You can control YouBot's access to your site using the standard robots.txt file:

```
# Block YouBot from entire site
User-agent: YouBot
Disallow: /
```

```
# Block YouBot from specific directories
User-agent: YouBot
Disallow: /private/
Disallow: /admin/
```

```
# Allow YouBot with crawl delay
User-agent: YouBot
Crawl-delay: 10
```

### HTTP Status Codes

YouBot respects standard HTTP status codes:

* **200 OK**: Page is crawled and indexed
* **301/302**: Redirects are followed
* **404 Not Found**: Page is removed from index
* **429 Too Many Requests**: Crawl rate is reduced
* **503 Service Unavailable**: Crawling is temporarily paused

## Technical Properties

### Supported Protocols

* HTTP/1.1
* HTTP/2
* HTTPS (TLS 1.2 and above)

### Supported Content Types

YouBot crawls and indexes various content types including:

* HTML pages
* PDF documents
* Plain text files
* XML and RSS feeds
* Structured data (JSON-LD, microdata, RDFa)

### Content Encodings

YouBot supports standard content encodings:

* gzip
* deflate
* Brotli (br)

## Contact and Support

For questions, concerns, or issues related to YouBot's crawling activity:

**Email**: [spider@you.com](mailto:spider@you.com)

Common reasons to contact us:

* Reporting excessive crawl rates
* Requesting crawl adjustments
* Reporting technical issues
* Discussing custom crawl requirements for large sites

## Frequently Asked Questions

### Why is YouBot crawling my site?

YouBot crawls publicly accessible web pages to provide comprehensive, real-time search results for You.com users. If your content is public and not blocked by robots.txt, it may be crawled and indexed.

### How often does YouBot crawl my site?

Crawl frequency depends on factors like:

* How often your content changes
* Your site's popularity and authority
* Your server's response time
* Any crawl-delay directives in robots.txt

### Can I request a recrawl of my content?

For immediate indexing needs or custom crawl requests, please reach out to [spider@you.com](mailto:spider@you.com).

### Does YouBot respect robots.txt?

Yes, YouBot fully respects robots.txt directives, including user-agent specific rules and crawl-delay settings.

### How do I report a problem with YouBot?

Contact us at [spider@you.com](mailto:spider@you.com) with details about the issue, including:

* Your domain name
* Timestamps of problematic requests
* Description of the issue
* Server logs (if applicable)
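
If the problem you are investigating involves robots.txt, it may be worth checking locally how your rules read for the `YouBot` user agent before writing in. The sketch below uses Python's standard `urllib.robotparser` as a stand-in; it is not YouBot's own parser, so treat the result as a sanity check rather than a guarantee of crawler behavior. The sample rules combine the directory and crawl-delay examples from Controlling YouBot Access into one file, and `example.com` and the `load_rules` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (an assumption for illustration): the directory blocks and
# crawl delay shown earlier in this document, merged into one group.
ROBOTS_TXT = """\
User-agent: YouBot
Disallow: /private/
Disallow: /admin/
Crawl-delay: 10
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # Pass the product token "YouBot", not the full user agent string:
    # this parser matches on the text before the first "/".
    print(rules.can_fetch("YouBot", "https://example.com/private/report.html"))
    print(rules.can_fetch("YouBot", "https://example.com/blog/"))
    print(rules.crawl_delay("YouBot"))
```

With these sample rules, the first check is disallowed, the second is allowed, and the reported crawl delay is 10 seconds.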