---
title: Retrieve page content
subtitle: >-
  Live-crawl search results to get full HTML or Markdown page content. Ideal
  for RAG, knowledge base construction, and deep content analysis.
'og:title': 'Retrieve Page Content | You.com Search API'
'og:description': >-
  Live-crawl search results to get full HTML or Markdown page content. Ideal
  for RAG, knowledge base construction, and deep content analysis.
---

## Overview

By default, search results include snippets — 100–200 words of extracted text per result. Enable **live crawling** to get the full page content: typically 2,000–10,000 words of clean HTML or Markdown per result.

This is what enables:

* Deep RAG with full document context
* Knowledge base construction from live web data
* Comprehensive content synthesis across sources
* Full article bodies for news results

## How it works

Add `livecrawl` to any search request. The API fetches each matching result's page in real time and attaches a `contents` object to it. You choose which result types to crawl and what format to return.

| Parameter           | Type    | Options                 | Description                  |
| ------------------- | ------- | ----------------------- | ---------------------------- |
| `livecrawl`         | string  | `web`, `news`, `all`    | Which result types to crawl  |
| `livecrawl_formats` | string  | `html`, `markdown`      | Format for returned content  |
| `crawl_timeout`     | integer | `1`–`60` (default `10`) | Max seconds to wait per page |

`markdown` is recommended for LLM use cases — it strips navigation, ads, and boilerplate HTML, leaving only the core content.

## Crawl web results

Set `livecrawl=web` to attach full page content to web results. The `contents.markdown` (or `contents.html`) field is added to each result that was successfully crawled.
```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="transformer architecture explained",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            print(f"{result.title}")
            print(f"  URL: {result.url}")
            if result.contents:
                print(f"  Content ({len(result.contents.markdown)} chars)")
                print(f"  Preview: {result.contents.markdown[:200]}...\n")
            else:
                print("  (No content retrieved)\n")
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "transformer architecture explained",
    count: 5,
    livecrawl: LiveCrawl.Web,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.web?.forEach((r) => {
    console.log(r.title);
    console.log(`  URL: ${r.url}`);
    if (r.contents?.markdown) {
      console.log(`  Content (${r.contents.markdown.length} chars)`);
      console.log(`  Preview: ${r.contents.markdown.slice(0, 200)}...\n`);
    } else {
      console.log("  (No content retrieved)\n");
    }
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=transformer architecture explained" \
  -d count=5 \
  -d livecrawl=web \
  -d livecrawl_formats=markdown
```

## Crawl news results

Set `livecrawl=news` to get full article bodies for news results. Combine with `freshness` for breaking news pipelines.
```python
from youdotcom import You
from youdotcom.models import Freshness, LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="semiconductor supply chain",
        freshness=Freshness.WEEK,
        count=5,
        livecrawl=LiveCrawl.NEWS,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.news:
        for article in res.results.news:
            print(f"{article.title}")
            if article.contents:
                print(article.contents.markdown[:400])
            print()
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { Freshness, LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "semiconductor supply chain",
    freshness: Freshness.Week,
    count: 5,
    livecrawl: LiveCrawl.News,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.news?.forEach((article) => {
    console.log(article.title);
    if (article.contents?.markdown) {
      console.log(article.contents.markdown.slice(0, 400));
    }
    console.log();
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=semiconductor supply chain" \
  -d freshness=week \
  -d count=5 \
  -d livecrawl=news \
  -d livecrawl_formats=markdown
```

## Crawl both web and news

Use `livecrawl=all` to crawl every result type in one request.
```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="quantum computing breakthroughs",
        count=5,
        livecrawl=LiveCrawl.ALL,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results:
        for result in res.results.web or []:
            if result.contents:
                print(f"[WEB] {result.title}: {len(result.contents.markdown)} chars")
        for result in res.results.news or []:
            if result.contents:
                print(f"[NEWS] {result.title}: {len(result.contents.markdown)} chars")
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "quantum computing breakthroughs",
    count: 5,
    livecrawl: LiveCrawl.All,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.web?.forEach((r) => {
    if (r.contents?.markdown) {
      console.log(`[WEB] ${r.title}: ${r.contents.markdown.length} chars`);
    }
  });

  result.results?.news?.forEach((r) => {
    if (r.contents?.markdown) {
      console.log(`[NEWS] ${r.title}: ${r.contents.markdown.length} chars`);
    }
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=quantum computing breakthroughs" \
  -d count=5 \
  -d livecrawl=all \
  -d livecrawl_formats=markdown
```

## Control crawl timeout

By default the crawler waits up to 10 seconds per page. For latency-sensitive applications, reduce `crawl_timeout`. For complex or slow-loading pages, increase it (up to 60 seconds).
```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    # Low-latency pipeline: only wait 3 seconds per page
    res = you.search.unified(
        query="latest Python releases",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
        crawl_timeout=3,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            status = "crawled" if result.contents else "skipped (timeout)"
            print(f"{result.title} — {status}")
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  // Low-latency pipeline: only wait 3 seconds per page
  const result = await you.search({
    query: "latest Python releases",
    count: 5,
    livecrawl: LiveCrawl.Web,
    livecrawlFormats: LiveCrawlFormats.Markdown,
    crawlTimeout: 3,
  });

  result.results?.web?.forEach((r) => {
    const status = r.contents ? "crawled" : "skipped (timeout)";
    console.log(`${r.title} — ${status}`);
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=latest Python releases" \
  -d count=5 \
  -d livecrawl=web \
  -d livecrawl_formats=markdown \
  -d crawl_timeout=3
```

## HTML vs Markdown

| Format     | Best for                                                    |
| ---------- | ----------------------------------------------------------- |
| `markdown` | LLM prompts, RAG, text analysis — clean, no boilerplate     |
| `html`     | Rendering, scraping structured data, preserving page layout |

## Already have URLs?

If you have a list of URLs and don't need to search first, use the [Contents API](/contents/overview) directly. It accepts URLs without a query and returns the same `markdown` or `html` content.

***

## Next steps

* Fetch page content directly from URLs, no search query needed
* Retrieve and filter real-time news results
* View all parameters and response schemas
* Back to Search API overview
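For the RAG workflows mentioned in the overview, crawled Markdown usually needs to be split into smaller passages before embedding. Below is a minimal, illustrative sketch of one way to do that; the paragraph-based chunking strategy and the `chunk_markdown` helper are not part of the API, and `contents.markdown` is the field shown in the Python examples above.

```python
def chunk_markdown(markdown: str, max_chars: int = 1500) -> list[str]:
    """Split crawled Markdown into passages of at most max_chars,
    breaking on blank lines so paragraphs stay intact.
    Note: a single paragraph longer than max_chars passes through whole."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for paragraph in markdown.split("\n\n"):
        # Start a new chunk when adding this paragraph would overflow.
        if current and size + len(paragraph) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph) + 2  # account for the rejoined separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# In a real pipeline you would pass result.contents.markdown here;
# a stand-in document is used so the sketch runs on its own.
doc = "Intro paragraph.\n\n" + "Body text. " * 100 + "\n\nConclusion."
passages = chunk_markdown(doc, max_chars=500)
print(len(passages), max(len(p) for p in passages))  # prints: 3 1100
```

Each passage can then be embedded and indexed separately, keeping individual chunks within a typical embedding model's context window.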