---
title: Retrieve page content
subtitle: >-
Live-crawl search results to get full HTML or Markdown page content. Ideal for
RAG, knowledge base construction, and deep content analysis.
'og:title': Retrieve Page Content | You.com Search API
'og:description': >-
Live-crawl search results to get full HTML or Markdown page content. Ideal for
RAG, knowledge base construction, and deep content analysis.
---
## Overview
By default, search results include snippets — 100–200 words of extracted text per result. Enable **live crawling** to get the full page content: typically 2,000–10,000 words of clean HTML or Markdown per result.
Full page content enables:
* Deep RAG with full document context
* Knowledge base construction from live web data
* Comprehensive content synthesis across sources
* Full article bodies for news results
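At 2,000–10,000 words per page, full content can dominate an LLM context window, so it helps to budget before prompting. A rough sketch; the ~1.3 tokens-per-word ratio is a common rule of thumb for English text, not an API guarantee:

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate for a crawled page body (rule of thumb only)."""
    return int(len(text.split()) * tokens_per_word)

# A 5,000-word page lands around 6,500 tokens under this estimate
```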
## How it works
Add `livecrawl` to any search request. The API fetches each matching result's page in real time and attaches a `contents` object to it. You choose which result types to crawl and what format to return.
| Parameter | Type | Options | Description |
| ------------------- | ------- | ----------------------- | ---------------------------- |
| `livecrawl` | string | `web`, `news`, `all` | Which result types to crawl |
| `livecrawl_formats` | string | `html`, `markdown` | Format for returned content |
| `crawl_timeout` | integer | `1`–`60` (default `10`) | Max seconds to wait per page |
`markdown` is recommended for LLM use cases — it strips navigation, ads, and boilerplate HTML, leaving only the core content.
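Once crawled, the markdown bodies can be stitched into a prompt context. A minimal sketch in plain Python: the section header format and per-source character budget are illustrative choices, and the `(url, markdown)` pairs stand in for `result.url` and `result.contents.markdown` from the examples below.

```python
def build_context(pages, max_chars_per_source=4000):
    """Assemble a labelled context block from (url, markdown) pairs.

    Pages that failed to crawl (markdown is None or empty) are skipped,
    and each body is truncated to a fixed per-source budget.
    """
    sections = []
    for url, markdown in pages:
        if markdown:
            sections.append(f"## Source: {url}\n{markdown[:max_chars_per_source]}")
    return "Answer using only the sources below.\n\n" + "\n\n".join(sections)
```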
## Crawl web results
Set `livecrawl=web` to attach full page content to web results. The `contents.markdown` (or `contents.html`) field is added to each result that was successfully crawled.
```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="transformer architecture explained",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            print(f"{result.title}")
            print(f"  URL: {result.url}")
            if result.contents:
                print(f"  Content ({len(result.contents.markdown)} chars)")
                print(f"  Preview: {result.contents.markdown[:200]}...\n")
            else:
                print("  (No content retrieved)\n")
```
```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";
const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "transformer architecture explained",
    count: 5,
    livecrawl: LiveCrawl.Web,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.web?.forEach((r) => {
    console.log(r.title);
    console.log(`  URL: ${r.url}`);
    if (r.contents?.markdown) {
      console.log(`  Content (${r.contents.markdown.length} chars)`);
      console.log(`  Preview: ${r.contents.markdown.slice(0, 200)}...\n`);
    } else {
      console.log("  (No content retrieved)\n");
    }
  });
}

run();
```
```curl
curl -G https://ydc-index.io/v1/search \
-H "X-API-Key: api_key" \
--data-urlencode "query=transformer architecture explained" \
-d count=5 \
-d livecrawl=web \
-d livecrawl_formats=markdown
```
## Crawl news results
Set `livecrawl=news` to get full article bodies for news results. Combine with `freshness` for breaking news pipelines.
```python
from youdotcom import You
from youdotcom.models import Freshness, LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="semiconductor supply chain",
        freshness=Freshness.WEEK,
        count=5,
        livecrawl=LiveCrawl.NEWS,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.news:
        for article in res.results.news:
            print(f"{article.title}")
            if article.contents:
                print(article.contents.markdown[:400])
            print()
```
```typescript
import { You } from "@youdotcom-oss/sdk";
import { Freshness, LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";
const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "semiconductor supply chain",
    freshness: Freshness.Week,
    count: 5,
    livecrawl: LiveCrawl.News,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.news?.forEach((article) => {
    console.log(article.title);
    if (article.contents?.markdown) {
      console.log(article.contents.markdown.slice(0, 400));
    }
    console.log();
  });
}

run();
```
```curl
curl -G https://ydc-index.io/v1/search \
-H "X-API-Key: api_key" \
--data-urlencode "query=semiconductor supply chain" \
-d freshness=week \
-d count=5 \
-d livecrawl=news \
-d livecrawl_formats=markdown
```
## Crawl both web and news
Use `livecrawl=all` to crawl every result type in one request.
```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="quantum computing breakthroughs",
        count=5,
        livecrawl=LiveCrawl.ALL,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results:
        for result in (res.results.web or []):
            if result.contents:
                print(f"[WEB] {result.title}: {len(result.contents.markdown)} chars")
        for result in (res.results.news or []):
            if result.contents:
                print(f"[NEWS] {result.title}: {len(result.contents.markdown)} chars")
```
```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";
const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "quantum computing breakthroughs",
    count: 5,
    livecrawl: LiveCrawl.All,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.web?.forEach((r) => {
    if (r.contents?.markdown) {
      console.log(`[WEB] ${r.title}: ${r.contents.markdown.length} chars`);
    }
  });

  result.results?.news?.forEach((r) => {
    if (r.contents?.markdown) {
      console.log(`[NEWS] ${r.title}: ${r.contents.markdown.length} chars`);
    }
  });
}

run();
```
```curl
curl -G https://ydc-index.io/v1/search \
-H "X-API-Key: api_key" \
--data-urlencode "query=quantum computing breakthroughs" \
-d count=5 \
-d livecrawl=all \
-d livecrawl_formats=markdown
```
## Control crawl timeout
By default, the crawler waits up to 10 seconds per page. For latency-sensitive applications, reduce `crawl_timeout`; for complex or slow-loading pages, increase it (up to 60 seconds). Results that cannot be crawled within the timeout are returned without a `contents` object.
```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    # Low-latency pipeline: only wait 3 seconds per page
    res = you.search.unified(
        query="latest Python releases",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
        crawl_timeout=3,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            status = "crawled" if result.contents else "skipped (timeout)"
            print(f"{result.title} — {status}")
```
```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";
const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  // Low-latency pipeline: only wait 3 seconds per page
  const result = await you.search({
    query: "latest Python releases",
    count: 5,
    livecrawl: LiveCrawl.Web,
    livecrawlFormats: LiveCrawlFormats.Markdown,
    crawlTimeout: 3,
  });

  result.results?.web?.forEach((r) => {
    const status = r.contents ? "crawled" : "skipped (timeout)";
    console.log(`${r.title} — ${status}`);
  });
}

run();
```
```curl
curl -G https://ydc-index.io/v1/search \
-H "X-API-Key: api_key" \
--data-urlencode "query=latest Python releases" \
-d count=5 \
-d livecrawl=web \
-d livecrawl_formats=markdown \
-d crawl_timeout=3
```
## HTML vs Markdown
| Format | Best for |
| ---------- | ----------------------------------------------------------- |
| `markdown` | LLM prompts, RAG, text analysis — clean, no boilerplate |
| `html` | Rendering, scraping structured data, preserving page layout |
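If you request `html`, you typically post-process the page yourself. A sketch using Python's standard-library `html.parser` to collect link targets from a returned `contents.html` string; the sample markup in the test is illustrative:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href attributes from anchor tags in a crawled HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```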
## Already have URLs?
If you have a list of URLs and don't need to search first, use the [Contents API](/contents/overview) directly. It accepts URLs without a query and returns the same `markdown` or `html` content.
***
## Next steps
* Fetch page content directly from URLs with the [Contents API](/contents/overview), no search query needed
* Retrieve and filter real-time news results
* View all parameters and response schemas
* Back to the Search API overview