---
title: Contents API Overview
'og:title': You.com Contents API | Clean HTML & Markdown from Any URL
'og:description': >-
  Extract clean HTML or Markdown content from any webpage on demand. Perfect for
  competitive intelligence, knowledge base ingestion, and web scraping without
  the mess.
---
## What is the Contents API?
The Contents API extracts clean HTML or Markdown content from any URL you provide. Pass it a list of URLs and get back the full page content, ready for LLM consumption—no parsing, no HTML noise, no browser automation required.
**TL;DR:** Give it URLs, get back clean content. Works on any publicly accessible webpage.
***
## How it's different from livecrawl
The Contents API and the `livecrawl` parameter in the Search API both extract full page content, but they serve different workflows:
| | Contents API | Search API + livecrawl |
| ------------------ | --------------------------- | --------------------------------------- |
| **Starting point** | You already know the URLs | You have a query, not URLs |
| **Use case** | Fetch known pages on demand | Enrich search results with full content |
| **URL source** | You provide them | You.com search discovers them |
| **Batch size** | Up to 10 URLs per request | Up to 100 results per search |
Use the Contents API when you have a list of specific URLs you want to read. Use `livecrawl` when you want full content returned alongside search results.
***
## What you get
Each URL in your request returns a structured object:
```json
[
  {
    "url": "https://competitor.com/pricing",
    "title": "Pricing — Competitor Inc.",
    "markdown": "# Pricing\n\n## Starter Plan\n$49/month...",
    "html": "...",
    "metadata": {
      "site_name": "Competitor Inc.",
      "favicon_url": "https://ydc-index.io/favicon?domain=competitor.com&size=128"
    }
  }
]
```
You control which formats are returned via the `formats` parameter—request `markdown`, `html`, and/or `metadata` in any combination.
***
## Key features
### Any URL, on demand
Pass up to 10 URLs in a single request. The API crawls each one in parallel and returns the content. No need to manage a headless browser or deal with raw HTML yourself.
### LLM-ready Markdown
The `markdown` format strips navigation menus, ads, footers, and other boilerplate. You get the actual content of the page, ready to drop into a prompt.
### Configurable timeout
Use `crawl_timeout` (1–60 seconds) to balance speed against completeness. Fast static pages usually need 5–10 seconds; heavy JavaScript-rendered pages may need 20–30 seconds.
### Metadata extraction
Request `metadata` alongside content to get the page's site name and favicon URL—useful for building UIs that display source attribution.
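As a rough sketch of source attribution, the helper below turns one response object into a Markdown citation line. It assumes the response shape shown under "What you get" (`url`, `title`, and a `metadata` object with `site_name` and `favicon_url`). The name `format_attribution` is hypothetical and not part of the API.

```python
from typing import Any


def format_attribution(page: dict[str, Any]) -> str:
    """Build a Markdown attribution line from one Contents API result.

    Assumes the response shape shown on this page; falls back to the
    URL when the title or metadata is missing.
    """
    url = page["url"]
    metadata = page.get("metadata") or {}
    title = page.get("title") or url
    site_name = metadata.get("site_name")
    favicon = metadata.get("favicon_url")

    icon = f"![]({favicon}) " if favicon else ""
    label = f"{title} ({site_name})" if site_name else title
    return f"{icon}[{label}]({url})"


example = {
    "url": "https://competitor.com/pricing",
    "title": "Pricing",
    "metadata": {
        "site_name": "Competitor Inc.",
        "favicon_url": "https://ydc-index.io/favicon?domain=competitor.com&size=128",
    },
}
print(format_attribution(example))
```

The fallbacks matter when `metadata` was not requested or a page has no title: the link still renders, only with less decoration.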
***
## Quickstart
```python
import os

import requests

API_KEY = os.environ["YOU_API_KEY"]

response = requests.post(
    "https://ydc-index.io/v1/contents",
    headers={"X-API-Key": API_KEY},
    json={
        "urls": ["https://you.com/about"],
        "formats": ["markdown"],
    },
)
pages = response.json()

for page in pages:
    print(f"Title: {page['title']}")
    print(f"Content preview: {page['markdown'][:300]}\n")
```
```typescript
const response = await fetch("https://ydc-index.io/v1/contents", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.YOU_API_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    urls: ["https://you.com/about"],
    formats: ["markdown"],
  }),
});
const pages = await response.json();

for (const page of pages) {
  console.log(`Title: ${page.title}`);
  console.log(`Preview: ${page.markdown?.slice(0, 300)}\n`);
}
```
```curl
curl -X POST https://ydc-index.io/v1/contents \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://you.com/about"],
    "formats": ["markdown"]
  }'
```
***
## Parameters
| Parameter | Type | Required | Description |
| --------------- | ---------------- | -------- | ------------------------------------------------------------------------------- |
| `urls` | array of strings | Yes | The URLs to fetch content from |
| `formats` | array of strings | No | Content formats to return: `markdown`, `html`, `metadata` (default: `markdown`) |
| `crawl_timeout` | number | No | Per-URL timeout in seconds, between 1 and 60 (default: 10) |
[View full API reference](/api-reference/contents/contents)
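The constraints in the table can be checked client-side before a request is sent. The sketch below is one way to do that; `build_contents_payload` is a hypothetical helper, and the limits it enforces (at most 10 URLs, the three format names, a timeout of 1 to 60 seconds) are taken from this page rather than from the full API reference.

```python
from typing import Any, Optional

ALLOWED_FORMATS = {"markdown", "html", "metadata"}
MAX_URLS_PER_REQUEST = 10  # batch limit stated on this page


def build_contents_payload(
    urls: list[str],
    formats: Optional[list[str]] = None,
    crawl_timeout: Optional[float] = None,
) -> dict[str, Any]:
    """Validate arguments locally and return the JSON request body."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"at most {MAX_URLS_PER_REQUEST} URLs per request")

    payload: dict[str, Any] = {"urls": list(urls)}

    # Optional fields are omitted so the API applies its own defaults.
    if formats is not None:
        unknown = set(formats) - ALLOWED_FORMATS
        if unknown:
            raise ValueError(f"unsupported formats: {sorted(unknown)}")
        payload["formats"] = list(formats)

    if crawl_timeout is not None:
        if not 1 <= crawl_timeout <= 60:
            raise ValueError("crawl_timeout must be between 1 and 60 seconds")
        payload["crawl_timeout"] = crawl_timeout

    return payload


print(build_contents_payload(["https://you.com/about"], ["markdown"], 15))
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the Quickstart.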
***
## Common use cases
### Competitive intelligence
Monitor competitor pricing, feature, or blog pages. Fetch the content on a schedule, feed it to an LLM, and surface meaningful changes—without manual checking.
```python
import requests

API_KEY = "YOUR_API_KEY"

competitor_pages = [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
    "https://competitor-c.com/features",
]

response = requests.post(
    "https://ydc-index.io/v1/contents",
    headers={"X-API-Key": API_KEY},
    json={
        "urls": competitor_pages,
        "formats": ["markdown"],
        "crawl_timeout": 15,
    },
)
pages = response.json()

for page in pages:
    print(f"\n--- {page['title']} ---")
    # Feed page['markdown'] into your LLM for summarization or diff
    print(page["markdown"][:500])
```
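To surface changes between two scheduled fetches, one option is a plain text diff of the Markdown before any LLM is involved. The sketch below uses Python's standard `difflib`. Storing the previous snapshot is left to you, and `diff_snapshots` is a hypothetical helper name.

```python
import difflib


def diff_snapshots(previous: str, current: str, label: str = "page") -> str:
    """Return a unified diff of two Markdown snapshots ("" if identical)."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{label} (previous)",
        tofile=f"{label} (current)",
        lineterm="",
    )
    return "\n".join(lines)


old = "# Pricing\n\n## Starter Plan\n$49/month"
new = "# Pricing\n\n## Starter Plan\n$59/month"
print(diff_snapshots(old, new, "competitor.com/pricing"))
```

An empty diff means the page did not change, so the LLM call can be skipped for that URL. A non-empty diff is usually a much smaller prompt than the two full pages.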
### Knowledge base ingestion
You have a list of authoritative sources, such as documentation pages, whitepapers, or publicly accessible wikis. Fetch them all, convert to clean Markdown, and index into your vector store.
```python
import requests

API_KEY = "YOUR_API_KEY"

# Known authoritative sources to index
source_urls = [
    "https://docs.example.com/api-reference",
    "https://docs.example.com/authentication",
    "https://docs.example.com/rate-limits",
]

response = requests.post(
    "https://ydc-index.io/v1/contents",
    headers={"X-API-Key": API_KEY},
    json={
        "urls": source_urls,
        "formats": ["markdown", "metadata"],
    },
)

documents = []
for page in response.json():
    documents.append({
        "source": page["url"],
        "title": page["title"],
        "content": page["markdown"],
    })

# Index `documents` into your vector store here
```
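Most vector stores work better with passages than with whole pages. The sketch below shows one naive way to chunk the returned Markdown: split at headings, then cut oversized sections into fixed-size pieces. The function name and the 2000-character default are assumptions for illustration. A production pipeline would likely also skip `#` lines inside fenced code blocks and split on sentence boundaries.

```python
import re


def split_markdown_by_heading(markdown: str, max_chars: int = 2000) -> list[str]:
    """Split Markdown into chunks that each start at a heading.

    Sections longer than max_chars are cut into fixed-size pieces.
    """
    # Zero-width split before every line starting with 1-6 '#' and a space.
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks


sample = "# Pricing\nOverview text\n## Starter Plan\n$49/month"
for chunk in split_markdown_by_heading(sample):
    print(repr(chunk))
```

Each chunk can then be stored with the page's `url` and `title` so search hits can be traced back to their source.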
### Research assistant
Give users the ability to ask questions about specific URLs. Fetch the page content on the fly and feed it as context into your LLM—turning any URL into a searchable document.
```python
import requests

API_KEY = "YOUR_API_KEY"


def fetch_url_context(url: str) -> str:
    """Fetch one URL and return its Markdown, or "" if nothing was extracted."""
    response = requests.post(
        "https://ydc-index.io/v1/contents",
        headers={"X-API-Key": API_KEY},
        json={"urls": [url], "formats": ["markdown"]},
    )
    pages = response.json()
    # `markdown` can be null when the crawl fails, so fall back to "".
    return (pages[0].get("markdown") or "") if pages else ""


# User asks: "Summarize this page for me"
url = "https://example.com/long-report"
context = fetch_url_context(url)
prompt = f"Summarize the following page content:\n\n{context}"
# Pass prompt to your LLM
```
***
## Best practices
### Request only the formats you need
Each format adds processing time. If you only need Markdown for LLM consumption, don't request `html`. If you don't need site metadata for your UI, skip `metadata`.
### Batch your URLs
A single request with 10 URLs is faster than 10 separate requests. The API processes them in parallel.
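When you have more than 10 URLs, they need to be split across requests. A minimal sketch, assuming the 10-URL limit stated on this page. `batch_urls` is a hypothetical helper, and the de-duplication step is an optional extra to avoid fetching the same page twice.

```python
from collections.abc import Iterator

MAX_URLS_PER_REQUEST = 10  # batch limit stated on this page


def batch_urls(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield de-duplicated URLs, in order, in lists of at most `size`."""
    unique = list(dict.fromkeys(urls))  # drops repeats, keeps first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


urls = [f"https://docs.example.com/page-{i}" for i in range(23)]
for batch in batch_urls(urls):
    # Send each batch as the "urls" field of one POST request.
    print(len(batch))
```

With 23 distinct URLs this yields batches of 10, 10, and 3.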
### Set `crawl_timeout` based on the target site
For simple static pages, 5–10 seconds is usually enough. For JavaScript-heavy pages (SPAs, dashboards), increase to 20–30 seconds to give the renderer time to complete.
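Because `crawl_timeout` is set once per request, it can help to group JavaScript-heavy URLs into their own batch and pick the value per batch. The sketch below assumes you maintain your own set of hosts known to need rendering time. `pick_crawl_timeout`, `js_heavy_hosts`, and the two constants are hypothetical names, with values taken from the ranges suggested above.

```python
from urllib.parse import urlparse

STATIC_TIMEOUT = 10    # upper end of the 5-10 s guidance for static pages
JS_HEAVY_TIMEOUT = 30  # upper end of the 20-30 s guidance for JS-heavy pages


def pick_crawl_timeout(urls: list[str], js_heavy_hosts: set[str]) -> int:
    """Return the longer timeout if any URL's host is listed as JS-heavy."""
    for url in urls:
        host = urlparse(url).hostname or ""
        if host in js_heavy_hosts:
            return JS_HEAVY_TIMEOUT
    return STATIC_TIMEOUT


known_js_heavy = {"app.example.com"}
print(pick_crawl_timeout(["https://docs.example.com/intro"], known_js_heavy))
print(pick_crawl_timeout(["https://app.example.com/dashboard"], known_js_heavy))
```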
### Handle partial failures gracefully
If one URL in a batch fails to crawl (e.g., it's behind a login wall or returns a 404), the API returns `null` for its `markdown` and `html` fields. Always check before processing:
```python
for page in response.json():
    if page.get("markdown"):
        # Process content
        pass
    else:
        print(f"Failed to fetch: {page['url']}")
```
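If failed URLs should be retried or logged separately, the same check can split a response into two lists. This is a sketch that assumes `markdown` was among the requested formats and that a failed crawl leaves it `null` or empty, as described above. `partition_results` is a hypothetical helper.

```python
from typing import Any


def partition_results(
    pages: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Split results into pages with Markdown content and URLs that failed."""
    succeeded = [page for page in pages if page.get("markdown")]
    failed = [page["url"] for page in pages if not page.get("markdown")]
    return succeeded, failed


sample = [
    {"url": "https://example.com/a", "markdown": "# A"},
    {"url": "https://example.com/login-only", "markdown": None},
]
ok, retry = partition_results(sample)
print(len(ok), retry)
```

The `retry` list can be sent again with a longer `crawl_timeout`, or recorded as permanently unavailable after a few attempts.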
***
## Rate limits & pricing
Pricing is based on the number of URLs fetched per request. For detailed pricing information, visit [you.com/platform/upgrade](https://you.com/platform/upgrade) or contact [api@you.com](mailto:api@you.com).
***
## Next steps
- Full parameter reference, request/response schemas, and error codes
- Pair search results with `livecrawl` to get full content alongside real-time web data
- Get your API key and make your first call in under five minutes
- Use the official SDK for cleaner integration