Contents API Overview | YDC

What is the Contents API?

The Contents API extracts clean HTML or Markdown content from a given URL. Pass it a list of URLs and get back the full page content for each, ready for LLM consumption—no parsing, no HTML noise, no browser automation required.

How it’s different from livecrawl

The Contents API and the livecrawl parameter in the Search API both extract full page content, but they serve different workflows:

	Contents API	Search API + livecrawl
Starting point	You already know the URLs	You have a query, not URLs
Use case	Fetch known pages on demand	Enrich search results with full content
URL source	You provide them	You.com search discovers them
Batch size	Up to 10 URLs per request	Up to 100 results per search

Use the Contents API when you have a list of specific URLs you want to read. Use livecrawl when you want full content returned alongside search results.

What you get

Each URL in your request returns a structured object:

[
  {
    "url": "https://competitor.com/pricing",
    "title": "Pricing — Competitor Inc.",
    "markdown": "# Pricing\n\n## Starter Plan\n$49/month...",
    "html": "<html>...</html>",
    "metadata": {
      "site_name": "Competitor Inc.",
      "favicon_url": "https://ydc-index.io/favicon?domain=competitor.com&size=128"
    }
  }
]

You control which formats are returned via the formats parameter—request markdown, html, and/or metadata in any combination.
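As a rough illustration of consuming this response, the sketch below joins the Markdown of each returned page into one prompt-ready string. It assumes the response has already been parsed into a list of dicts with the fields shown in the sample; `build_context` is a hypothetical helper, not part of the API or the SDK.

```python
import json


def build_context(pages: list[dict]) -> str:
    """Join the Markdown of each page into one prompt-ready string.

    Hypothetical helper: assumes each item carries the "url", "title",
    and "markdown" keys shown in the sample response.
    """
    sections = []
    for page in pages:
        markdown = page.get("markdown")
        if not markdown:
            # Skip pages whose Markdown is missing or was not requested.
            continue
        title = page.get("title") or page["url"]
        sections.append(f"Source: {title} ({page['url']})\n\n{markdown}")
    return "\n\n---\n\n".join(sections)


# Sample payload modeled on the response shape above.
raw = '''[
  {
    "url": "https://competitor.com/pricing",
    "title": "Pricing - Competitor Inc.",
    "markdown": "# Pricing\\n\\n## Starter Plan\\n$49/month..."
  }
]'''
context = build_context(json.loads(raw))
print(context)
```

Prefixing each section with its title and URL keeps the source visible to the model, which can help when you ask it to cite where a claim came from.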

Key features

Any URL, on demand

Pass up to 10 URLs in a single request. The API crawls them all in parallel and returns the content. No need to manage a headless browser or deal with raw HTML yourself.

LLM-ready Markdown

The markdown format strips navigation menus, ads, footers, and other boilerplate. You get the actual content of the page, ready to drop into a prompt.

Configurable timeout

Use crawl_timeout (1–60 seconds) to balance speed against completeness. Fast pages usually need 5–10 seconds; heavy JavaScript-rendered pages may need 20–30 seconds.

Metadata extraction

Request metadata alongside content to get the page’s site name and favicon URL—useful for building UIs that display source attribution.
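One way to use this metadata for source attribution is sketched below. It assumes the page is available as a dict shaped like the sample response; `attribution` and the output keys (`label`, `icon`, `href`) are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse


def attribution(page: dict) -> dict:
    """Build a small source-attribution record for a UI (hypothetical shape)."""
    metadata = page.get("metadata") or {}
    host = urlparse(page["url"]).netloc
    return {
        # Fall back to the hostname when no site name was returned.
        "label": metadata.get("site_name") or host,
        "icon": metadata.get("favicon_url"),
        "href": page["url"],
    }


page = {
    "url": "https://competitor.com/pricing",
    "metadata": {
        "site_name": "Competitor Inc.",
        "favicon_url": "https://ydc-index.io/favicon?domain=competitor.com&size=128",
    },
}
print(attribution(page))
```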

Quickstart

from youdotcom import You
from youdotcom.models import ContentsFormats

# Replace "api_key" with your You.com API key
with You(api_key_auth="api_key") as you:
    pages = you.contents.generate(
        urls=["https://you.com/about"],
        formats=[ContentsFormats.MARKDOWN],
    )

    for page in pages:
        print(f"Title: {page.title}")
        # markdown is null when a crawl fails, so fall back to an empty string
        print(f"Content preview: {(page.markdown or '')[:300]}\n")

Parameters

Parameter	Type	Required	Description
`urls`	array of strings	Yes	The URLs to fetch content from
`formats`	array of strings	No	Content formats to return: `markdown`, `html`, `metadata` (default: `markdown`)
`crawl_timeout`	number	No	Per-URL timeout in seconds, between 1 and 60 (default: 10)
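The constraints in this table can be checked on the client before a request is sent. The sketch below is one possible validator; `build_request` is a hypothetical helper, and returning the parameters as a flat dict is an assumption made for illustration (see the API reference for the actual request schema).

```python
ALLOWED_FORMATS = {"markdown", "html", "metadata"}
MAX_URLS = 10  # per-request limit stated in this overview


def build_request(urls, formats=None, crawl_timeout=None) -> dict:
    """Validate parameters and return them as a dict (hypothetical helper)."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per request, got {len(urls)}")

    body = {"urls": list(urls)}

    if formats is not None:
        unknown = set(formats) - ALLOWED_FORMATS
        if unknown:
            raise ValueError(f"unsupported formats: {sorted(unknown)}")
        body["formats"] = list(formats)

    if crawl_timeout is not None:
        if not 1 <= crawl_timeout <= 60:
            raise ValueError("crawl_timeout must be between 1 and 60 seconds")
        body["crawl_timeout"] = crawl_timeout

    return body


print(build_request(["https://you.com/about"], formats=["markdown"], crawl_timeout=15))
```

Optional parameters are left out of the dict when not supplied, so the documented defaults apply.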

View full API reference

Common use cases

Competitive intelligence

Monitor competitor pricing, feature, or blog pages. Fetch the content on a schedule, feed it to an LLM, and surface meaningful changes—without manual checking.

from youdotcom import You
from youdotcom.models import ContentsFormats

competitor_pages = [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
    "https://competitor-c.com/features",
]

with You(api_key_auth="api_key") as you:
    pages = you.contents.generate(
        urls=competitor_pages,
        formats=[ContentsFormats.MARKDOWN],
        crawl_timeout=15,
    )

    for page in pages:
        print(f"\n--- {page.title} ---")
        # Feed page.markdown into your LLM for summarization or diff
        # (markdown is null when a crawl fails, so fall back to an empty string)
        print((page.markdown or "")[:500])
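To surface changes between scheduled fetches, you could compare each new snapshot with the previous one before involving an LLM. The sketch below uses only the Python standard library; `fingerprint` and `changed_lines` are hypothetical helpers, and the two snapshots are made-up sample data standing in for stored `page.markdown` values.

```python
import difflib
import hashlib


def fingerprint(markdown: str) -> str:
    """Hash a snapshot so unchanged pages can be skipped cheaply."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def changed_lines(old: str, new: str) -> list[str]:
    """Return only the removed ("- ") and added ("+ ") lines between snapshots."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Made-up snapshots standing in for two scheduled fetches of the same page.
previous = "# Pricing\n\n## Starter Plan\n$49/month"
current = "# Pricing\n\n## Starter Plan\n$59/month"

if fingerprint(previous) != fingerprint(current):
    for line in changed_lines(previous, current):
        print(line)
```

Passing only the changed lines to the LLM keeps the prompt small and focuses the summary on what actually moved.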

Knowledge base ingestion

You have a list of authoritative sources—documentation pages, whitepapers, internal wikis. Fetch them all, convert to clean Markdown, and index into your vector store.

from youdotcom import You
from youdotcom.models import ContentsFormats

# Known authoritative sources to index
source_urls = [
    "https://docs.example.com/api-reference",
    "https://docs.example.com/authentication",
    "https://docs.example.com/rate-limits",
]

with You(api_key_auth="api_key") as you:
    pages = you.contents.generate(
        urls=source_urls,
        formats=[ContentsFormats.MARKDOWN, ContentsFormats.METADATA],
    )

    documents = []
    for page in pages:
        if not page.markdown:
            # Skip URLs that failed to crawl
            continue
        documents.append({
            "source": page.url,
            "title": page.title,
            "content": page.markdown,
        })
        # Index document into your vector store here
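Vector stores usually work better with smaller passages than with whole pages. As a minimal sketch of one chunking strategy, the code below splits Markdown at heading lines. `split_by_heading` and `to_records` are hypothetical helpers, and the record shape simply extends the dict used above with a `chunk` index. Note that a `#` line inside a fenced code block would also be treated as a heading.

```python
import re


def split_by_heading(markdown: str) -> list[str]:
    """Split Markdown into chunks that each start at a heading line."""
    # Zero-width split before any line that begins with 1-6 '#' and a space.
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [part.strip() for part in parts if part.strip()]


def to_records(url: str, title: str, markdown: str) -> list[dict]:
    """Turn one page into per-chunk records ready for embedding."""
    return [
        {"source": url, "title": title, "chunk": index, "content": chunk}
        for index, chunk in enumerate(split_by_heading(markdown))
    ]


# Made-up page content for illustration.
sample = "# Pricing\n\nIntro text\n\n## Starter\n$49\n\n## Pro\n$99"
records = to_records("https://docs.example.com/pricing", "Pricing", sample)
for record in records:
    print(record["chunk"], repr(record["content"]))
```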

Research assistant

Give users the ability to ask questions about specific URLs. Fetch the page content on the fly and feed it as context into your LLM—turning any URL into a searchable document.

from youdotcom import You
from youdotcom.models import ContentsFormats

def fetch_url_context(url: str) -> str:
    with You(api_key_auth="api_key") as you:
        pages = you.contents.generate(urls=[url], formats=[ContentsFormats.MARKDOWN])
        # markdown is null when the crawl fails, so always return a string
        return (pages[0].markdown or "") if pages else ""

# User asks: "Summarize this page for me"
url = "https://example.com/long-report"
context = fetch_url_context(url)

prompt = f"Summarize the following page content:\n\n{context}"
# Pass prompt to your LLM

Best practices

Request only the formats you need

Each format adds processing time. If you only need Markdown for LLM consumption, don’t request html. If you don’t need site metadata for your UI, skip metadata.

Batch your URLs

A single request with 10 URLs is faster than 10 separate requests. The API processes them in parallel.
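When you have more than 10 URLs, you can split the list into request-sized batches first. The sketch below is one way to do that; `batches` is a hypothetical helper, and the URL list is made up for illustration.

```python
from typing import Iterator

MAX_URLS_PER_REQUEST = 10  # per-request limit stated in this overview


def batches(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield de-duplicated URLs in order, at most `size` per batch."""
    unique = list(dict.fromkeys(urls))  # drops duplicates, keeps first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


# Made-up URL list: 23 distinct pages plus one duplicate.
urls = [f"https://docs.example.com/page-{n}" for n in range(23)]
urls.append("https://docs.example.com/page-0")

for batch in batches(urls):
    # Each batch would be passed as `urls=` in one Contents API request.
    print(len(batch), batch[0])
```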

Set `crawl_timeout` based on the target site

For simple static pages, 5–10 seconds is usually enough. For JavaScript-heavy pages (SPAs, dashboards), increase to 20–30 seconds to give the renderer time to complete.
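If you keep track of which sites need JavaScript rendering, the timeout can be chosen per batch. The sketch below is one simple heuristic, not a recommendation from the API itself: `pick_timeout` and the `js_heavy_hosts` set are hypothetical, and the default values just mirror the ranges mentioned above.

```python
from urllib.parse import urlparse

MIN_TIMEOUT, MAX_TIMEOUT = 1, 60  # allowed crawl_timeout range in seconds


def pick_timeout(urls, js_heavy_hosts, static_timeout=10, js_timeout=30):
    """Pick one crawl_timeout for a batch based on the slowest expected site."""
    needs_rendering = any(urlparse(url).netloc in js_heavy_hosts for url in urls)
    timeout = js_timeout if needs_rendering else static_timeout
    # Clamp to the documented range in case a caller passes unusual defaults.
    return max(MIN_TIMEOUT, min(MAX_TIMEOUT, timeout))


# Hypothetical list of hosts known to be JavaScript-heavy.
js_heavy_hosts = {"app.example.com"}

print(pick_timeout(["https://docs.example.com/guide"], js_heavy_hosts))
print(pick_timeout(["https://app.example.com/dashboard"], js_heavy_hosts))
```

Because the timeout applies to every URL in the request, grouping slow sites into their own batch avoids holding fast pages to a long timeout.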

Handle partial failures gracefully

If one URL in a batch fails to crawl (e.g., it’s behind a login wall or returns a 404), the API returns null for its markdown and html fields. Always check before processing:

for page in pages:
    if page.markdown:
        # Process content
        pass
    else:
        print(f"Failed to fetch: {page.url}")
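The same check can be used to separate successful pages from URLs worth retrying. The sketch below assumes page objects expose `url` and `markdown` attributes as in the examples above; `partition` is a hypothetical helper, and `SimpleNamespace` objects stand in for real SDK results so the snippet runs on its own.

```python
from types import SimpleNamespace


def partition(pages):
    """Split results into pages with content and URLs that returned none."""
    succeeded, failed_urls = [], []
    for page in pages:
        if page.markdown:
            succeeded.append(page)
        else:
            failed_urls.append(page.url)
    return succeeded, failed_urls


# Stand-ins for SDK results: one success and one failed crawl.
pages = [
    SimpleNamespace(url="https://example.com/ok", markdown="# Hello"),
    SimpleNamespace(url="https://example.com/login-wall", markdown=None),
]

succeeded, failed_urls = partition(pages)
print(f"{len(succeeded)} fetched, retry candidates: {failed_urls}")
```

Failed URLs could then be retried in a later request, possibly with a higher crawl_timeout.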

Rate limits & pricing

Pricing is based on the number of URLs fetched per request. See you.com/pricing or contact api@you.com.

Next steps

API Reference

Full parameter reference, request/response schemas, and error codes

Search API Overview

Pair search results with livecrawl to get full content alongside real-time web data

Quickstart

Get your API key and make your first call in under five minutes

Python SDK

Use the official SDK for cleaner integration
