Contents API Overview
What is the Contents API?
The Contents API extracts clean HTML or Markdown content from any URL you provide. Pass it a list of URLs and get back the full page content, ready for LLM consumption—no parsing, no HTML noise, no browser automation required.
TL;DR: Give it URLs, get back clean content. Works on any publicly accessible webpage.
How it’s different from livecrawl
The Contents API and the livecrawl parameter in the Search API both extract full page content, but they serve different workflows:
Use the Contents API when you have a list of specific URLs you want to read. Use livecrawl when you want full content returned alongside search results.
What you get
For each URL in your request, the API returns a structured object:
You control which formats are returned via the formats parameter—request markdown, html, and/or metadata in any combination.
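The exact response schema lives in the API reference; the sketch below is illustrative only, with field names assumed from the formats described in this overview.

```python
# Illustrative shape of one result. The field names (url, markdown, html,
# metadata.site_name, metadata.favicon_url) are assumptions based on this
# overview; consult the API reference for the authoritative schema.
example_result = {
    "url": "https://example.com/blog/launch-post",
    "markdown": "# Launch post\n\nClean, LLM-ready body text...",
    "html": "<article><h1>Launch post</h1>...</article>",  # only if requested
    "metadata": {
        "site_name": "Example Blog",
        "favicon_url": "https://example.com/favicon.ico",
    },
}
```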
Key features
Any URL, on demand
Pass up to 10 URLs in a single request. The API crawls each one in parallel and returns the content. No need to manage a headless browser or deal with raw HTML yourself.
LLM-ready Markdown
The markdown format strips navigation menus, ads, footers, and other boilerplate. You get the actual content of the page, ready to drop into a prompt.
Configurable timeout
Use crawl_timeout (1–60 seconds) to balance speed vs. completeness. For fast pages: 5–10 seconds. For heavy JavaScript-rendered pages: 20–30 seconds.
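As a rough sketch, you can pick the timeout per batch based on what you know about the target pages; the values here follow the ranges above, and the payload shape mirrors the Quickstart sketch below.

```python
# Sketch: choose crawl_timeout (seconds, 1-60) based on the target pages.
STATIC_PAGE_TIMEOUT = 8    # simple static pages: 5-10 seconds is usually enough
SPA_PAGE_TIMEOUT = 25      # JavaScript-heavy pages: give the renderer 20-30 seconds

def build_payload(urls: list[str], javascript_heavy: bool = False) -> dict:
    """Assemble a Contents API request body (shape assumed, see the Quickstart sketch)."""
    return {
        "urls": urls,
        "formats": ["markdown"],
        "crawl_timeout": SPA_PAGE_TIMEOUT if javascript_heavy else STATIC_PAGE_TIMEOUT,
    }

payload = build_payload(["https://app.example.com/pricing"], javascript_heavy=True)
```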
Metadata extraction
Request metadata alongside content to get the page’s site name and favicon URL—useful for building UIs that display source attribution.
Quickstart
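A minimal sketch in Python using the requests library. The endpoint URL, auth header name, and top-level response key are assumptions; substitute the values from the API reference before running it.

```python
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://api.you.com/v1/contents"  # assumed endpoint; check the API reference

payload = {
    "urls": [
        "https://example.com/blog/launch-post",
        "https://example.com/docs/getting-started",
    ],
    "formats": ["markdown", "metadata"],  # request only the formats you need
    "crawl_timeout": 10,                  # seconds, 1-60
}

response = requests.post(
    ENDPOINT,
    headers={"X-API-Key": API_KEY},  # assumed auth header name
    json=payload,
    timeout=30,
)
response.raise_for_status()

for result in response.json().get("results", []):  # assumed top-level key
    print(result["url"])
    print((result.get("markdown") or "")[:200])  # preview the first 200 characters
```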
Parameters
This overview touches on three request parameters: urls (up to 10 per request), formats (any combination of markdown, html, and metadata), and crawl_timeout (1–60 seconds). See the full parameter reference linked under Next steps for the complete list, defaults, and constraints.
Common use cases
Competitive intelligence
Monitor competitor pricing, feature, or blog pages. Fetch the content on a schedule, feed it to an LLM, and surface meaningful changes—without manual checking.
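For example, a scheduled job might diff each freshly fetched page against the previous snapshot and only escalate when something changed. A minimal sketch, assuming new_markdown is the Markdown the Contents API returned for that URL (fetched as in the Quickstart example above):

```python
# Sketch: compare today's fetch against the stored snapshot and surface the diff.
import difflib
from pathlib import Path

def surface_changes(url: str, new_markdown: str, snapshot_dir: Path = Path("snapshots")) -> str:
    """Return a unified diff of the page's Markdown versus the stored snapshot."""
    snapshot_dir.mkdir(exist_ok=True)
    snapshot = snapshot_dir / (url.replace("/", "_") + ".md")
    old_markdown = snapshot.read_text() if snapshot.exists() else ""
    diff = "\n".join(
        difflib.unified_diff(old_markdown.splitlines(), new_markdown.splitlines(), lineterm="")
    )
    snapshot.write_text(new_markdown)  # store the new version for the next run
    return diff  # feed a non-empty diff to your LLM to summarize what changed

# diff = surface_changes("https://competitor.example.com/pricing", new_markdown)
```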
Knowledge base ingestion
You have a list of authoritative sources—documentation pages, whitepapers, internal wikis. Fetch them all, convert to clean Markdown, and index into your vector store.
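A minimal ingestion sketch, assuming you already have the Markdown for each source (fetched as in the Quickstart example above). The chunking here is deliberately naive, and the indexing step is left as a comment because it depends on your vector store.

```python
# Sketch: turn fetched Markdown into fixed-size chunks, ready for embedding.
def chunk_markdown(markdown: str, url: str, chunk_chars: int = 2000) -> list[dict]:
    """Split one page's Markdown into chunks, tagging each with its source URL."""
    return [
        {"source": url, "text": markdown[i : i + chunk_chars]}
        for i in range(0, len(markdown), chunk_chars)
    ]

# Stand-in for content fetched via the Contents API (url -> markdown).
fetched_pages = {
    "https://example.com/docs/getting-started": "# Getting started\n\nStep one...",
}

documents = []
for url, markdown in fetched_pages.items():
    documents.extend(chunk_markdown(markdown, url))
# Embed `documents` and upsert them into your vector store of choice.
```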
Research assistant
Give users the ability to ask questions about specific URLs. Fetch the page content on the fly and feed it as context into your LLM—turning any URL into a searchable document.
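A sketch of the prompt-assembly step, assuming the page's Markdown has already been fetched (as in the Quickstart example above). The LLM call itself is left out because it depends on your provider.

```python
# Sketch: turn a fetched page into context for a question-answering prompt.
def build_prompt(question: str, url: str, markdown: str, max_chars: int = 12000) -> str:
    """Assemble a grounded prompt from the page content plus the user's question."""
    context = markdown[:max_chars]  # naive truncation to stay within the context window
    return (
        f"Answer the question using only the page below.\n\n"
        f"Source: {url}\n\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt(
    "What does the free tier include?",
    "https://example.com/pricing",
    "# Pricing\n\nFree tier: ...",  # markdown returned by the Contents API
)
# Send `prompt` to your LLM of choice.
```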
Best practices
Request only the formats you need
Each format adds processing time. If you only need Markdown for LLM consumption, don’t request html. If you don’t need site metadata for your UI, skip metadata.
Batch your URLs
A single request with 10 URLs is faster than 10 separate requests. The API processes them in parallel.
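For longer URL lists, a simple batching helper keeps each request at or under the 10-URL limit. A sketch; the request itself follows the Quickstart example above.

```python
# Sketch: split a long URL list into batches of at most 10 (the per-request limit).
def batched(urls: list[str], size: int = 10):
    for i in range(0, len(urls), size):
        yield urls[i : i + size]

all_urls = [f"https://example.com/docs/page-{n}" for n in range(37)]
for batch in batched(all_urls):
    payload = {"urls": batch, "formats": ["markdown"]}
    # POST `payload` to the Contents API as in the Quickstart sketch.
```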
Set crawl_timeout based on the target site
For simple static pages, 5–10 seconds is usually enough. For JavaScript-heavy pages (SPAs, dashboards), increase to 20–30 seconds to give the renderer time to complete.
Handle partial failures gracefully
If one URL in a batch fails to crawl (e.g., it’s behind a login wall or returns a 404), the API returns null for its markdown and html fields. Always check before processing:
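The sketch below uses hard-coded result dictionaries as stand-ins for a parsed response; substitute the results from your own request.

```python
# Sketch: skip results whose content came back as null/None (failed crawls).
results = [
    {"url": "https://example.com/ok", "markdown": "# Page\n\nContent..."},
    {"url": "https://example.com/behind-login", "markdown": None},  # failed crawl
]

for result in results:
    markdown = result.get("markdown")
    if markdown is None:
        print(f"Skipping {result['url']}: no content returned")
        continue
    # ...process the markdown (index it, prompt with it, etc.)
    print(f"Got {len(markdown)} characters from {result['url']}")
```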
Rate limits & pricing
Pricing is based on the number of URLs fetched per request. For detailed pricing information, visit you.com/platform/upgrade or contact api@you.com.
Next steps
Full parameter reference, request/response schemas, and error codes
Pair search results with livecrawl to get full content alongside real-time web data
Get your API key and make your first call in under five minutes
Use the official SDK for cleaner integration