---
title: Contents API Overview
'og:title': You.com Contents API | Clean HTML & Markdown from Any URL
'og:description': >-
  Extract clean HTML or Markdown content from any webpage on demand. Perfect
  for competitive intelligence, knowledge base ingestion, and web scraping
  without the mess.
---

## What is the Contents API?

The Contents API extracts clean HTML or Markdown content from any URL you provide. Pass it a list of URLs and get back the full page content, ready for LLM consumption: no parsing, no HTML noise, no browser automation required.

**TL;DR:** Give it URLs, get back clean content. Works on any publicly accessible webpage.

***

## How it's different from livecrawl

The Contents API and the `livecrawl` parameter in the Search API both extract full page content, but they serve different workflows:

|                    | Contents API                | Search API + livecrawl                  |
| ------------------ | --------------------------- | --------------------------------------- |
| **Starting point** | You already know the URLs   | You have a query, not URLs              |
| **Use case**       | Fetch known pages on demand | Enrich search results with full content |
| **URL source**     | You provide them            | You.com search discovers them           |
| **Batch size**     | 10 URLs per request         | Up to 100 results per search            |

Use the Contents API when you have a list of specific URLs you want to read. Use `livecrawl` when you want full content returned alongside search results.

***

## What you get

Each URL in your request returns a structured object:

```json
[
  {
    "url": "https://competitor.com/pricing",
    "title": "Pricing — Competitor Inc.",
    "markdown": "# Pricing\n\n## Starter Plan\n$49/month...",
    "html": "...",
    "metadata": {
      "site_name": "Competitor Inc.",
      "favicon_url": "https://ydc-index.io/favicon?domain=competitor.com&size=128"
    }
  }
]
```

You control which formats are returned via the `formats` parameter: request `markdown`, `html`, and/or `metadata` in any combination.

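
As an illustration of consuming this object, the sketch below flattens one result into the fields a UI or prompt might need. The helper name `summarize_page` is hypothetical, and the code assumes that formats you did not request may be missing or `null` in the response, so every lookup falls back to `None` rather than raising.

```python
from typing import Any, Optional


def summarize_page(page: dict[str, Any]) -> dict[str, Optional[str]]:
    """Flatten one Contents API result into a small, display-friendly dict.

    Assumption: formats that were not requested may be absent or null,
    so missing keys yield None instead of raising KeyError.
    """
    metadata = page.get("metadata") or {}
    markdown = page.get("markdown")
    return {
        "url": page.get("url"),
        "title": page.get("title"),
        "site_name": metadata.get("site_name"),
        "favicon_url": metadata.get("favicon_url"),
        # Short preview of the Markdown body, if one was returned
        "preview": markdown[:80] if markdown else None,
    }


# Sample object copied from the response example above
sample = {
    "url": "https://competitor.com/pricing",
    "title": "Pricing — Competitor Inc.",
    "markdown": "# Pricing\n\n## Starter Plan\n$49/month...",
    "html": "...",
    "metadata": {
        "site_name": "Competitor Inc.",
        "favicon_url": "https://ydc-index.io/favicon?domain=competitor.com&size=128",
    },
}

summary = summarize_page(sample)
print(summary["site_name"], "-", summary["title"])
```

Reading fields with `.get()` keeps the same code path working whether you requested one format or all three.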
***

## Key features

### Any URL, on demand

Pass up to 10 URLs in a single request. The API crawls each one in parallel and returns the content. No need to manage a headless browser or deal with raw HTML yourself.

### LLM-ready Markdown

The `markdown` format strips navigation menus, ads, footers, and other boilerplate. You get the actual content of the page, ready to drop into a prompt.

### Configurable timeout

Use `crawl_timeout` (1–60 seconds) to balance speed vs. completeness. For fast pages: 5–10 seconds. For heavy JavaScript-rendered pages: 20–30 seconds.

### Metadata extraction

Request `metadata` alongside content to get the page's site name and favicon URL, which is useful for building UIs that display source attribution.

***

## Quickstart

```python
import os
import requests

# Read the API key from the environment rather than hardcoding it
API_KEY = os.environ["YOU_API_KEY"]

response = requests.post(
    "https://ydc-index.io/v1/contents",
    headers={"X-API-Key": API_KEY},
    json={
        "urls": ["https://you.com/about"],
        "formats": ["markdown"],
    },
)
response.raise_for_status()  # Surface HTTP errors before parsing the body

pages = response.json()
for page in pages:
    # markdown is null when a URL could not be crawled
    preview = (page.get("markdown") or "")[:300]
    print(f"Title: {page['title']}")
    print(f"Content preview: {preview}\n")
```

```typescript
const response = await fetch("https://ydc-index.io/v1/contents", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.YOU_API_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    urls: ["https://you.com/about"],
    formats: ["markdown"],
  }),
});

const pages = await response.json();
for (const page of pages) {
  console.log(`Title: ${page.title}`);
  console.log(`Preview: ${page.markdown?.slice(0, 300)}\n`);
}
```

```curl
curl -X POST https://ydc-index.io/v1/contents \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://you.com/about"],
    "formats": ["markdown"]
  }'
```

***

## Parameters

| Parameter       | Type             | Required | Description                                                                     |
| --------------- | ---------------- | -------- | ------------------------------------------------------------------------------- |
| `urls`          | array of strings | Yes      | The URLs to fetch content from                                                  |
| `formats`       | array of strings | No       | Content formats to return: `markdown`, `html`, `metadata` (default: `markdown`) |
| `crawl_timeout` | number           | No       | Per-URL timeout in seconds, between 1 and 60 (default: 10)                      |

[View full API reference](/api-reference/contents/contents)

***

## Common use cases

### Competitive intelligence

Monitor competitor pricing, feature, or blog pages. Fetch the content on a schedule, feed it to an LLM, and surface meaningful changes without manual checking.

```python
import requests

API_KEY = "YOUR_API_KEY"

competitor_pages = [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
    "https://competitor-c.com/features",
]

response = requests.post(
    "https://ydc-index.io/v1/contents",
    headers={"X-API-Key": API_KEY},
    json={
        "urls": competitor_pages,
        "formats": ["markdown"],
        "crawl_timeout": 15,
    },
)
response.raise_for_status()

pages = response.json()
for page in pages:
    print(f"\n--- {page['title']} ---")
    # Feed page['markdown'] into your LLM for summarization or diff
    print((page.get("markdown") or "")[:500])
```

### Knowledge base ingestion

You have a list of authoritative sources: documentation pages, whitepapers, internal wikis. Fetch them all, convert to clean Markdown, and index into your vector store.

```python
import requests

API_KEY = "YOUR_API_KEY"

# Known authoritative sources to index
source_urls = [
    "https://docs.example.com/api-reference",
    "https://docs.example.com/authentication",
    "https://docs.example.com/rate-limits",
]

response = requests.post(
    "https://ydc-index.io/v1/contents",
    headers={"X-API-Key": API_KEY},
    json={
        "urls": source_urls,
        "formats": ["markdown", "metadata"],
    },
)
response.raise_for_status()

documents = []
for page in response.json():
    if not page.get("markdown"):
        continue  # Skip URLs that failed to crawl
    documents.append({
        "source": page["url"],
        "title": page["title"],
        "site_name": (page.get("metadata") or {}).get("site_name"),
        "content": page["markdown"],
    })

# Index documents into your vector store here
```

### Research assistant

Give users the ability to ask questions about specific URLs. Fetch the page content on the fly and feed it as context into your LLM, turning any URL into a searchable document.

```python
import requests

API_KEY = "YOUR_API_KEY"


def fetch_url_context(url: str) -> str:
    """Return the page's Markdown, or an empty string if the crawl failed."""
    response = requests.post(
        "https://ydc-index.io/v1/contents",
        headers={"X-API-Key": API_KEY},
        json={"urls": [url], "formats": ["markdown"]},
    )
    response.raise_for_status()
    pages = response.json()
    # markdown is null for failed crawls, so fall back to ""
    return (pages[0].get("markdown") or "") if pages else ""


# User asks: "Summarize this page for me"
url = "https://example.com/long-report"
context = fetch_url_context(url)

prompt = f"Summarize the following page content:\n\n{context}"
# Pass prompt to your LLM
```

***

## Best practices

### Request only the formats you need

Each format adds processing time. If you only need Markdown for LLM consumption, don't request `html`. If you don't need site metadata for your UI, skip `metadata`.

### Batch your URLs

A single request with 10 URLs is faster than 10 separate requests. The API processes them in parallel.

### Set `crawl_timeout` based on the target site

For simple static pages, 5–10 seconds is usually enough. For JavaScript-heavy pages (SPAs, dashboards), increase to 20–30 seconds to give the renderer time to complete.

### Handle partial failures gracefully

If one URL in a batch fails to crawl (e.g., it's behind a login wall or returns a 404), the API returns `null` for its `markdown` and `html` fields. Always check before processing:

```python
for page in response.json():
    if page.get("markdown"):
        # Process content
        pass
    else:
        print(f"Failed to fetch: {page['url']}")
```

***

## Rate limits & pricing

Pricing is based on the number of URLs fetched per request. For detailed pricing information, visit [you.com/platform/upgrade](https://you.com/platform/upgrade) or contact [api@you.com](mailto:api@you.com).
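
Because each request accepts at most 10 URLs and cost scales with the number of URLs fetched, longer lists need to be split into batches before sending. The following is a minimal sketch of that splitting step only; `batch_urls` and `build_payloads` are hypothetical helper names, and the resulting payloads would still need to be posted to the endpoint as in the examples above.

```python
from typing import Any, Iterator, Sequence

MAX_URLS_PER_REQUEST = 10  # Batch limit stated in this guide


def batch_urls(
    urls: Sequence[str], batch_size: int = MAX_URLS_PER_REQUEST
) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield list(urls[start:start + batch_size])


def build_payloads(
    urls: Sequence[str],
    formats: Sequence[str] = ("markdown",),
    crawl_timeout: int = 10,
) -> list[dict[str, Any]]:
    """Build one JSON request body per batch of URLs."""
    return [
        {"urls": batch, "formats": list(formats), "crawl_timeout": crawl_timeout}
        for batch in batch_urls(urls)
    ]


# 23 placeholder URLs are split into batches of 10, 10, and 3
all_urls = [f"https://docs.example.com/page-{i}" for i in range(23)]
payloads = build_payloads(all_urls)
print([len(p["urls"]) for p in payloads])
```

Keeping payload construction separate from the HTTP call makes it easy to estimate how many URLs a job will fetch before any request is sent.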

***

## Next steps

- Full parameter reference, request/response schemas, and error codes
- Pair search results with livecrawl to get full content alongside real-time web data
- Get your API key and make your first call in under five minutes
- Use the official SDK for cleaner integration