# Retrieve page content
Live-crawl search results to get full HTML or Markdown page content. Ideal for RAG, knowledge base construction, and deep content analysis.
## Overview
By default, search results include snippets — 100–200 words of extracted text per result. Enable live crawling to get the full page content: typically 2,000–10,000 words of clean HTML or Markdown per result.
Full page content enables:
- Deep RAG with full document context
- Knowledge base construction from live web data
- Comprehensive content synthesis across sources
- Full article bodies for news results
## How it works
Add `livecrawl` to any search request. The API fetches each matching result's page in real time and attaches a `contents` object to it. You choose which result types to crawl and what format to return.
`markdown` is recommended for LLM use cases: it strips navigation, ads, and boilerplate HTML, leaving only the core content.
## Crawl web results
Set `livecrawl=web` to attach full page content to web results. A `contents.markdown` (or `contents.html`) field is added to each result that was successfully crawled.
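A sketch of consuming such a response, assuming a `results` list where successfully crawled entries carry a `contents` object (the mock response shape below is an assumption based on this page, not a verified schema):

```python
# Sketch: iterate web results and read crawled content where present.
# The response shape is a mock; only results that were successfully
# crawled carry a "contents" object, per this page.
response = {
    "results": [
        {"url": "https://example.com/a",
         "contents": {"markdown": "# Title\n\nFull body text..."}},
        {"url": "https://example.com/b"},  # crawl failed: no contents attached
    ]
}

for result in response["results"]:
    contents = result.get("contents")
    if contents is None:
        continue  # fall back to the snippet when the crawl did not succeed
    print(result["url"], len(contents["markdown"]))
```

Checking for the `contents` key before reading it keeps the pipeline robust when individual pages fail to crawl.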
## Crawl news results
Set `livecrawl=news` to get full article bodies for news results. Combine with `freshness` for breaking-news pipelines.
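A sketch of one step in such a pipeline. The `freshness` value shown (`"pd"` for past day) and the `news` response key are assumptions for illustration; check the freshness parameter docs for the real accepted values.

```python
# Sketch of a breaking-news pipeline step. The freshness value ("pd") and
# the response shape are assumptions, not documented on this page.
body = {
    "query": "semiconductor supply chain",
    "livecrawl": "news",
    "freshness": "pd",      # assumed value meaning "past day"
    "format": "markdown",
}

def extract_articles(response: dict) -> list[str]:
    """Collect full article bodies from successfully crawled news results."""
    return [r["contents"]["markdown"]
            for r in response.get("news", [])
            if r.get("contents")]

sample = {"news": [{"contents": {"markdown": "Full article text"}}, {}]}
print(extract_articles(sample))
```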
## Crawl both web and news
Use `livecrawl=all` to crawl every result type in one request.
## Control crawl timeout
By default the crawler waits up to 10 seconds per page. For latency-sensitive applications, reduce `crawl_timeout`; for complex or slow-loading pages, increase it (up to 60 seconds).
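A small sketch of setting this knob. The `crawl_timeout` name, the 10-second default, and the 60-second ceiling come from this page; treating the value as whole seconds (and the helper itself) is an assumption.

```python
# Sketch: attach a crawl_timeout to a request body, enforcing the
# documented 60-second ceiling. Whole-second units are an assumption.
def with_timeout(body: dict, seconds: int) -> dict:
    """Return a copy of the request body with crawl_timeout set."""
    if not 1 <= seconds <= 60:
        raise ValueError("crawl_timeout must be between 1 and 60 seconds")
    return {**body, "crawl_timeout": seconds}

# Latency-sensitive: give up on slow pages quickly.
fast = with_timeout({"query": "breaking news", "livecrawl": "news"}, 3)
# Slow, JavaScript-heavy pages: allow up to the documented maximum.
patient = with_timeout({"query": "interactive dashboards", "livecrawl": "web"}, 60)
```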
## HTML vs Markdown
Both formats are available per result: `contents.html` carries the raw page markup, while `contents.markdown` carries cleaned, LLM-ready text with navigation, ads, and boilerplate stripped.
## Already have URLs?
If you have a list of URLs and don't need to search first, use the Contents API directly. It accepts URLs without a query and returns the same `markdown` or `html` content.
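A sketch of the corresponding request body: URLs in, no search query. The exact field names (`urls`, `format`) are assumptions based on this page, not a verified Contents API schema.

```python
# Sketch: build a Contents API request body. Note there is no "query"
# field; the Contents API takes URLs directly. Field names are assumed.
import json

def contents_request(urls: list[str], fmt: str = "markdown") -> dict:
    """Build a Contents API body from a list of known URLs."""
    return {"urls": urls, "format": fmt}

print(json.dumps(contents_request(["https://example.com/docs"], fmt="html")))
```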