How to Evaluate You.com Search API
A practical guide to benchmarking You.com’s Search API: methodology, configurations, datasets, and real performance tradeoffs.
Why This Guide Exists
Most developer docs treat evaluation like checking boxes. This guide treats it like shipping production code: you need real benchmarks, honest tradeoffs, and configurations that actually work.
We’ll cover:
- Retrieval Quality - Does it actually find what you need?
- Latency - Fast enough for your users?
- Freshness - Can it handle “what happened today?”
- Cost - What’s your burn rate per query?
- Agent Performance - Does it work in multi-step reasoning workflows?
Want help running your eval? Our team can design and run custom benchmarks for your use case. Talk to us
The Golden Rule: Start Simple, Stay Fair
TL;DR: Use default settings. Don’t over-engineer your first eval.
Most failed evaluations have one thing in common: people add too many parameters too early.
Recommended Starting Point
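A minimal sketch of that starting point in Python, assuming the requests library, an API key passed in an X-API-Key header, and a query parameter; confirm the header and parameter names against the API Documentation before relying on them:

```python
import os
import requests

# Baseline eval call: the query only, every other parameter left at its default.
resp = requests.get(
    "https://ydc-index.io/v1/search",
    headers={"X-API-Key": os.environ["YDC_API_KEY"]},  # assumed header name
    params={"query": "how to configure SSO with Okta"},
    timeout=10,
)
resp.raise_for_status()
results = resp.json()
```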
When to Add Complexity
Add parameters ONLY when:
- Your evaluation explicitly tests that feature (e.g., freshness requires the freshness parameter)
- You’ve already run baseline evals and know what you’re optimizing for
- The parameter reflects actual production usage, not hypothetical edge cases
Anti-pattern: “Let me add every possible parameter to make this perfect”
Better approach: “Let me run this with defaults, measure performance, then iterate”
API Parameters Reference
The Search API (GET https://ydc-index.io/v1/search) accepts a small set of query-string parameters. The ones used throughout this guide are query (the search string), count (number of results to return), freshness (restrict results by recency), and livecrawl (fetch full page content); see the API Documentation for the complete list.
Latency: Compare Apples to Apples
Critical insight: Never compare APIs with wildly different latency profiles.
A 200ms API and a 3000ms API serve different use cases. Comparing them is like comparing a bicycle to a freight train.
Latency Buckets
Fair Comparison Framework
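One practical way to stay fair is to measure each provider’s latency distribution on the same query set and only compare providers that land in the same latency bucket. A sketch of the measurement side, reusing the defaults-only request above (header and parameter names are the same assumptions):

```python
import os
import statistics
import time

import requests

QUERIES = ["query one", "query two"]  # replace with your benchmark queries

def measure_latencies(queries):
    """Return per-query wall-clock latency (in ms) for the Search API."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        requests.get(
            "https://ydc-index.io/v1/search",
            headers={"X-API-Key": os.environ["YDC_API_KEY"]},  # assumed header name
            params={"query": q},
            timeout=30,
        )
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

lat = measure_latencies(QUERIES)
p50 = statistics.median(lat)
p95 = statistics.quantiles(lat, n=20)[-1]  # 95th percentile
print(f"p50={p50:.0f}ms  p95={p95:.0f}ms")
```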
Configuration Examples
Minimal Config (Start Here)
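A sketch of the minimal parameter set; pass it to the same request shown under Recommended Starting Point:

```python
# Minimal config: just the query; every other parameter stays at its default.
minimal_params = {
    "query": "reset two-factor authentication for a locked account",
}
```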
With Full Page Content (for RAG)
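A sketch with livecrawl enabled so results carry full page content for RAG instead of snippets only; the livecrawl value below is a placeholder, so check the API Documentation for the accepted values:

```python
# Full page content for RAG: enable livecrawl so results include page text,
# not just snippets.
rag_params = {
    "query": "terraform remote state locking best practices",
    "count": 10,
    "livecrawl": "...",  # placeholder value; see the API Documentation for accepted values
}
```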
Freshness Config (Time-Sensitive Queries)
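A sketch for time-sensitive queries; the freshness value below is a placeholder, so check the API Documentation for the accepted recency windows:

```python
# Freshness config: restrict results to recent content for time-sensitive queries.
fresh_params = {
    "query": "who won the champions league final",
    "count": 10,
    "freshness": "...",  # placeholder value, e.g. a recency window; see the API Documentation
}
```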
Raw HTTP Request
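The same minimal request expressed closer to the wire, using Python’s standard http.client; the request line and headers it sends are shown in the leading comments, and the header and parameter names are the same assumptions as above:

```python
# GET /v1/search?query=... HTTP/1.1
# Host: ydc-index.io
# X-API-Key: <your key>
import http.client
import json
import os
import urllib.parse

conn = http.client.HTTPSConnection("ydc-index.io")
conn.request(
    "GET",
    "/v1/search?" + urllib.parse.urlencode({"query": "site reliability error budgets"}),
    headers={"X-API-Key": os.environ["YDC_API_KEY"]},
)
body = json.loads(conn.getresponse().read())
```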
Evaluation Workflow: 4 Steps That Actually Work
1. Define What You’re Testing
Don’t start with “let’s evaluate everything.” Start with:
- What capability matters? (speed? accuracy? freshness?)
- What latency can you tolerate?
- Single-step retrieval or multi-step reasoning?
Example scope: “We need 90%+ accuracy on customer support questions with < 500ms latency”
2. Pick Your Dataset
Pro tip: Start with public benchmarks, but your production queries are the real test.
Need help building a custom dataset? We can help
3. Run Your Eval
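A minimal eval-loop sketch: send every dataset query, record latency, status, and the raw results, and leave grading to your own synthesis and scoring step. The dataset shape, header name, and output layout are assumptions, not a prescribed format:

```python
import os
import time

import requests

def run_eval(dataset, count=10):
    """dataset: list of {"query": ..., "expected": ...} dicts (your format may differ)."""
    rows = []
    for item in dataset:
        start = time.perf_counter()
        resp = requests.get(
            "https://ydc-index.io/v1/search",
            headers={"X-API-Key": os.environ["YDC_API_KEY"]},  # assumed header name
            params={"query": item["query"], "count": count},
            timeout=30,
        )
        rows.append({
            "query": item["query"],
            "expected": item.get("expected"),
            "latency_ms": (time.perf_counter() - start) * 1000,
            "status": resp.status_code,
            "results": resp.json() if resp.ok else None,
        })
    return rows

# rows = run_eval(my_dataset)  # persist rows so grading and analysis can be re-run offline
```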
4. Analyze & Iterate
Look at:
- Accuracy vs latency tradeoff - Can you get 95% accuracy at 300ms?
- Failure modes - Which queries fail? Is there a pattern?
- Cost - What’s your $/1000 queries?
Then iterate:
- Add livecrawl if snippets aren’t giving enough context
- Add freshness if failures are due to stale content
- Compare against competitors in the same latency class
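A small sketch of the analysis over the rows collected in step 3; the correct field is whatever your grader produces, and the per-query price is a stand-in for your actual plan pricing:

```python
import statistics

def summarize(rows, price_per_query_usd=0.0):  # plug in your actual plan pricing
    latencies = [r["latency_ms"] for r in rows]
    graded = [r for r in rows if r.get("correct") is not None]  # filled in by your grader
    return {
        "accuracy": sum(r["correct"] for r in graded) / len(graded) if graded else None,
        "p50_ms": statistics.median(latencies),
        "p95_ms": statistics.quantiles(latencies, n=20)[-1],
        "usd_per_1000_queries": price_per_query_usd * 1000,
    }
```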
Response Structure
The API returns results in two sections; see the API Documentation for the exact response schema.
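Rather than hard-coding field names here, a quick way to see the sections is to inspect a live response directly (same header and parameter assumptions as above):

```python
import os
import requests

resp = requests.get(
    "https://ydc-index.io/v1/search",
    headers={"X-API-Key": os.environ["YDC_API_KEY"]},
    params={"query": "example query"},
    timeout=10,
)
data = resp.json()
# Print the top-level sections so you can see exactly what the API returns.
print(list(data.keys()))
```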
Tool Calling for Agents
When evaluating You.com in agentic workflows, keep the tool definition minimal.
Open-source evaluation framework: Check out Agentic Web Search Playoffs for a ready-to-use benchmark comparing web search providers in agentic contexts.
Note: Don’t expose freshness, livecrawl, or other parameters to the agent unless necessary. Let the agent focus on formulating good queries.
Implementation
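A minimal sketch of such a tool definition in OpenAI-style function-calling format, written as a Python dict; per the note above, only the query is exposed to the agent, and the exact schema depends on your agent framework:

```python
# Minimal web-search tool definition: the agent only chooses the query string;
# count, freshness, livecrawl, etc. stay fixed on the application side.
web_search_tool = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return relevant results for a query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."},
            },
            "required": ["query"],
        },
    },
}
```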
Common Mistakes to Avoid
1. Over-Filtering Too Early
Don’t:
Do:
2. Ignoring Your Actual Queries
Don’t just run: Public benchmarks
Also run: Your actual user queries from production logs
3. Not Measuring What Users Care About
Don’t only measure: Technical accuracy
Also measure: Click-through rate, task completion, reformulation rate
4. Testing in Isolation
Don’t test: Search API alone
Test: Full workflow (search -> synthesis -> grading) with your actual LLM and prompts
Debugging Performance Issues
If Accuracy is Low (< 85%)
- Are you requesting enough results? Try count=15
- Enable livecrawl for full page content
- Is your synthesis prompt good? Test with GPT-4
- Is your grading fair? Manually review a sample
If Results are Stale
- Add the freshness parameter to restrict results to a recent time window
Still stuck? Our team has run hundreds of search evals. Get hands-on help
Production Checklist
1. Run Comparative Benchmarks
2. Set Up Monitoring
3. Document Everything
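For item 2 (monitoring), a minimal sketch: wrap the search call so every request logs latency and status code, which is enough to alert on latency regressions and error-rate spikes. Logger configuration and alert thresholds are left to your stack, and the header name is the same assumption as above:

```python
import logging
import os
import time

import requests

log = logging.getLogger("you_search")

def monitored_search(query, **params):
    """Call the Search API and log latency and status for each request."""
    start = time.perf_counter()
    resp = requests.get(
        "https://ydc-index.io/v1/search",
        headers={"X-API-Key": os.environ["YDC_API_KEY"]},  # assumed header name
        params={"query": query, **params},
        timeout=30,
    )
    log.info(
        "you_search query=%r status=%s latency_ms=%.0f",
        query, resp.status_code, (time.perf_counter() - start) * 1000,
    )
    resp.raise_for_status()
    return resp.json()
```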
Getting Help
- Evaluations as a Service - Custom benchmarks designed and run by our team
- Agentic Web Search Playoffs - Open-source benchmark for comparing web search in agentic workflows
- API Documentation
- Discord Community
- Email: developers@you.com
Remember: The best evaluation is the one you actually run. Start simple, measure what matters, and iterate.