Two-Stage Architecture
At scale, you can't score every item for every request. The solution: funnel architecture.
**Stage 1: Candidate Generation**
- Goal: Retrieve 100-1000 relevant items from millions
- Methods: ANN search, inverted indexes, rule-based filters
- Latency budget: <20ms
- Multiple retrievers for coverage
**Stage 2: Ranking**
- Goal: Score and order candidates precisely
- Methods: Neural rankers with rich features
- Latency budget: <50ms
- More complex models since fewer items
**Stage 3: Re-ranking (optional)**
- Business rules, diversity, freshness
- Remove already-seen, out-of-stock
- Final slate assembly
**Example (YouTube-scale):**
- 100M+ videos in catalog
- Candidate generation: retrieve ~1000 videos
- Ranking: score with deep neural network
- Return top 20 for the user
