Reference Architecture
- Client app -> API gateway -> orchestration service -> Claude API
- Add retrieval layer for grounded responses from internal data
- Add policy enforcement and audit logging before final response
Cost Model You Should Track
Treat integration cost in three buckets: implementation effort, runtime infra, and model usage.
Most teams underestimate orchestration and monitoring cost while over-focusing on token pricing.
Optimization Priorities
- Cache stable answers for repetitive workflows.
- Use smaller/cheaper models for low-risk tasks.
- Constrain output format to reduce retries and post-processing.
- Measure cost per resolved workflow, not only token totals.
