Performance Profiles
economy
Best for: batch jobs and non-latency-sensitive workloads.
Routing: lowest normalized cost wins among eligible providers.
Typical winner: DeepInfra at roughly $0.07/1M tokens on many open models.
Expected latency: ~200–400 ms class, workload dependent.
balanced (default)
Best for: most production traffic.
Routing: balances cost, latency, and reliability via yield policy.
Typical winner: Featherless or Together AI.
Expected latency: ~150–300 ms.
fast
Best for: real-time, user-facing applications.
Routing: latency-first; Groq LPU tier is preferred when it quotes.
Typical winner: Groq at ~40 ms quotes for supported models.
Expected latency: ~40–200 ms.
Note: may incur slightly higher customer rates when the winning quote is a premium speed tier.
Comparison
| Profile | Avg latency | Typical cost | Best for |
|---|---|---|---|
| economy | ~280 ms | ~$0.07/1M | Batch, async |
| balanced | ~200 ms | ~$0.12/1M | General use |
| fast | ~80 ms | ~$0.18/1M | Real-time |
Usage example
Run the same prompt under all three profiles:
import requests def run(profile: str): r = requests.post( 0