Performance Profiles

economy

Best for: batch jobs and non-latency-sensitive workloads.

Routing: lowest normalized cost wins among eligible providers.

Typical winner: DeepInfra at roughly $0.07/1M tokens on many open models.

Expected latency: roughly 200–400 ms, depending on workload.
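The cost-first rule above can be sketched in a few lines. The quote structure ("provider", "usd_per_1m", "eligible") is an assumption for illustration, not the router's actual data model:

```python
# Sketch of "lowest normalized cost wins" among eligible providers.
# Field names here are illustrative placeholders.
def pick_economy(quotes):
    eligible = [q for q in quotes if q.get("eligible", True)]
    return min(eligible, key=lambda q: q["usd_per_1m"])

quotes = [
    {"provider": "Together AI", "usd_per_1m": 0.12},
    {"provider": "DeepInfra", "usd_per_1m": 0.07},
    {"provider": "Groq", "usd_per_1m": 0.18, "eligible": False},
]
print(pick_economy(quotes)["provider"])  # DeepInfra
```

Ineligible quotes are filtered out before the comparison, so a cheap but unavailable provider can never win.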

balanced (default)

Best for: most production traffic.

Routing: balances cost, latency, and reliability via yield policy.

Typical winner: Featherless or Together AI.

Expected latency: ~150–300 ms.
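One way to picture a yield policy is a weighted blend of normalized cost, latency, and failure rate, with the lowest blended score winning. The weights, normalization ceilings, and field names below are assumptions for illustration, not the service's actual policy:

```python
# Illustrative yield-policy scoring: lower is better. Weights and field
# names are assumptions, not the router's real implementation.
def yield_score(q, w_cost=0.45, w_lat=0.35, w_rel=0.20):
    cost = q["usd_per_1m"] / 0.20     # normalize against a $0.20/1M ceiling
    lat = q["latency_ms"] / 400.0     # normalize against a 400 ms ceiling
    rel = 1.0 - q["success_rate"]     # failure rate, lower is better
    return w_cost * cost + w_lat * lat + w_rel * rel

def pick_balanced(quotes):
    return min(quotes, key=yield_score)

quotes = [
    {"provider": "DeepInfra",   "usd_per_1m": 0.07, "latency_ms": 300, "success_rate": 0.95},
    {"provider": "Together AI", "usd_per_1m": 0.12, "latency_ms": 180, "success_rate": 0.995},
    {"provider": "Groq",        "usd_per_1m": 0.18, "latency_ms": 40,  "success_rate": 0.99},
]
print(pick_balanced(quotes)["provider"])  # Together AI
```

With these sample numbers the mid-priced, mid-latency quote wins: the cheapest provider is penalized on latency and reliability, and the fastest on cost.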

fast

Best for: real-time, user-facing applications.

Routing: latency-first; the Groq LPU tier is preferred whenever it returns a quote.

Typical winner: Groq, with quotes around 40 ms on supported models.

Expected latency: ~40–200 ms.

Note: may incur slightly higher customer rates when the winning quote is a premium speed tier.
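The "prefer Groq when it quotes" rule can be sketched as a restricted pool with a latency fallback. Field names are illustrative:

```python
# Latency-first selection: if the Groq LPU tier returned a quote, choose
# among Groq quotes only; otherwise fall back to the fastest remaining
# provider. Quote fields are illustrative placeholders.
def pick_fast(quotes):
    groq = [q for q in quotes if q["provider"] == "Groq"]
    pool = groq or quotes
    return min(pool, key=lambda q: q["latency_ms"])

quotes = [
    {"provider": "Together AI", "latency_ms": 180},
    {"provider": "Groq", "latency_ms": 40},
]
print(pick_fast(quotes)["provider"])  # Groq
```

If Groq does not quote for a given model, the same function simply returns the lowest-latency provider that did.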

Comparison

Profile    | Avg latency | Typical cost | Best for
economy    | ~280 ms     | ~$0.07/1M    | Batch, async
balanced   | ~200 ms     | ~$0.12/1M    | General use
fast       | ~80 ms      | ~$0.18/1M    | Real-time

Usage example

Run the same prompt under all three profiles:

import os

import requests

# NOTE: the endpoint URL, header, and request/response field names below
# are illustrative placeholders -- substitute your deployment's values.
API_URL = "https://api.example.com/v1/chat/completions"

def run(profile: str):
    r = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
        json={
            "model": "llama-3.1-8b-instruct",  # any supported model
            "messages": [{"role": "user", "content": "Say hello."}],
            "max_tokens": 64,
            "profile": profile,  # "economy", "balanced", or "fast"
        },
    )
    r.raise_for_status()
    return r.json()

for p in ("economy", "balanced", "fast"):
    data = run(p)
    # Response shape is illustrative: which provider won, and how fast.
    print(p, data["routing"]["provider"], data["routing"]["latency_ms"])