Performance Profiles

economy

Best for: batch jobs and non-latency-sensitive workloads.

Routing: lowest normalized cost wins among eligible providers.

Typical winner: DeepInfra at roughly $0.07/1M tokens on many open models.

Expected latency: roughly 200–400 ms, depending on workload.
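
As a rough illustration of the cost-first rule above, the sketch below picks the cheapest eligible quote. It is not the router's actual implementation: the `Quote` type, its field names, the provider names, and the idea that cost is normalized to USD per 1M tokens are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical quote record; field names are assumptions."""
    provider: str
    usd_per_million_tokens: float  # assumed normalization unit
    eligible: bool = True


def pick_economy(quotes):
    """Return the eligible quote with the lowest normalized cost, or None."""
    eligible = [q for q in quotes if q.eligible]
    return min(eligible, key=lambda q: q.usd_per_million_tokens, default=None)


quotes = [
    Quote("provider-a", 0.12),
    Quote("provider-b", 0.07),
    Quote("provider-c", 0.05, eligible=False),  # cheapest, but filtered out
]

print(pick_economy(quotes).provider)  # provider-b
```

Filtering for eligibility before comparing prices means an ineligible provider can never win, however low its quote.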

balanced (default)

Best for: most production traffic.

Routing: balances cost, latency, and reliability according to the yield policy.

Typical winner: Featherless or Together AI.

Expected latency: ~150–300 ms.
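
The yield policy itself is not specified on this page. Purely to show what trading off cost, latency, and reliability can look like, here is a minimal weighted-score sketch. The scoring formula, the reference scales ($0.10/1M and 200 ms), the weights, and every name below are assumptions, not the production policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical quote record; field names are assumptions."""
    provider: str
    usd_per_million_tokens: float
    latency_ms: float
    success_rate: float  # fraction of successful requests, 0.0 to 1.0


def balanced_score(q, w_cost=1.0, w_latency=1.0, w_reliability=1.0):
    """Lower is better. Each term is scaled by an assumed reference value."""
    cost_term = q.usd_per_million_tokens / 0.10      # 1.0 at $0.10/1M
    latency_term = q.latency_ms / 200.0              # 1.0 at 200 ms
    failure_term = (1.0 - q.success_rate) * 10.0     # 1.0 at a 10% failure rate
    return w_cost * cost_term + w_latency * latency_term + w_reliability * failure_term


def pick_balanced(quotes):
    """Return the quote with the lowest combined score, or None."""
    return min(quotes, key=balanced_score, default=None)


quotes = [
    Quote("cheap-slow", 0.07, 380, 0.99),
    Quote("middle", 0.12, 200, 0.999),
    Quote("fast-pricey", 0.30, 60, 0.97),
]

print(pick_balanced(quotes).provider)  # middle
```

With equal weights, neither the cheapest nor the fastest quote wins here, because each pays a large penalty on another axis.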

fast

Best for: real-time, user-facing applications.

Routing: latency-first; the Groq LPU tier is preferred whenever it returns a quote.

Typical winner: Groq, with quotes of ~40 ms for supported models.

Expected latency: ~40–200 ms.

Note: customer rates may be slightly higher when the winning quote comes from a premium speed tier.
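
One possible reading of the latency-first rule is sketched below: a preferred provider wins whenever it has quoted, and otherwise the lowest quoted latency wins. The function name, the dictionary shape, and the lowercase provider keys are hypothetical; the real router may break ties or rank quotes differently.

```python
def pick_fast(quotes, preferred="groq"):
    """Pick a provider for the fast profile.

    quotes maps provider name -> quoted latency in ms, or None when the
    provider did not quote (an assumed representation).
    """
    quoted = {name: ms for name, ms in quotes.items() if ms is not None}
    if not quoted:
        return None
    if preferred in quoted:
        return preferred  # preferred tier wins whenever it quotes
    return min(quoted, key=quoted.get)  # otherwise lowest latency wins


print(pick_fast({"groq": 40, "provider-a": 120}))                     # groq
print(pick_fast({"groq": None, "provider-a": 120, "provider-b": 90})) # provider-b
```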

Comparison

Profile	Avg latency	Typical cost (per 1M tokens)	Best for
economy	~280 ms	~$0.07	Batch, async
balanced	~200 ms	~$0.12	General use
fast	~80 ms	~$0.18	Real-time
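
To make the trade-off in the table concrete, the helper below picks the cheapest profile whose average latency fits a given budget. The numbers are copied from the table; the helper itself and its name are illustrative only, and averages are not guarantees for any single request.

```python
# (profile, average latency in ms, typical USD per 1M tokens), from the table.
PROFILES = [
    ("economy", 280, 0.07),
    ("balanced", 200, 0.12),
    ("fast", 80, 0.18),
]


def cheapest_profile_within(budget_ms):
    """Return the cheapest profile whose average latency is within budget_ms."""
    fits = [p for p in PROFILES if p[1] <= budget_ms]
    if not fits:
        return None  # no profile's average latency meets the budget
    return min(fits, key=lambda p: p[2])[0]


print(cheapest_profile_within(300))  # economy
print(cheapest_profile_within(250))  # balanced
print(cheapest_profile_within(100))  # fast
```

The returned name can be passed as the `performance_profile` value in a request.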

Usage example

Run the same prompt under all three profiles:

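import requests

API_URL = "https://api.flopex.ai/v1/inference"
API_KEY = "sk_live_YOUR_KEY"  # replace with your own key

def run(profile: str) -> dict:
    """Send one inference request under the given performance profile."""
    r = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "llama-3.1-8b",
            "input": "Hello",
            "max_output_tokens": 64,
            "performance_profile": profile,
        },
        timeout=30,  # requests sets no timeout by default
    )
    r.raise_for_status()  # raise on 4xx/5xx responses
    return r.json()

# Print the reported latency and displayed cost for each profile.
for p in ("economy", "balanced", "fast"):
    data = run(p)
    print(p, data["performance"]["latency_ms"], data["billing"]["cost_display"])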