Tiers and Usage
Behest models end-user entitlements as tiers. Every JWT carries a tier claim; Kong and LiteLLM enforce the limits associated with that tier. Usage is exposed per-user and per-project via GET /v1/billing/usage.
Built-in tiers
| Tier | Typical use | Default limits (configurable in dashboard) |
|---|---|---|
| free | Demos, signed-out trials, generous free plan | 10 RPM, 100 req/day, 50k tokens/day |
| pro | Paid users, normal product use | 60 RPM, 10k req/day, 2M tokens/day |
| max | Power users, agents, enterprise seats | 300 RPM, 100k req/day, 20M tokens/day |
Limits are per-(project, uid) — two users on the same project do not share a quota. Defaults live in project_tiers.overrides (JSONB) and can be tuned in the dashboard without redeploying.
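To make the defaults-plus-overrides idea concrete, here is a minimal sketch of how effective limits could be resolved. The type names, the `effectiveLimits` and `quotaKey` helpers, and the shape assumed for the overrides object are all hypothetical; the actual `project_tiers.overrides` schema is not documented here.

```typescript
type Tier = "free" | "pro" | "max";

interface TierLimits {
  rpm: number;
  requestsPerDay: number;
  tokensPerDay: number;
}

// Default values copied from the built-in tiers table.
const DEFAULT_LIMITS: Record<Tier, TierLimits> = {
  free: { rpm: 10, requestsPerDay: 100, tokensPerDay: 50000 },
  pro: { rpm: 60, requestsPerDay: 10000, tokensPerDay: 2000000 },
  max: { rpm: 300, requestsPerDay: 100000, tokensPerDay: 20000000 },
};

// Assumed shape: a project may override any subset of fields for any tier.
type TierOverrides = Partial<Record<Tier, Partial<TierLimits>>>;

// Overrides win field by field; anything not overridden keeps its default.
function effectiveLimits(tier: Tier, overrides: TierOverrides = {}): TierLimits {
  return { ...DEFAULT_LIMITS[tier], ...(overrides[tier] ?? {}) };
}

// Quotas are tracked per (project, uid), so the counter key includes both.
// The key format itself is illustrative only.
function quotaKey(projectId: string, uid: string): string {
  return `${projectId}:${uid}`;
}
```

For example, `effectiveLimits("free", { free: { rpm: 20 } })` raises only the RPM for free users and leaves both daily caps at their defaults.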
Assigning a tier
Tier is set at mint time on your backend:
import { Behest } from "@behest/client-ts";
const behest = new Behest(); // reads BEHEST_KEY from env
const { token, ttl } = await behest.auth.mint({
user_id: user.id,
tier: user.plan, // numeric tier id, e.g. 1 = free, 2 = pro, 3 = max
});

In local-signing mode (behest_pk_* key), the tier rides along in the JWT claims automatically. The tier cannot be changed without re-minting, so a typical TTL of 5–15 min gives users a fast upgrade path.
After a user upgrades
- Update your DB (users.plan = 2).
- Mint a fresh JWT on the next request; the current short-lived token will expire on its own.
- No Behest-side action needed.
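If your backend caches minted tokens per user, the steps above reduce to one check before reusing a cached token. The sketch below is an assumption about how such a cache might look; `CachedToken`, `needsRemint`, and the 30-second skew are hypothetical and not part of the Behest SDK.

```typescript
// Hypothetical cache entry kept by your backend alongside each minted JWT.
interface CachedToken {
  token: string;
  tier: string | number; // whatever value you passed as `tier` at mint time
  expiresAtMs: number;   // mint time + ttl, in epoch milliseconds
}

// Returns true when a fresh JWT should be minted instead of reusing the cache.
function needsRemint(
  cached: CachedToken | undefined,
  currentTier: string | number,
  nowMs: number,
  skewMs: number = 30000, // re-mint slightly before expiry to avoid races
): boolean {
  if (cached === undefined) return true;         // nothing cached yet
  if (cached.tier !== currentTier) return true;  // plan changed since mint
  return nowMs >= cached.expiresAtMs - skewMs;   // expired or about to expire
}
```

Comparing the cached tier against `users.plan` means an upgrade takes effect on the user's next request rather than after the old token's TTL runs out.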
Reading usage
GET /v1/billing/usage
GET https://amber-fox-042.behest.app/v1/billing/usage?from=2026-04-01&to=2026-04-30
Authorization: Bearer <JWT>

Returns platform + BYOK buckets separately:
{
"project_id": "663e...",
"window": { "from": "2026-04-01T00:00:00Z", "to": "2026-04-30T00:00:00Z" },
"platform": {
"requests": 12450,
"input_tokens": 3200000,
"output_tokens": 980000,
"cost_usd": 12.47,
"by_model": [
{ "model": "gpt-4o-mini", "requests": 9800, "cost_usd": 4.2 },
{ "model": "claude-3-5-sonnet", "requests": 2650, "cost_usd": 8.27 }
]
},
"byok": {
"requests": 840,
"input_tokens": 120000,
"output_tokens": 45000,
"cost_usd": 0,
"by_provider": [{ "provider": "openai", "requests": 840 }]
},
"by_user": [
{ "user_id": "user_123", "requests": 420, "cost_usd": 1.14, "tier": "pro" }
]
}

What /v1/billing/usage returns is determined by the role on your API key, not by a value the caller picks. An admin-roled API key mints JWTs that see the whole project; a user-roled key (the default) mints JWTs whose response is filtered server-side to the authenticated uid.

Role comes from the key record on the server. If you omit role in the mint body, the JWT gets the key's role. If you pass one, the server honors it only when it matches the key's role or is an explicit downgrade to "user" (e.g., an admin-roled service key minting a per-end-user JWT). Any other caller-supplied role, including escalation to "admin" from a user-roled key, is rejected. See authentication.md.
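The role rules can be read as a small decision function. The sketch below only restates those rules for illustration; `resolveRole` and its result type are hypothetical names, and the real check runs server-side in Behest.

```typescript
type Role = "admin" | "user";

type RoleResult =
  | { ok: true; role: Role }
  | { ok: false; reason: string };

// keyRole comes from the API key record; requested is the optional
// `role` field supplied in the mint body.
function resolveRole(keyRole: Role, requested?: Role): RoleResult {
  // No role supplied: the JWT inherits the key's role.
  if (requested === undefined) return { ok: true, role: keyRole };
  // Supplied role matches the key's role: honored as-is.
  if (requested === keyRole) return { ok: true, role: requested };
  // Explicit downgrade to "user" (e.g. an admin key minting an end-user JWT).
  if (requested === "user") return { ok: true, role: "user" };
  // Anything else, such as "admin" requested from a user-roled key, is rejected.
  return { ok: false, reason: "requested role exceeds the key's role" };
}
```

Under these rules `resolveRole("admin", "user")` succeeds with role `"user"`, while `resolveRole("user", "admin")` is rejected.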
Per-user usage from your backend
Show a "12 / 100 today" meter:
const report = await behest.usage.get({
from: new Date(Date.now() - 24 * 60 * 60 * 1000),
to: new Date(),
});
// report.totals.tokens, report.breakdown[]: scoped to the JWT's uid when the key is user-roled

Handling 402 (over quota)
When a user exceeds their daily cap, Behest returns:
HTTP/1.1 402 Payment Required
Content-Type: application/json
X-Trace-Id: 4b2c...

{
"error": {
"code": "quota_exceeded",
"message": "Daily token limit reached for tier 'free'.",
"details": {
"tier": "free",
"limit": { "tokens_per_day": 50000 },
"usage": { "tokens_today": 50123 }
}
}
}

Client pattern (backend, using the SDK):
import { Behest, BehestQuotaError } from "@behest/client-ts";
try {
await behest.chat.completions.create({ messages });
} catch (err) {
if (err instanceof BehestQuotaError) {
const body = err.raw as {
error?: { details?: { tier?: string; limit?: unknown; usage?: unknown } };
};
showUpgradeModal({
tier: body?.error?.details?.tier,
usage: body?.error?.details?.usage,
limit: body?.error?.details?.limit,
});
return;
}
throw err;
}

Client pattern (browser talking directly to Behest with a minted JWT):
const resp = await fetch(`https://${SLUG}.behest.app/v1/chat/completions`, {
method: "POST",
headers: {
Authorization: `Bearer ${token}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ messages }),
});
if (resp.status === 402) {
const body = await resp.json();
showUpgradeModal(body.error.details);
return;
}

Do not retry 402. It is a deliberate denial, so retrying will not help; route the user to upgrade or wait.
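One way to keep the no-retry rule from being bypassed by generic retry middleware is to classify statuses in a single place. This is a sketch under assumptions: `actionForStatus` and `shouldRetry` are hypothetical helpers, and treating 429 as retryable after a delay is a common convention rather than something this page specifies.

```typescript
type QuotaAction = "upgrade_or_wait" | "retry_later" | "proceed" | "fail";

function actionForStatus(status: number): QuotaAction {
  // 402 is a deliberate deny for an exhausted daily quota: never retry.
  if (status === 402) return "upgrade_or_wait";
  // 429 is the per-minute rate limit; assumed here to be retryable later.
  if (status === 429) return "retry_later";
  if (status >= 200 && status < 300) return "proceed";
  return "fail";
}

// Suitable as the predicate for a retry wrapper around fetch.
function shouldRetry(status: number): boolean {
  return actionForStatus(status) === "retry_later";
}
```

With this split, `shouldRetry(402)` is always `false`, so a quota error goes straight to the upgrade path instead of looping.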
Tier enforcement internals (for mental model)
Request arrives at Kong with JWT { uid, tier: "pro" }
↓ Kong resolves per-project overrides from Redis (tiers:{pid})
↓ Kong applies RPM limit → 429 if exceeded
↓ LiteLLM custom_auth checks daily buckets in Redis → 402 if exceeded
↓ Request proceeds
Both RPM (Kong) and daily token cap (LiteLLM) are configurable per tier per project in the dashboard → Tiers page.
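The ordering in the diagram matters: the per-minute check runs first, so a request that violates both limits gets a 429 rather than a 402. The sketch below models only that ordering. The `enforce` function, the counter names, and the `>=` comparisons are assumptions for illustration, not the actual Kong or LiteLLM logic.

```typescript
interface TierLimits {
  rpm: number;
  requestsPerDay: number;
  tokensPerDay: number;
}

// Hypothetical snapshot of the counters held for one (project, uid).
interface UsageCounters {
  requestsThisMinute: number;
  requestsToday: number;
  tokensToday: number;
}

function enforce(limits: TierLimits, usage: UsageCounters): 200 | 402 | 429 {
  // Step 1 (Kong in the diagram): per-minute rate limit.
  if (usage.requestsThisMinute >= limits.rpm) return 429;
  // Step 2 (LiteLLM custom_auth in the diagram): daily buckets.
  if (
    usage.requestsToday >= limits.requestsPerDay ||
    usage.tokensToday >= limits.tokensPerDay
  ) {
    return 402;
  }
  // Step 3: the request proceeds.
  return 200;
}
```

Fed the values from the 402 example (a 50,000 token daily limit and 50,123 tokens used), this returns 402 as long as the per-minute count is under the RPM limit.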
Upgrade flow pattern
- User hits 402 → backend SDK throws BehestQuotaError (or browser receives a 402 response).
- Frontend shows modal with current vs next-tier limits + CTA.
- User upgrades through your billing provider (Stripe/Paddle/etc.).
- Your webhook updates users.plan in your DB.
- Next token mint uses the new tier; current token expires within 15 min.
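For the modal step in the flow above, the frontend needs to know which tier to offer next. A minimal sketch, assuming the built-in tiers are ordered free, pro, max; `nextTier` and `upgradePrompt` are hypothetical helpers and the copy is placeholder text.

```typescript
// Built-in tiers in ascending order (assumed from the tiers table).
const TIER_ORDER = ["free", "pro", "max"] as const;
type Tier = (typeof TIER_ORDER)[number];

// Returns the tier to offer as an upgrade, or null at the top tier.
function nextTier(current: Tier): Tier | null {
  const index = TIER_ORDER.indexOf(current);
  return TIER_ORDER[index + 1] ?? null;
}

// Builds the call-to-action text for the upgrade modal.
function upgradePrompt(current: Tier): string {
  const next = nextTier(current);
  return next === null
    ? `You have reached today's limit on the ${current} tier, the highest available.`
    : `You have reached today's limit on the ${current} tier. Upgrade to ${next}?`;
}
```

The `tier` field in the 402 response details can be passed straight to `upgradePrompt` once it has been validated as one of the known tier names.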
See Streaming UI for how to cancel an in-flight stream when a 402 happens mid-response.
See also
- Error handling — typed exceptions for 401/402/429/5xx
- Authentication — tier claim details
- Rate limiting — RPM and burst headers