Multi-Tenant Auth for AI Apps
Behest provides built-in multi-tenant authentication and isolation. Project-level API keys, per-user tokens, and three-tier rate limiting — all without writing auth code.
How Behest Auth Works
Behest uses a layered authentication model designed for multi-tenant AI applications. At the top level, each project gets its own API key. Within a project, individual end-users are identified by the X-End-User-Id header. This creates complete tenant isolation — one user's data, memory, and rate limits never affect another's.
The auth model has three layers:
- Project-level API key — authenticates your application. Each project has its own endpoint, API key, and configuration. Create projects in the Behest dashboard.
- User tokens — the X-End-User-Id header identifies individual end-users within a project. Behest uses this to scope memory, rate limits, and analytics per user.
- Tenant isolation — each user's conversation memory, token budget, and rate limit counters are completely isolated. There is no cross-tenant data leakage.
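The two layers combine in every request as a pair of headers. As an illustrative sketch (the header names come from this page; the helper function itself is our own, not part of any Behest SDK):

```typescript
// Build the headers Behest expects from a project API key and an
// optional end-user id. `apiKey` and `userId` come from your own
// config and session handling.
function behestHeaders(apiKey: string, userId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Omitting X-End-User-Id still authenticates at the project level,
  // but per-user memory, rate limits, and analytics are not applied.
  if (userId) headers["X-End-User-Id"] = userId;
  return headers;
}
```

Centralizing header construction like this keeps the project/user distinction in one place instead of repeating it at every call site.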
Step 1: Create a Project
Projects are the top-level unit of organization in Behest. Each project gets its own subdomain (your-project.behest.app), API key, configuration, and isolated data. You can create multiple projects for different environments (development, staging, production) or different applications.
- Sign in to the Behest Dashboard
- Click “Create Project” and name your project
- Add your frontend origin to the CORS allowed origins list (e.g., http://localhost:3000)
- Copy your project URL and API key
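A common way to handle the copied URL and key is to load them from the environment rather than hard-coding them. This sketch fails fast when either is missing; the variable names BEHEST_URL and BEHEST_API_KEY are our own convention, not something Behest mandates:

```typescript
interface BehestConfig {
  url: string;
  apiKey: string;
}

// Read Behest credentials from an environment-like record and throw
// if either value is missing, so misconfiguration surfaces at startup
// rather than as failed requests later.
function loadBehestConfig(env: Record<string, string | undefined>): BehestConfig {
  const url = env["BEHEST_URL"];
  const apiKey = env["BEHEST_API_KEY"];
  if (!url || !apiKey) {
    throw new Error("Set BEHEST_URL and BEHEST_API_KEY before starting the app");
  }
  return { url, apiKey };
}
```

In a Node.js app you would typically call `loadBehestConfig(process.env)` once at startup and pass the result around.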
Step 2: Make Authenticated Requests
Every request to Behest requires a project API key in the Authorization header. Optionally, include the X-End-User-Id header to enable per-user features:
const BEHEST_URL = "https://your-project.behest.app/v1/chat/completions";
// Basic authenticated request (project-level auth)
const response = await fetch(BEHEST_URL, {
method: "POST",
headers: {
"Authorization": "Bearer your-project-api-key",
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gemini-2.5-flash",
messages: [{ role: "user", content: "Hello" }],
}),
});
// With per-user identification (enables user-level isolation)
const userResponse = await fetch(BEHEST_URL, {
method: "POST",
headers: {
"Authorization": "Bearer your-project-api-key",
"Content-Type": "application/json",
"X-End-User-Id": "user-123", // Your app's user ID
},
body: JSON.stringify({
model: "gemini-2.5-flash",
messages: [{ role: "user", content: "Hello" }],
}),
});
How Rate Limiting Ties into Auth
Behest enforces rate limiting at three levels, all tied to the authentication context. This means rate limits are automatically scoped — you do not need to implement any rate limiting logic yourself:
- IP-level — protects against abuse from individual IP addresses. Requests from the same IP are throttled if they exceed the configured threshold.
- Project-level — enforces total usage limits for your entire project. Prevents runaway costs from bugs or unexpected traffic spikes.
- User-level — when you pass the X-End-User-Id header, each user gets their own rate limit bucket. One heavy user cannot exhaust the limits for everyone else.
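Behest enforces these buckets server-side, so there is nothing to implement, but the user-level behavior can be pictured as one counter per X-End-User-Id value. A purely conceptual model (limits and windowing are illustrative, not Behest's actual algorithm):

```typescript
// Conceptual model of per-user rate limiting: each user id maps to
// its own counter, so one user exhausting their allowance leaves
// every other user's allowance untouched.
class PerUserLimiter {
  private counts = new Map<string, number>();
  constructor(private maxPerWindow: number) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(userId: string): boolean {
    const used = this.counts.get(userId) ?? 0;
    if (used >= this.maxPerWindow) return false;
    this.counts.set(userId, used + 1);
    return true;
  }
}
```

With a limit of 2, a third request from `"user-a"` is rejected while a first request from `"user-b"` still succeeds, which is exactly the isolation property the user-level tier provides.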
When a rate limit is hit, Behest returns a 429 Too Many Requests response with headers indicating when the client can retry:
const response = await fetch(BEHEST_URL, {
method: "POST",
headers: {
"Authorization": "Bearer your-api-key",
"Content-Type": "application/json",
"X-End-User-Id": currentUser.id,
},
body: JSON.stringify({
model: "gemini-2.5-flash",
messages: [{ role: "user", content: input }],
}),
});
if (response.status === 429) {
const retryAfter = response.headers.get("Retry-After");
console.log(`Rate limited. Retry after ${retryAfter} seconds.`);
// Show a user-friendly message
}
Tenant Isolation in Practice
When you use the X-End-User-Id header, Behest automatically isolates the following per user:
- Conversation memory — each user's conversation history is stored separately. User A's context never appears in User B's responses.
- Token budgets — per-user token consumption is tracked independently. You can set limits to prevent any single user from consuming too many tokens.
- Rate limit counters — each user has their own rate limit window. A heavy user hitting their limit does not affect other users.
- Usage analytics — the dashboard shows per-user metrics: requests, tokens, cost, and latency.
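The memory isolation above can be pictured as conversation state keyed by user id, so a read for one user can never return another user's messages. This is only a mental model; the actual storage lives inside Behest's infrastructure:

```typescript
// Conceptual sketch of tenant-scoped conversation memory: every read
// and write is keyed by user id, so user A's history is structurally
// unreachable from a request made on behalf of user B.
class TenantMemory {
  private store = new Map<string, string[]>();

  append(userId: string, message: string): void {
    const history = this.store.get(userId) ?? [];
    history.push(message);
    this.store.set(userId, history);
  }

  history(userId: string): string[] {
    // A user with no history gets an empty list, never a fallback
    // to some other tenant's data.
    return this.store.get(userId) ?? [];
  }
}
```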
This isolation is enforced at the infrastructure level — it is not just application logic. There is no way for one tenant's data to leak into another's, even under high load or error conditions.