Multi-Tenant Auth for AI Apps
Behest provides built-in multi-tenant authentication and isolation. Project-level API keys, per-user tokens, and three-tier rate limiting — all without writing auth code.
How Behest Auth Works
Behest uses a layered authentication model designed for multi-tenant AI applications. At the top level, each project gets its own API key. Within a project, individual end-users are identified by the X-End-User-Id header. This creates complete tenant isolation — one user's data, memory, and rate limits never affect another's.
The auth model has three layers:
- Project-level API key — authenticates your application. Each project has its own endpoint, API key, and configuration. Create projects in the Behest dashboard.
- User tokens — the X-End-User-Id header identifies individual end-users within a project. Behest uses this to scope memory, rate limits, and analytics per user.
- Tenant isolation — each user's conversation memory, token budget, and rate limit counters are completely isolated. There is no cross-tenant data leakage.
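The two layers combine in every request as a pair of headers. As an illustrative sketch (the header names come from this page; the helper function itself is our own, not part of any Behest SDK):

```typescript
// Build the headers Behest expects from a project API key and an
// optional end-user id. `apiKey` and `userId` come from your own
// config and session handling.
function behestHeaders(apiKey: string, userId?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Omitting X-End-User-Id still authenticates at the project level,
  // but per-user memory, rate limits, and analytics are not applied.
  if (userId) headers["X-End-User-Id"] = userId;
  return headers;
}
```

Centralizing header construction like this keeps the project/user distinction in one place instead of repeating it at every call site.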
Step 1: Create a Project
Projects are the top-level unit of organization in Behest. Each project gets its own subdomain (your-project.behest.app), API key, configuration, and isolated data. You can create multiple projects for different environments (development, staging, production) or different applications.
- Sign in to the Behest Dashboard
- Click “Create Project” and name your project
- Add your frontend origin to the CORS allowed origins list (e.g., http://localhost:3000)
- Copy your project URL and API key
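A common way to handle the copied URL and key is to load them from the environment rather than hard-coding them. This sketch fails fast when either is missing; the variable names BEHEST_URL and BEHEST_API_KEY are our own convention, not something Behest mandates:

```typescript
interface BehestConfig {
  url: string;
  apiKey: string;
}

// Read Behest credentials from an environment-like record and throw
// if either value is missing, so misconfiguration surfaces at startup
// rather than as failed requests later.
function loadBehestConfig(env: Record<string, string | undefined>): BehestConfig {
  const url = env["BEHEST_URL"];
  const apiKey = env["BEHEST_API_KEY"];
  if (!url || !apiKey) {
    throw new Error("Set BEHEST_URL and BEHEST_API_KEY before starting the app");
  }
  return { url, apiKey };
}
```

In a Node.js app you would typically call `loadBehestConfig(process.env)` once at startup and pass the result around.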
Step 2: Make Authenticated Requests
Every request to Behest requires a project API key in the Authorization header. Optionally, include the X-End-User-Id header to enable per-user features:
const BEHEST_URL = "https://your-project.behest.app/v1/chat/completions";
// Basic authenticated request (project-level auth)
const response = await fetch(BEHEST_URL, {
method: "POST",
headers: {
"Authorization": "Bearer your-project-api-key",
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "gemini-2.5-flash",
messages: [{ role: "user", content: "Hello" }],
}),
});
// With per-user identification (enables user-level isolation)
const userResponse = await fetch(BEHEST_URL, {
method: "POST",
headers: {
"Authorization": "Bearer your-project-api-key",
"Content-Type": "application/json",
"X-End-User-Id": "user-123", // Your app's user ID
},
body: JSON.stringify({
model: "gemini-2.5-flash",
messages: [{ role: "user", content: "Hello" }],
}),
});
How Rate Limiting Ties into Auth
Behest enforces rate limiting at three levels, all tied to the authentication context. This means rate limits are automatically scoped — you do not need to implement any rate limiting logic yourself:
- IP-level — protects against abuse from individual IP addresses. Requests from the same IP are throttled if they exceed the configured threshold.
- Project-level — enforces total usage limits for your entire project. Prevents runaway costs from bugs or unexpected traffic spikes.
- User-level — when you pass the X-End-User-Id header, each user gets their own rate limit bucket. One heavy user cannot exhaust the limits for everyone else.
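Behest enforces these buckets server-side, so there is nothing to implement, but the user-level behavior can be pictured as one counter per X-End-User-Id value. A purely conceptual model (limits and windowing are illustrative, not Behest's actual algorithm):

```typescript
// Conceptual model of per-user rate limiting: each user id maps to
// its own counter, so one user exhausting their allowance leaves
// every other user's allowance untouched.
class PerUserLimiter {
  private counts = new Map<string, number>();
  constructor(private maxPerWindow: number) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(userId: string): boolean {
    const used = this.counts.get(userId) ?? 0;
    if (used >= this.maxPerWindow) return false;
    this.counts.set(userId, used + 1);
    return true;
  }
}
```

With a limit of 2, a third request from `"user-a"` is rejected while a first request from `"user-b"` still succeeds, which is exactly the isolation property the user-level tier provides.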
When a rate limit is hit, Behest returns a 429 Too Many Requests response with headers indicating when the client can retry:
const response = await fetch(BEHEST_URL, {
method: "POST",
headers: {
"Authorization": "Bearer your-api-key",
"Content-Type": "application/json",
"X-End-User-Id": currentUser.id,
},
body: JSON.stringify({
model: "gemini-2.5-flash",
messages: [{ role: "user", content: input }],
}),
});
if (response.status === 429) {
const retryAfter = response.headers.get("Retry-After");
console.log(`Rate limited. Retry after ${retryAfter} seconds.`);
// Show a user-friendly message
}
Tenant Isolation in Practice
When you use the X-End-User-Id header, Behest automatically isolates the following per user:
- Conversation memory — each user's conversation history is stored separately. User A's context never appears in User B's responses.
- Token budgets — per-user token consumption is tracked independently. You can set limits to prevent any single user from consuming too many tokens.
- Rate limit counters — each user has their own rate limit window. A heavy user hitting their limit does not affect other users.
- Usage analytics — the dashboard shows per-user metrics: requests, tokens, cost, and latency.
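The memory isolation above can be pictured as conversation state keyed by user id, so a read for one user can never return another user's messages. This is only a mental model; the actual storage lives inside Behest's infrastructure:

```typescript
// Conceptual sketch of tenant-scoped conversation memory: every read
// and write is keyed by user id, so user A's history is structurally
// unreachable from a request made on behalf of user B.
class TenantMemory {
  private store = new Map<string, string[]>();

  append(userId: string, message: string): void {
    const history = this.store.get(userId) ?? [];
    history.push(message);
    this.store.set(userId, history);
  }

  history(userId: string): string[] {
    // A user with no history gets an empty list, never a fallback
    // to some other tenant's data.
    return this.store.get(userId) ?? [];
  }
}
```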
This isolation is enforced at the infrastructure level — it is not just application logic. There is no way for one tenant's data to leak into another's, even under high load or error conditions.