    Guide

    Multi-Tenant Auth for AI Apps

    Behest provides built-in multi-tenant authentication and isolation: project-level API keys, per-user tokens, and three-tier rate limiting, all without writing auth code.

    How Behest Auth Works

    Behest uses a layered authentication model designed for multi-tenant AI applications. At the top level, each project gets its own API key. Within a project, individual end-users are identified by the X-End-User-Id header. Together these give tenant isolation: one user's data, memory, and rate limits never affect another's.

    The auth model has three layers:

    • Project-level API key — authenticates your application. Each project has its own endpoint, API key, and configuration. Create projects in the Behest dashboard.
    • User tokens — the X-End-User-Id header identifies individual end-users within a project. Behest uses this to scope memory, rate limits, and analytics per user.
    • Tenant isolation — each user's conversation memory, token budget, and rate limit counters are completely isolated. There is no cross-tenant data leakage.

    Step 1: Create a Project

    Projects are the top-level unit of organization in Behest. Each project gets its own subdomain (your-project.behest.app), API key, configuration, and isolated data. You can create multiple projects for different environments (development, staging, production) or different applications.

    1. Sign in to the Behest Dashboard
    2. Click “Create Project” and name your project
    3. Add your frontend origin to the CORS allowed origins list (e.g., http://localhost:3000)
    4. Copy your project URL and API key
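
If you create one project per environment, the endpoint can be chosen once at startup. The following is a minimal sketch under stated assumptions: the project names (myapp-dev, myapp-staging, myapp-prod) and the behestChatUrl helper are hypothetical and not part of Behest. Only the your-project.behest.app pattern and the /v1/chat/completions path come from this guide.

```javascript
// Hypothetical mapping from deployment environment to Behest project name.
// Replace the values with the project names you created in the dashboard.
const PROJECT_BY_ENV = new Map([
  ["development", "myapp-dev"],
  ["staging", "myapp-staging"],
  ["production", "myapp-prod"],
]);

// Returns the chat completions URL for the project mapped to `env`.
// Throws for unknown environments instead of silently using the wrong project.
function behestChatUrl(env) {
  const project = PROJECT_BY_ENV.get(env);
  if (!project) {
    throw new Error(`No Behest project configured for environment "${env}"`);
  }
  return `https://${project}.behest.app/v1/chat/completions`;
}
```

Failing fast on an unknown environment keeps a misconfigured deployment from sending traffic to another environment's project and API key.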

    Step 2: Make Authenticated Requests

    Every request to Behest requires a project API key in the Authorization header. Optionally, include the X-End-User-Id header to enable per-user features:

    const BEHEST_URL = "https://your-project.behest.app/v1/chat/completions";
    
    // Basic authenticated request (project-level auth)
    const response = await fetch(BEHEST_URL, {
      method: "POST",
      headers: {
        "Authorization": "Bearer your-project-api-key",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "gemini-2.5-flash",
        messages: [{ role: "user", content: "Hello" }],
      }),
    });
    
    // With per-user identification (enables user-level isolation)
    const userResponse = await fetch(BEHEST_URL, {
      method: "POST",
      headers: {
        "Authorization": "Bearer your-project-api-key",
        "Content-Type": "application/json",
        "X-End-User-Id": "user-123", // Your app's user ID
      },
      body: JSON.stringify({
        model: "gemini-2.5-flash",
        messages: [{ role: "user", content: "Hello" }],
      }),
    });
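
The two requests above differ only in the X-End-User-Id header, so the header construction can live in one place. This is a sketch, not a Behest SDK function: buildBehestHeaders is a hypothetical helper name. It assumes the user ID comes from your app's own authenticated session rather than from untrusted client input.

```javascript
// Hypothetical helper: builds the headers used in the requests above.
// `apiKey` is the project API key; `endUserId` is optional.
function buildBehestHeaders(apiKey, endUserId) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("A project API key is required");
  }
  const headers = {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // Without an end-user ID the request is authenticated at project level only.
  if (endUserId !== undefined && endUserId !== null && String(endUserId) !== "") {
    headers["X-End-User-Id"] = String(endUserId);
  }
  return headers;
}
```

For example, `buildBehestHeaders("your-project-api-key", "user-123")` produces the same three headers as the second request above.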

    How Rate Limiting Ties into Auth

    Behest enforces rate limits at three levels, each tied to the authentication context. Limits are therefore scoped automatically, and you do not need to implement any rate-limiting logic yourself:

    • IP-level — protects against abuse from individual IP addresses. Requests from the same IP are throttled if they exceed the configured threshold.
    • Project-level — enforces total usage limits for your entire project. Prevents runaway costs from bugs or unexpected traffic spikes.
    • User-level — when you pass the X-End-User-Id header, each user gets their own rate limit bucket. One heavy user cannot exhaust the limits for everyone else.
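
To make the scoping concrete, here is a conceptual model of three independent counters. It is not Behest's implementation: the fixed-window algorithm, the ThreeTierLimiter name, and the limit values are all assumptions chosen for illustration.

```javascript
// Conceptual model only: a fixed-window limiter with one counter per tier.
class ThreeTierLimiter {
  // limits: { ip, project, user } = max requests per window (made-up values).
  constructor(limits, windowMs = 60_000) {
    this.limits = limits;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  // Returns the counter for `key`, starting a new window if the old one expired.
  _entry(key, now) {
    const existing = this.counters.get(key);
    if (!existing || now - existing.windowStart >= this.windowMs) {
      const fresh = { windowStart: now, count: 0 };
      this.counters.set(key, fresh);
      return fresh;
    }
    return existing;
  }

  // Returns { allowed: true }, or { allowed: false, tier } naming the first
  // exhausted tier. A rejected request does not consume quota on any tier.
  check({ ip, projectId, userId }, now = Date.now()) {
    const tiers = [
      ["ip", `ip:${ip}`],
      ["project", `project:${projectId}`],
    ];
    // The user tier only applies when an end-user ID was supplied.
    if (userId) tiers.push(["user", `user:${projectId}:${userId}`]);

    const entries = tiers.map(([tier, key]) => [tier, this._entry(key, now)]);
    const blocked = entries.find(([tier, entry]) => entry.count >= this.limits[tier]);
    if (blocked) return { allowed: false, tier: blocked[0] };

    for (const [, entry] of entries) entry.count += 1;
    return { allowed: true };
  }
}
```

Because the user counter is keyed by both project and user, one user exhausting their bucket leaves every other user's counter untouched, which is the behavior the user-level tier describes.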

    When a rate limit is hit, Behest returns a 429 Too Many Requests response. The Retry-After header indicates when the client can retry:

    // Assumes BEHEST_URL from Step 2, plus currentUser and input from your app.
    const response = await fetch(BEHEST_URL, {
      method: "POST",
      headers: {
        "Authorization": "Bearer your-project-api-key",
        "Content-Type": "application/json",
        "X-End-User-Id": currentUser.id,
      },
      body: JSON.stringify({
        model: "gemini-2.5-flash",
        messages: [{ role: "user", content: input }],
      }),
    });
    
    if (response.status === 429) {
      // headers.get() returns null if the header is absent, so handle that case.
      const retryAfter = response.headers.get("Retry-After");
      console.log(
        retryAfter
          ? `Rate limited. Retry after ${retryAfter} seconds.`
          : "Rate limited. Please retry shortly."
      );
      // Show a user-friendly message
    }
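
Instead of only logging, a client can wait for the indicated delay and retry. The sketch below is one possible approach, not a Behest-provided utility: retryAfterMs and fetchWithRetry are hypothetical names, and the one-second fallback and three-retry cap are arbitrary choices. The parser accepts both forms that HTTP allows for Retry-After (a number of seconds or an HTTP date), although the example above suggests Behest sends seconds.

```javascript
// Converts a Retry-After value (seconds or HTTP date) into milliseconds.
// Falls back to `fallbackMs` when the header is missing or unparseable.
function retryAfterMs(headerValue, now = Date.now(), fallbackMs = 1000) {
  if (headerValue == null || headerValue.trim() === "") return fallbackMs;
  const value = headerValue.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  if (Number.isNaN(date)) return fallbackMs;
  return Math.max(0, date - now); // a date in the past means "retry now"
}

// Retries a request while the server answers 429, up to `maxRetries` times.
// `fetchImpl` and `sleep` are injectable so the function can be tested offline.
async function fetchWithRetry(url, options, settings = {}) {
  const {
    maxRetries = 3,
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = settings;

  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url, options);
    if (response.status !== 429 || attempt >= maxRetries) return response;
    await sleep(retryAfterMs(response.headers.get("Retry-After")));
  }
}
```

Capping the retries matters: if the limit is still exhausted after the last attempt, the caller receives the final 429 response and can show the user a message rather than waiting indefinitely.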

    Tenant Isolation in Practice

    When you use the X-End-User-Id header, Behest automatically isolates the following per user:

    • Conversation memory — each user's conversation history is stored separately. User A's context never appears in User B's responses.
    • Token budgets — per-user token consumption is tracked independently. You can set limits to prevent any single user from consuming too many tokens.
    • Rate limit counters — each user has their own rate limit window. A heavy user hitting their limit does not affect other users.
    • Usage analytics — the dashboard shows per-user metrics: requests, tokens, cost, and latency.

    This isolation is enforced at the infrastructure level, not only in application logic. One tenant's data cannot leak into another's, even under high load or error conditions.