    API Reference

    OpenAI-compatible REST API. One endpoint for chat completions with auth, PII protection, memory, and rate limiting built in.

    Base URL

    https://your-project.behest.app

    Each project gets a dedicated subdomain. Find yours in the Behest dashboard.

    Authentication

    All requests require a Bearer token in the Authorization header.

    Authorization: Bearer your-api-key

    API keys are generated per-project in the dashboard. Keys are hashed with Argon2id and cannot be retrieved after creation — store them securely.
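
    A small sketch of building the required headers once and reusing them across requests. The helper name `authHeaders` is illustrative, not part of the API, and `"your-api-key"` is a placeholder:

```javascript
// Build the headers every Behest request needs.
// "your-api-key" is a placeholder — substitute your real project key.
function authHeaders(apiKey) {
  return {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}

const headers = authHeaders("your-api-key");
console.log(headers.Authorization);
```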

    POST /v1/chat/completions

    Create a chat completion. OpenAI-compatible request and response format.

    Request Body

    | Parameter     | Type    | Required | Description |
    |---------------|---------|----------|-------------|
    | `model`       | string  | Yes      | The model to use. Currently supported: `gemini-2.5-flash`, `gemini-2.5-pro` |
    | `messages`    | array   | Yes      | Array of message objects with `role` ("user", "assistant", or "system") and `content` (string) |
    | `stream`      | boolean | No       | Enable server-sent events streaming. Default: `false` |
    | `temperature` | number  | No       | Sampling temperature (0–2). Higher values increase randomness. Default: `1.0` |
    | `max_tokens`  | integer | No       | Maximum tokens in the response. Defaults to the model's maximum. |

    Custom Headers

    | Header          | Required | Description |
    |-----------------|----------|-------------|
    | `Authorization` | Yes      | Bearer token: `Bearer your-api-key` |
    | `Content-Type`  | Yes      | Must be `application/json` |
    | `X-End-User-Id` | No       | Unique identifier for the end user. Enables per-user memory, rate limiting, token budgets, and analytics. |

    Full Request Example

    const response = await fetch(
      "https://your-project.behest.app/v1/chat/completions",
      {
        method: "POST",
        headers: {
          "Authorization": "Bearer your-api-key",
          "Content-Type": "application/json",
          "X-End-User-Id": "user-12345",
        },
        body: JSON.stringify({
          model: "gemini-2.5-flash",
          messages: [
            { role: "user", content: "Summarize this contract" }
          ],
          temperature: 0.7,
          max_tokens: 1024,
        }),
      }
    );
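
    When `stream: true` is set, the body arrives as server-sent events rather than a single JSON document. A minimal sketch of consuming that stream, assuming OpenAI-style framing (`data: {...}` lines with `choices[0].delta.content`, terminated by `data: [DONE]`), which this API's OpenAI compatibility implies but the reference above does not spell out. `parseSSELine` and `streamCompletion` are illustrative names:

```javascript
// Parse one SSE line from a streamed completion.
// Returns the parsed JSON chunk, or null for blanks and the [DONE] sentinel.
function parseSSELine(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return null;
  return JSON.parse(payload);
}

// Read a streaming fetch Response and invoke onDelta for each text fragment.
async function streamCompletion(response, onDelta) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep any partial line for the next chunk
    for (const line of lines) {
      const chunk = parseSSELine(line);
      const delta = chunk?.choices?.[0]?.delta?.content;
      if (delta) onDelta(delta);
    }
  }
}
```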

    Response Format

    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1709000000,
      "model": "gemini-2.5-flash",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Here is a summary of the contract..."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 25,
        "completion_tokens": 150,
        "total_tokens": 175
      }
    }
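
    Pulling the assistant's reply and token usage out of that payload is a one-liner each; a small helper (the name `extractCompletion` is illustrative) keeps the field paths in one place:

```javascript
// Extract the assistant message and token usage from a
// (non-streaming) chat completion response body.
function extractCompletion(body) {
  const choice = body.choices[0];
  return {
    content: choice.message.content,
    finishReason: choice.finish_reason,
    totalTokens: body.usage.total_tokens,
  };
}

// Using the documented sample response:
const sample = {
  choices: [
    {
      index: 0,
      message: { role: "assistant", content: "Here is a summary of the contract..." },
      finish_reason: "stop",
    },
  ],
  usage: { prompt_tokens: 25, completion_tokens: 150, total_tokens: 175 },
};
console.log(extractCompletion(sample));
```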

    Rate Limit Headers

    Every response includes rate limit headers so your app can handle limits gracefully.

    | Header                  | Description |
    |-------------------------|-------------|
    | `X-RateLimit-Limit`     | Maximum requests per minute for this tier |
    | `X-RateLimit-Remaining` | Requests remaining in the current window |
    | `X-RateLimit-Reset`     | Unix timestamp when the current window resets |
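
    One way to use those headers is to compute how long to pause before the next request once the window is exhausted. A sketch (the helper name `msUntilReset` is illustrative; `getHeader` stands in for `response.headers.get`):

```javascript
// Milliseconds to wait before retrying, based on the rate limit headers.
// Returns 0 while requests remain in the current window.
function msUntilReset(getHeader, nowMs = Date.now()) {
  const remaining = Number(getHeader("X-RateLimit-Remaining"));
  if (remaining > 0) return 0; // budget left, no need to wait
  const resetSec = Number(getHeader("X-RateLimit-Reset")); // Unix seconds
  return Math.max(0, resetSec * 1000 - nowMs);
}
```

    In practice you would call it as `msUntilReset((k) => response.headers.get(k))` after each response.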

    Error Codes

    | Code | Meaning | Common Cause |
    |------|---------|--------------|
    | 400  | Bad Request | Missing required fields, invalid model name, malformed JSON |
    | 401  | Unauthorized | Missing or invalid API key, expired token |
    | 403  | Forbidden | CORS origin not allowed, project kill switch active, prompt blocked by Sentinel |
    | 429  | Too Many Requests | Rate limit exceeded (per-IP, per-project, or per-user). Check the `X-RateLimit-Reset` header for retry timing. |
    | 503  | Service Unavailable | Upstream LLM provider error or temporary service disruption. Retry with exponential backoff. |
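
    The retry advice for 429 and 503 can be sketched as a small wrapper with exponentially growing delays. The helper names (`backoffDelay`, `completeWithRetry`) and the base delay are illustrative choices, not prescribed by the API:

```javascript
// Delay doubles each attempt: base, 2*base, 4*base, ...
function backoffDelay(attempt, baseMs = 500) {
  return baseMs * 2 ** attempt;
}

// Retry transient errors (429, 503) with exponential backoff;
// doRequest is any function returning a fetch Response promise.
async function completeWithRetry(doRequest, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 && res.status !== 503) return res;
    if (attempt >= maxRetries) return res; // give up, surface the error
    await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
  }
}
```

    For 429s specifically, the `X-RateLimit-Reset` header gives a more precise wait than a blind backoff.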