API Reference
OpenAI-compatible REST API. One endpoint for chat completions with auth, PII protection, memory, and rate limiting built in.
Base URL
```
https://your-project.behest.app
```

Each project gets a dedicated subdomain. Find yours in the Behest dashboard.
Authentication
All requests require a Bearer token in the Authorization header.
```
Authorization: Bearer your-api-key
```

API keys are generated per-project in the dashboard. Keys are hashed with Argon2id and cannot be retrieved after creation — store them securely.
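In practice you would load the key from an environment variable or secret store rather than hard-coding it, then build the required headers once. A minimal sketch (the validation and function name are illustrative, not part of the API):

```javascript
// Build the headers required on every request from an API key.
// Typically the key comes from an environment variable, e.g.
// process.env.BEHEST_API_KEY (an illustrative name, not one the platform requires).
function buildAuthHeaders(apiKey) {
  if (!apiKey) {
    throw new Error("API key is required");
  }
  return {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}
```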
POST /v1/chat/completions
Create a chat completion. OpenAI-compatible request and response format.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model to use. Currently supported: `gemini-2.5-flash`, `gemini-2.5-pro` |
| `messages` | array | Yes | Array of message objects with `role` (`"user"`, `"assistant"`, `"system"`) and `content` (string) |
| `stream` | boolean | No | Enable server-sent events streaming. Default: `false` |
| `temperature` | number | No | Sampling temperature (0–2). Higher values increase randomness. Default: `1.0` |
| `max_tokens` | integer | No | Maximum tokens in the response. Defaults to the model maximum. |
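With `stream: true`, an OpenAI-compatible endpoint delivers the response as server-sent events: each `data:` line carries a JSON chunk and the stream ends with `data: [DONE]`. A minimal parser for those lines might look like this — a sketch based on the OpenAI SSE convention, so verify it against Behest's actual stream output:

```javascript
// Parse one server-sent-event line from a streamed chat completion.
// Returns the text delta for content chunks, or null for the [DONE]
// terminator, comments, and non-data lines.
// Assumes the OpenAI SSE convention ("data: {...}" lines, "data: [DONE]").
function parseSseLine(line) {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload);
  return chunk.choices?.[0]?.delta?.content ?? null;
}
```

To use it, decode `response.body` as text, split on newlines, and feed each line through `parseSseLine`, appending the non-null results to build the full message.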
Custom Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | Bearer token: `Bearer your-api-key` |
| `Content-Type` | Yes | Must be `application/json` |
| `X-End-User-Id` | No | Unique identifier for the end user. Enables per-user memory, rate limiting, token budgets, and analytics. |
Full Request Example
```javascript
const response = await fetch(
  "https://your-project.behest.app/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer your-api-key",
      "Content-Type": "application/json",
      "X-End-User-Id": "user-12345",
    },
    body: JSON.stringify({
      model: "gemini-2.5-flash",
      messages: [
        { role: "user", content: "Summarize this contract" }
      ],
      temperature: 0.7,
      max_tokens: 1024,
    }),
  }
);
```

Response Format
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a summary of the contract..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
```

Rate Limit Headers
Every response includes rate limit headers so your app can handle limits gracefully.
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests per minute for this tier |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the current window resets |
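The reset header can drive a simple client-side wait: compute how long remains until the window resets and pause before retrying. A sketch, assuming the timestamp is in seconds (`nowMs` is parameterized only for testability — in real code you'd let it default):

```javascript
// Given rate-limit headers from a 429 response, return how many
// milliseconds to wait before retrying (0 if the window already reset
// or the header is missing/unparsable).
// `headers` is a plain object of lower-cased header names to string values.
function msUntilReset(headers, nowMs = Date.now()) {
  const resetSeconds = Number(headers["x-ratelimit-reset"]);
  if (!Number.isFinite(resetSeconds)) return 0;
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```

With the Fetch API you would read the value via `response.headers.get("X-RateLimit-Reset")` before passing it in.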
Error Codes
| Code | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing required fields, invalid model name, malformed JSON |
| 401 | Unauthorized | Missing or invalid API key, expired token |
| 403 | Forbidden | CORS origin not allowed, project kill switch active, prompt blocked by Sentinel |
| 429 | Too Many Requests | Rate limit exceeded (per-IP, per-project, or per-user). Check the `X-RateLimit-Reset` header for retry timing. |
| 503 | Service Unavailable | Upstream LLM provider error or temporary service disruption. Retry with exponential backoff. |
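Putting the 429/503 guidance together, a retry wrapper with exponential backoff might look like the following sketch (the delays, cap, and attempt count are illustrative choices, not prescribed by the API):

```javascript
// Exponential backoff delay in ms: 1s, 2s, 4s, ... capped at 30s.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry a request on 429/503 responses, waiting between attempts.
// `doRequest` is any function returning a fetch Response promise;
// `delayFn` is injectable so tests can skip real waits.
async function withRetries(doRequest, maxAttempts = 4, delayFn = backoffDelay) {
  let res;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    res = await doRequest();
    if (res.status !== 429 && res.status !== 503) return res;
    if (attempt < maxAttempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayFn(attempt)));
    }
  }
  return res; // still failing after all attempts; let the caller handle it
}
```

For 429s specifically, you could replace the fixed backoff with a wait derived from `X-RateLimit-Reset` so the client resumes exactly when the window reopens.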