Guardrails — PII Shield and Sentinel
⚠️ Heads up: the /v1/projects/* admin endpoints now require a JWT as the Bearer token; passing BEHEST_KEY directly no longer works. Follow the mint-first flow in the updated migration guide to get a JWT, then use it as $TOKEN in the curl snippets on this page. The PII Shield / Sentinel configuration model and headers are unchanged.
Behest provides two guardrails that run at the API gateway level before any request reaches an LLM provider: PII Shield for personal data protection, and Sentinel for prompt injection and content policy enforcement. Both are configured per project and operate independently.
Architecture
Guardrails run in the LiteLLM proxy layer as two callbacks, PIIShieldCallback and SentinelCallback:
Your app
|
v
Kong Gateway
| (validates JWT, resolves project, reads config from Redis)
v
LiteLLM Proxy
|-- PIIShieldCallback.async_pre_call_hook() <- scans and transforms request
|-- SentinelCallback.async_pre_call_hook() <- scans and optionally blocks
|
v
LLM Provider (OpenAI, Anthropic, Google, etc.)
|
v
LiteLLM Proxy
|-- PIIShieldCallback.async_log_success_event() <- restores masked tokens
v
Your app (response with original values restored for MASK actions)
The gateway reads configuration from Redis and caches it with a 30-second TTL, so a request may use settings that are up to 30 seconds old. Settings are written to Redis when you click Deploy in the dashboard or call the deploy endpoint via the API.
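The caching behavior described above can be sketched as a small TTL cache. This is a minimal illustration, not Behest's implementation: the class name, the loader callable (a stand-in for the Redis read), and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLConfigCache:
    """Caches per-project config for a fixed number of seconds (hypothetical helper)."""

    def __init__(
        self,
        loader: Callable[[str], dict],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # stand-in for the Redis read
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, project_id: str) -> dict:
        now = self._clock()
        cached = self._entries.get(project_id)
        # Serve the cached copy while it is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Otherwise reload and remember when the value was fetched.
        config = self._loader(project_id)
        self._entries[project_id] = (now, config)
        return config
```

With a 30-second TTL, a deploy becomes visible on the first request after the cached entry expires, which is why changes take effect within 30 seconds rather than instantly.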
PII Shield
PII Shield uses Microsoft Presidio (NER + regex hybrid) to detect personal information in user messages before they reach any LLM provider.
Modes
| Mode | Behavior |
|---|---|
| disabled | No scanning. Default for new projects. |
| shadow | Scans every request and logs detections to the guardrails:events stream. Does not modify requests or block traffic. Use this to measure detection rates before enforcing. |
| enforce | Scans every request and applies the configured action (MASK, REDACT, or BLOCK) for each detected entity type. |
Entity types
| Entity | Category | Example |
|---|---|---|
| EMAIL_ADDRESS | Contact | john@example.com |
| PHONE_NUMBER | Contact | (555) 123-4567 |
| PERSON | Identity | Person names (via NER) |
| US_SSN | Identity | 123-45-6789 |
| LOCATION | Identity | Addresses and location names (via NER) |
| DATE_TIME | Identity | Date and time expressions |
| CREDIT_CARD | Financial | 4111-1111-1111-1111 |
| US_BANK_NUMBER | Financial | US bank account numbers |
| IBAN_CODE | Financial | International bank account numbers |
| IP_ADDRESS | Technical | 192.168.1.1 |
Presidio uses a confidence threshold of 0.7. Only entities present in your pii_entities
map are scanned — entities not listed are never flagged.
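The filtering rule above can be sketched as follows. The Detection dataclass and filter_detections function are hypothetical names for illustration, and treating the 0.7 threshold as inclusive is an assumption; the text does not say whether a score of exactly 0.7 passes.

```python
from dataclasses import dataclass
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in the text above


@dataclass(frozen=True)
class Detection:
    """One detected span (hypothetical shape, not a Presidio type)."""
    entity_type: str
    start: int
    end: int
    score: float


def filter_detections(
    detections: List[Detection], pii_entities: Dict[str, str]
) -> List[Detection]:
    """Keep detections whose entity is configured and whose score meets the threshold."""
    return [
        d
        for d in detections
        if d.entity_type in pii_entities and d.score >= CONFIDENCE_THRESHOLD
    ]
```

For example, with pii_entities set to {"EMAIL_ADDRESS": "MASK"}, a PERSON detection is dropped regardless of its score, and an EMAIL_ADDRESS detection scored 0.5 is dropped for low confidence.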
Actions per entity
| Action | Behavior |
|---|---|
| MASK | Reversible tokenization. The entity is replaced with a deterministic token (e.g., <EMAIL_ADDRESS_a3f8c2d1b4e9>). The LLM never sees the original value. After the LLM responds, Behest restores the original value in the response text. The vault TTL is 5 minutes. |
| REDACT | Permanent replacement. The entity is replaced with <EMAIL_ADDRESS>. The original value is discarded and cannot be recovered. |
| BLOCK | The entire request is rejected with a 403 error if this entity type is detected. In shadow mode, detections are logged but requests are not blocked. |
How MASK works
For reversible masking, Behest creates a short-lived vault entry in Redis keyed by
pii_vault:{tenantId}:{projectId}:{requestId} with a 5-minute TTL. After the LLM responds,
the post-call hook restores all masked tokens from the vault, so your app receives the
response with the original values intact. The LLM processes anonymized content throughout.
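The mask-then-restore round trip can be sketched in a few lines. This is an illustration under stated assumptions, not Behest's code: the token suffix here is a truncated SHA-256 of the value (the actual derivation is not documented on this page), and a plain dict stands in for the Redis vault entry and its 5-minute TTL. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def make_token(entity_type: str, value: str) -> str:
    """Derive a deterministic token; the SHA-256 suffix is an assumption."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"<{entity_type}_{digest}>"


def mask(text: str, spans: List[Tuple[int, int, str]]) -> Tuple[str, Dict[str, str]]:
    """Replace each (start, end, entity_type) span with a token.

    Returns the masked text and a vault mapping token -> original value.
    Spans are assumed not to overlap.
    """
    vault: Dict[str, str] = {}
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, entity_type in sorted(spans, reverse=True):
        original = text[start:end]
        token = make_token(entity_type, original)
        vault[token] = original
        text = text[:start] + token + text[end:]
    return text, vault


def restore(text: str, vault: Dict[str, str]) -> str:
    """Put original values back wherever the LLM echoed a token."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

Because the token is deterministic, the same value always maps to the same token within a request, so the LLM can refer to it consistently and the response can be restored by simple substitution.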
Sentinel
Sentinel guards against prompt injection attempts and custom content policy violations.
Modes
| Mode | Behavior |
|---|---|
| disabled | No scanning. Default for new projects. |
| shadow | Scans requests and logs detections. Requests pass through even when a pattern matches. Use this to tune your blocklist before enforcing. |
| enforce | Blocks any request that matches a jailbreak pattern or contains a blocklisted term. Returns 403. |
What Sentinel detects
Jailbreak patterns (built-in, cannot be disabled individually):
- "Ignore all previous instructions" and variations
- DAN / "act as an unrestricted AI" patterns
- Fake [SYSTEM] or <<SYS>> injection in user input
- "Bypass your safety/content filters" variants
- "Disregard/override your system prompt/guidelines" variants
Custom blocklist: An array of strings you define. Matching is case-insensitive. Maximum 200 terms.
Sentinel scans only user messages (role: "user"). System prompts are never scanned — they
are set by your application and are trusted.
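The scanning rules above can be sketched as follows. The two regular expressions are illustrative stand-ins, since the built-in jailbreak patterns are not listed on this page, and substring matching for blocklist terms is an assumption (the text only states that matching is case-insensitive). The function name is hypothetical, and message content is assumed to be a plain string.

```python
import re
from typing import Dict, List, Optional

# Illustrative patterns only; not the actual built-in Sentinel patterns.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"\[SYSTEM\]|<<SYS>>", re.IGNORECASE),
]


def scan_messages(messages: List[Dict[str, str]], blocklist: List[str]) -> Optional[str]:
    """Return 'sentinel_jailbreak', 'sentinel_blocklist', or None if nothing matched."""
    lowered_terms = [term.lower() for term in blocklist]
    for message in messages:
        if message.get("role") != "user":
            continue  # only user messages are scanned
        content = message.get("content", "")
        if any(pattern.search(content) for pattern in JAILBREAK_PATTERNS):
            return "sentinel_jailbreak"
        lowered = content.lower()
        if any(term in lowered for term in lowered_terms):
            return "sentinel_blocklist"
    return None
```

The return values reuse the event type names that appear in the guardrail events, so a caller could log them directly in shadow mode or reject the request in enforce mode.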
Configuring Guardrails in the Dashboard
- Open your project from the Projects page.
- Navigate to Configuration (the canvas/config tab for your project).
- Find the Guardrails section.
- Toggle the PII Shield and Sentinel modes.
- For PII Shield: select which entities to scan and choose an action (MASK, REDACT, BLOCK) for each.
- For Sentinel: add custom blocklist terms.
- Click Save to store the draft.
- Click Deploy to push the configuration to the gateway. Changes take effect within 30 seconds.
Important: Saving updates the database. Deploying pushes the settings to Redis and activates them at the gateway. Both steps are required for changes to take effect.
Configuring Guardrails via API
All guardrail settings are part of the project settings resource.
Read current settings
curl -X GET https://api.behest.app/v1/projects/$PROJECT_ID/settings \
-H "Authorization: Bearer $TOKEN" \
| jq '{pii_mode, pii_entities, sentinel_mode, sentinel_blocklist}'
Enable PII Shield in shadow mode
curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"pii_mode": "shadow",
"pii_entities": {
"EMAIL_ADDRESS": "MASK",
"PHONE_NUMBER": "MASK",
"CREDIT_CARD": "BLOCK",
"US_SSN": "REDACT"
}
}'
Enable PII Shield in enforce mode
curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"pii_mode": "enforce",
"pii_entities": {
"EMAIL_ADDRESS": "MASK",
"PHONE_NUMBER": "MASK",
"PERSON": "MASK",
"CREDIT_CARD": "BLOCK",
"US_SSN": "REDACT",
"IBAN_CODE": "BLOCK",
"IP_ADDRESS": "MASK"
}
}'
Configure Sentinel with a custom blocklist
curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"sentinel_mode": "shadow",
"sentinel_blocklist": [
"ignore previous instructions",
"competitor-product-name",
"internal-project-codename"
]
}'
Deploy settings to activate
curl -X POST https://api.behest.app/v1/projects/$PROJECT_ID/settings/deploy \
-H "Authorization: Bearer $TOKEN"
Enable both guardrails in one call
curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"pii_mode": "enforce",
"pii_entities": {
"EMAIL_ADDRESS": "MASK",
"CREDIT_CARD": "BLOCK"
},
"sentinel_mode": "enforce",
"sentinel_blocklist": ["secret-project"]
}' && \
curl -X POST https://api.behest.app/v1/projects/$PROJECT_ID/settings/deploy \
-H "Authorization: Bearer $TOKEN"
Viewing Guardrail Events
Guardrail detections and blocks are logged to a Redis stream (guardrails:events) and
accessible via the dashboard and API.
Via the dashboard
Navigate to your project's Logs or Analytics page. The guardrails event feed shows recent detections with their type, mode, entity types, and action taken.
Via the API
# All events for a project
curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?project_id=$PROJECT_ID" \
-H "Authorization: Bearer $TOKEN"
# Filter by event type
curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?type=pii_detection&project_id=$PROJECT_ID" \
-H "Authorization: Bearer $TOKEN"
# Filter by type: pii_block, pii_detection, sentinel_jailbreak, sentinel_blocklist
curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?type=sentinel_jailbreak" \
-H "Authorization: Bearer $TOKEN"
Event structure
Each event contains:
{
"type": "pii_detection",
"tenant_id": "...",
"project_id": "...",
"direction": "input",
"mode": "enforce",
"action_taken": "masked",
"entity_types": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
"entity_count": "3",
"request_id": "req_abc123",
"timestamp": "1743024000.123"
}
Using Shadow Mode for Testing Before Enforcing
Shadow mode is the recommended workflow when enabling guardrails on an existing project:
- Set pii_mode to "shadow" and deploy.
- Run your normal workload for 24-48 hours.
- Review the guardrail events to see which entity types are being detected and how frequently.
- Tune your pii_entities map: adjust which entities to monitor and their actions.
- Once comfortable, switch pii_mode to "enforce" and deploy.
The same workflow applies to Sentinel. Use shadow mode to discover which blocklist terms appear naturally in legitimate traffic before blocking on them.
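When reviewing shadow-mode results, a small tally of events by type and entity can make the tuning step concrete. The sketch below assumes you have already fetched events and parsed them into dicts shaped like the event structure shown earlier; the function name is hypothetical, and the API's response envelope is not specified here, so extracting the list is left to the caller.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_events(events: Iterable[dict]) -> Dict[str, Counter]:
    """Count events by type, and count how many events mention each entity type."""
    by_type: Counter = Counter()
    by_entity: Counter = Counter()
    for event in events:
        by_type[event["type"]] += 1
        # entity_types is absent on events that carry no entities,
        # so default to an empty list.
        for entity in event.get("entity_types", []):
            by_entity[entity] += 1
    return {"by_type": by_type, "by_entity": by_entity}
```

An entity type that shows up in a large share of legitimate traffic is a candidate for MASK rather than BLOCK before you switch to enforce.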
Error Responses When Blocked
When PII Shield blocks a request (entity action BLOCK in enforce mode):
{
"error": {
"message": "Request blocked: detected PII types ['CREDIT_CARD']",
"type": "guardrail_error",
"code": "BEHEST_PII_BLOCKED"
}
}
HTTP status: 403
When Sentinel blocks a request:
{
"error": {
"message": "Request blocked: potential prompt injection detected",
"type": "guardrail_error",
"code": "BEHEST_CONTENT_BLOCKED"
}
}
HTTP status: 403
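On the client side, the two error bodies above can be told apart by their code field. The following is a minimal sketch; the function name and the returned labels ("pii_shield", "sentinel") are hypothetical choices for the example, while the status, type, and code values are taken from the responses shown above.

```python
import json
from typing import Optional

GUARDRAIL_CODES = {
    "BEHEST_PII_BLOCKED": "pii_shield",
    "BEHEST_CONTENT_BLOCKED": "sentinel",
}


def guardrail_for_response(status: int, body: str) -> Optional[str]:
    """Return which guardrail blocked the request, or None if it was not a guardrail block."""
    if status != 403:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("type") != "guardrail_error":
        return None
    # Unknown codes fall through to None rather than guessing.
    return GUARDRAIL_CODES.get(error.get("code"))
```

Checking the type and code instead of the message text keeps the handler stable if the human-readable message changes.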
Best Practices
Start with shadow mode. Never enable enforce mode on day one. Run shadow mode for at
least 24 hours to understand your traffic before activating blocking behavior.
MASK instead of BLOCK for most entities. BLOCK is appropriate for data your application should never send to any LLM (SSNs, credit card numbers). MASK is better for names and emails where you want the LLM to reference the person naturally in its response.
Use REDACT for data you never want in logs. REDACT permanently discards the original value, so it cannot be restored. Use this when compliance requires that raw PII is neither sent to the LLM nor retained, even transiently in the MASK vault.
Keep the Sentinel blocklist focused. Avoid blocking common words that may appear in legitimate requests. Shadow mode helps you validate that your blocklist terms are specific enough not to generate false positives.
Deploy after every settings change. The PUT settings endpoint updates the database. Only the deploy endpoint pushes changes to Redis and activates them at the gateway.
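To make the save-then-deploy pairing hard to forget in scripts, the two calls can be built together. The sketch below only constructs the requests using Python's standard library and the endpoints from the curl examples on this page; the function name is hypothetical, and sending the requests and handling failures is left to the caller.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://api.behest.app/v1"


def build_save_and_deploy(
    project_id: str, token: str, settings: dict
) -> List[urllib.request.Request]:
    """Build the PUT (save) and POST (deploy) requests, in the order they must be sent."""
    auth = {"Authorization": f"Bearer {token}"}
    save = urllib.request.Request(
        f"{BASE_URL}/projects/{project_id}/settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={**auth, "Content-Type": "application/json"},
        method="PUT",
    )
    deploy = urllib.request.Request(
        f"{BASE_URL}/projects/{project_id}/settings/deploy",
        headers=auth,
        method="POST",
    )
    return [save, deploy]
```

Passing each request to urllib.request.urlopen in order sends the save first; urlopen raises on an HTTP error status, so a failed save stops the loop before the deploy is attempted.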
Default state for new projects: Both guardrails start as disabled with empty entity
and blocklist configuration. Guardrails must be explicitly configured and deployed to take
effect.