
PII Protection for LLM Applications

Behest PII Shield uses Microsoft Presidio to automatically detect and redact personally identifiable information before it reaches the LLM. No PII ever leaves your infrastructure.

What PII Shield Does

When a user sends a message to your AI-powered application, that message may contain personally identifiable information — names, email addresses, phone numbers, social security numbers, credit card numbers, and more. If this data reaches the LLM provider, it becomes part of the provider's processing pipeline and potentially their training data.

Behest PII Shield intercepts every request before it reaches the LLM. It uses Microsoft Presidio, an open-source PII detection engine, to scan the message content, identify PII entities, and redact them. The LLM only sees sanitized text — the original PII never leaves your Behest deployment.

This happens automatically on every request. You do not need to write any PII detection code, maintain regex patterns, or integrate third-party PII services. Behest handles it as part of the request pipeline.
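Conceptually, the scrubbing step sits in the request pipeline between your application and the model call. The following JavaScript is a minimal sketch of that shape under stated assumptions. `scrubMessages` and `redactText` are hypothetical names, not Behest APIs, and the single email regex stands in for Presidio's detectors. The sketch only illustrates that each message is sanitized into a new request body before anything is forwarded.

```javascript
// Hypothetical sketch of a request-pipeline scrubbing step.
// redactText is a stand-in detector that only masks email addresses;
// PII Shield itself delegates detection to Microsoft Presidio.
function redactText(text) {
  return text.replace(
    /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
    "<EMAIL_ADDRESS>"
  );
}

// Returns a new request body whose string message contents are redacted.
// The incoming body is not mutated.
function scrubMessages(body) {
  return {
    ...body,
    messages: body.messages.map((message) => ({
      ...message,
      content:
        typeof message.content === "string"
          ? redactText(message.content)
          : message.content,
    })),
  };
}

const incoming = {
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Reach me at john@example.com please." }],
};
const forwarded = scrubMessages(incoming);
console.log(forwarded.messages[0].content); // "Reach me at <EMAIL_ADDRESS> please."
```

Returning a new body instead of mutating the incoming one keeps the raw request and the sanitized, forwarded request clearly separate.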

PII Types Detected

PII Shield detects and redacts the following types of personally identifiable information:

Person names
Email addresses
Phone numbers
Social Security Numbers (SSN)
Credit card numbers
Bank account numbers (IBAN)
IP addresses
Physical addresses
Date of birth
Driver license numbers
Passport numbers
Medical record numbers

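Presidio combines three techniques. Named entity recognition (NER) finds free-text entities such as names. Pattern matching with validation (for example, checksums) finds structured identifiers such as credit card numbers. Context analysis raises confidence when words like “SSN” or “email” appear near a candidate. Together, these signals improve detection accuracy and help keep false positives low.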
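The following JavaScript sketch makes the pattern-matching and context-analysis ideas concrete. It shows the general technique, not Presidio's recognizer code. `findSsnCandidates`, `passesLuhn`, the scores, and the 30-character context window are all illustrative assumptions. A regex proposes SSN-shaped matches, and nearby words such as “SSN” raise the confidence. The Luhn checksum shows how a structured identifier like a card number can be validated so that arbitrary digit runs are rejected.

```javascript
// Illustrative sketch of pattern matching with context analysis and checksum
// validation. Scores and the context window are assumptions, not Presidio defaults.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;
const SSN_CONTEXT_WORDS = ["ssn", "social security"];
const BASE_SCORE = 0.4;
const CONTEXT_BOOST = 0.4;
const CONTEXT_WINDOW = 30; // characters inspected before each match

// Proposes SSN-shaped spans and boosts the score when a context word precedes them.
function findSsnCandidates(text) {
  const candidates = [];
  for (const match of text.matchAll(SSN_PATTERN)) {
    const start = match.index;
    const end = start + match[0].length;
    const preceding = text
      .slice(Math.max(0, start - CONTEXT_WINDOW), start)
      .toLowerCase();
    const hasContext = SSN_CONTEXT_WORDS.some((word) => preceding.includes(word));
    candidates.push({
      entityType: "US_SSN",
      start,
      end,
      score: hasContext ? BASE_SCORE + CONTEXT_BOOST : BASE_SCORE,
    });
  }
  return candidates;
}

// Luhn checksum: valid card numbers satisfy it, so most random digit runs fail.
function passesLuhn(digits) {
  let sum = 0;
  let doubleDigit = false; // double every second digit, starting from the right
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = Number(digits[i]);
    if (doubleDigit) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleDigit = !doubleDigit;
  }
  return sum % 10 === 0;
}

console.log(findSsnCandidates("My SSN is 123-45-6789.").map((c) => c.score)); // [ 0.8 ]
console.log(findSsnCandidates("Order 123-45-6789 shipped").map((c) => c.score)); // [ 0.4 ]
console.log(passesLuhn("4111111111111111")); // true
console.log(passesLuhn("4111111111111112")); // false
```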

How It Works: Before and After

Here is an example of a request containing PII and what the LLM actually receives after PII Shield processes it:

// Your app sends this request to Behest.
// BEHEST_URL and "your-api-key" are placeholders for your project's endpoint and API key.
const response = await fetch(BEHEST_URL, {
  method: "POST",
  headers: {
    "Authorization": "Bearer your-api-key",
    "Content-Type": "application/json",
    "X-End-User-Id": "user-123",
    // Uniquely identifies a conversation thread for per-session cost attribution.
    "X-Session-Id": "user-123-conv-tax-help",
  },
  body: JSON.stringify({
    model: "gemini-2.5-flash",
    messages: [
      {
        role: "user",
        content:
          "My name is John Smith, my email is john@example.com, " +
          "and my SSN is 123-45-6789. Can you help me file taxes?",
      },
    ],
  }),
});

What the LLM actually receives (after PII Shield):

"My name is <PERSON>, my email is <EMAIL_ADDRESS>, and my SSN is <US_SSN>. Can you help me file taxes?"

The LLM can still understand the intent and provide a helpful response about filing taxes, but it never sees the actual name, email, or SSN. The PII tokens (<PERSON>, <EMAIL_ADDRESS>, <US_SSN>) preserve the semantic structure of the message while removing sensitive data.
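The placeholder substitution can be sketched in JavaScript. This minimal illustration assumes a detector has already returned non-overlapping spans with `start`, `end`, and `entityType` fields. That span shape is hypothetical. The `<ENTITY_TYPE>` replacement style matches Presidio's default replace behavior, and the sketch reproduces the sanitized text shown above.

```javascript
// Replaces each detected span with an <ENTITY_TYPE> placeholder.
// Assumes detections do not overlap; spans are applied right-to-left so
// earlier offsets stay valid as the string changes length.
function redact(text, detections) {
  const ordered = [...detections].sort((a, b) => b.start - a.start);
  let result = text;
  for (const { start, end, entityType } of ordered) {
    result = result.slice(0, start) + `<${entityType}>` + result.slice(end);
  }
  return result;
}

// Hypothetical detector output for the request above (offsets computed, not hard-coded).
const message =
  "My name is John Smith, my email is john@example.com, " +
  "and my SSN is 123-45-6789. Can you help me file taxes?";
const span = (value, entityType) => {
  const start = message.indexOf(value);
  return { start, end: start + value.length, entityType };
};
const detections = [
  span("John Smith", "PERSON"),
  span("john@example.com", "EMAIL_ADDRESS"),
  span("123-45-6789", "US_SSN"),
];

console.log(redact(message, detections));
// "My name is <PERSON>, my email is <EMAIL_ADDRESS>, and my SSN is <US_SSN>. Can you help me file taxes?"
```

Replacements are applied from the end of the string backward, so each one leaves the offsets of the remaining, earlier spans unchanged.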

Enabling PII Shield

PII Shield is enabled at the project level in the Behest dashboard. Once enabled, it automatically processes every request for that project — no code changes needed in your application:

Open your project in the Behest Dashboard
Navigate to the Security settings
Toggle PII Shield to “Enabled”
Optionally configure which PII types to detect (by default, all types are enabled)

That is it. Every request to your project endpoint now gets PII scrubbing automatically. Your application code does not change at all — the scrubbing happens transparently in the Behest pipeline before the request reaches the LLM.
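You choose entity types in the dashboard. Conceptually, that choice acts as an allowlist applied to detection results before redaction. The sketch below is hypothetical. `filterByEnabledTypes` and the detection objects are illustrative, not Behest's configuration format, and `null` models the default, where every type is enabled.

```javascript
// Hypothetical sketch: applying a per-project entity-type selection.
// enabledTypes === null models the default, where every detected type is kept for redaction.
function filterByEnabledTypes(detections, enabledTypes = null) {
  if (enabledTypes === null) return detections;
  const allowed = new Set(enabledTypes);
  return detections.filter((detection) => allowed.has(detection.entityType));
}

const detections = [
  { entityType: "PERSON", text: "John Smith" },
  { entityType: "EMAIL_ADDRESS", text: "john@example.com" },
  { entityType: "US_SSN", text: "123-45-6789" },
];

console.log(filterByEnabledTypes(detections).length); // 3
console.log(
  filterByEnabledTypes(detections, ["US_SSN", "EMAIL_ADDRESS"]).map((d) => d.entityType)
); // [ 'EMAIL_ADDRESS', 'US_SSN' ]
```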

Compliance Implications

PII Shield helps your AI application meet regulatory requirements for data protection. Here is how it applies to common compliance frameworks:

GDPR (General Data Protection Regulation)

GDPR requires that personal data be processed with appropriate safeguards. PII Shield applies the principle of data minimization: only redacted text reaches the LLM provider. When Behest is self-hosted in your cloud, the original personal data never leaves your infrastructure. This helps satisfy Article 5(1)(c) (data minimization) and Article 25 (data protection by design and by default).

HIPAA (Health Insurance Portability and Accountability Act)

For healthcare applications, PII Shield detects and redacts Protected Health Information (PHI) identifiers such as names, dates of birth, and medical record numbers, all of which are among the 18 identifiers listed in HIPAA's Safe Harbor de-identification standard. Combined with Behest's self-hosted deployment model, this helps keep PHI from being transmitted to external LLM providers and supports your HIPAA compliance.

SOC 2

SOC 2's Trust Services Criteria cover controls for data confidentiality and processing integrity. PII Shield provides an automated control that keeps detected sensitive data from being processed by third-party LLM providers, supporting your SOC 2 compliance posture.

Best Practices

Enable PII Shield early — turn it on during development, not just in production. This catches PII leakage in your test data before it becomes a production issue.
Review detected PII types — check the Behest analytics dashboard to see which PII types are being detected most frequently. This helps you understand what data your users are sending.
Combine with Sentinel — use PII Shield alongside Behest Sentinel (prompt injection defense) for comprehensive request protection.
Self-hosted deployment — for maximum data security, deploy Behest in your own cloud. PII is detected and redacted within your infrastructure — the original data never leaves your network.
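If you also keep your own record of which entity types were detected in each request, a simple tally produces the kind of frequency ranking described above. The record shape (`requestId`, `entityTypes`) and `rankEntityTypes` are hypothetical. Behest's analytics data may be structured differently.

```javascript
// Hypothetical log shape: one record per request, listing the entity types detected in it.
// Every listed entry is counted, and results are sorted by count (desc), then by name.
function rankEntityTypes(records) {
  const counts = new Map();
  for (const record of records) {
    for (const type of record.entityTypes) {
      counts.set(type, (counts.get(type) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([entityType, count]) => ({ entityType, count }));
}

const records = [
  { requestId: "r1", entityTypes: ["PERSON", "EMAIL_ADDRESS"] },
  { requestId: "r2", entityTypes: ["EMAIL_ADDRESS"] },
  { requestId: "r3", entityTypes: ["PERSON", "EMAIL_ADDRESS", "US_SSN"] },
];

console.log(rankEntityTypes(records));
// EMAIL_ADDRESS (3), then PERSON (2), then US_SSN (1)
```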
