RedactGate is a redaction firewall that sits between your staff and every cloud LLM. It reversibly tokenizes sensitive data before the call leaves your perimeter, then re-inflates the model's answer on the way back. Raw client data never crosses the wire — and the answer is just as good.
A tax associate drops a bank statement into a chat box to get a summary. A paralegal pastes a contract with names and account numbers. They have no safe alternative — so the data is gone, and your compliance officer can't prove it wasn't. RedactGate is the safe alternative: the model still does the work, but the raw record stays inside your perimeter, and every request is logged for the audit.
A reproducible recall-vs-fidelity benchmark ships in the repo. Detection on the jurisdiction golden sets, regex-only (higher with Presidio enabled):
AES-256-GCM, per-session keys, referential consistency: the same value maps to the same placeholder throughout a session, so coreference survives and the model still reasons correctly.
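To make referential consistency concrete, here is a minimal sketch, not RedactGate's actual code. It assumes a hypothetical `SessionTokenizer` class, derives the placeholder suffix from an HMAC under a per-session key, and guesses at an 8-character hex suffix for the `[[KIND_xxxx]]` format. The vault here is a plain in-memory dict to keep the example stdlib-only; the real vault would encrypt stored values (AES-256-GCM, per the description above).

```python
import hashlib
import hmac
import secrets


class SessionTokenizer:
    """Hypothetical per-session tokenizer (illustrative, not RedactGate's API)."""

    def __init__(self) -> None:
        # Fresh random key per session: placeholders cannot be correlated
        # across sessions.
        self._key = secrets.token_bytes(32)
        # placeholder -> raw value. Assumption: kept in memory, unencrypted,
        # only for the purpose of this sketch.
        self._vault: dict[str, str] = {}

    def tokenize(self, kind: str, value: str) -> str:
        # Deterministic within the session: same (kind, value) -> same placeholder.
        digest = hmac.new(
            self._key, f"{kind}:{value}".encode(), hashlib.sha256
        ).hexdigest()
        placeholder = f"[[{kind}_{digest[:8]}]]"
        self._vault[placeholder] = value
        return placeholder

    def reinflate(self, text: str) -> str:
        # Swap every known placeholder back to its raw value.
        for placeholder, value in self._vault.items():
            text = text.replace(placeholder, value)
        return text
```

Because both mentions of `jane@acme.co` in a prompt become the same placeholder, the model can still tell that they refer to one entity, and `reinflate` restores the raw value in its answer.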
One AI_PROVIDER switch fans out to Anthropic, OpenAI, Gemini, Azure, Bedrock, DigitalOcean, or local Ollama. No lock-in.
SIN, BN, GST, SSN, EIN, IBAN, VAT, EDRPOU, UCI and more — en/uk/ru/fr aware, with real validators (Luhn, IBAN mod-97).
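Validators like these cut false positives: a nine-digit string only counts as a SIN if its check digit is right. The sketch below shows the two standard algorithms named above; the function names `luhn_valid` and `iban_valid` are illustrative, and the IBAN check covers only the overall length range and mod-97, not country-specific lengths.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits in `number` (spaces and dashes ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN mod-97 check: move the first four chars to the end, map A-Z to 10-35."""
    s = "".join(iban.split()).upper()
    if not (15 <= len(s) <= 34) or not s.isascii() or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]
    # int(c, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1
```

For example, `luhn_valid("193 456 787")` accepts the test SIN used in the quickstart, while changing its last digit makes the check fail.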
Append-only, tamper-evident, zero-raw-value. The export your compliance officer hands the auditor.
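The description does not say how tamper evidence is achieved. One common technique is a hash chain, where each entry commits to the hash of the previous one. The sketch below assumes that approach; `append_entry`, `verify_chain`, and the event fields are hypothetical names. Events carry only entity kinds and counts, never raw values.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

An auditor who holds the final hash can rerun `verify_chain` on the export and detect any later modification of earlier entries.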
Re-inflation works mid-SSE-stream — a placeholder split across chunks is never emitted half-substituted.
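One way to get this behaviour is to hold back any tail of the stream that might be the start of a placeholder until it either completes or is ruled out. The sketch below assumes placeholders match `[[KIND_hex]]` and are at most 64 characters long; `reinflate_stream` and the dict-based `vault` are hypothetical, not RedactGate's implementation.

```python
import re
from typing import Iterable, Iterator

# Assumed placeholder shape: [[KIND_hexsuffix]]
PLACEHOLDER = re.compile(r"\[\[[A-Z]+_[0-9a-f]+\]\]")
MAX_PLACEHOLDER_LEN = 64  # assumed upper bound; caps how much text is held back


def _hold_from(buffer: str) -> int:
    """Index where a possibly incomplete placeholder starts, or len(buffer)."""
    start = buffer.rfind("[[")
    unclosed = start != -1 and "]]" not in buffer[start:]
    if unclosed and len(buffer) - start <= MAX_PLACEHOLDER_LEN:
        return start
    if buffer.endswith("["):
        return len(buffer) - 1  # a lone "[" may be half of an opening "[["
    return len(buffer)


def reinflate_stream(chunks: Iterable[str], vault: dict[str, str]) -> Iterator[str]:
    """Yield re-inflated text without ever emitting part of a placeholder."""

    def substitute(text: str) -> str:
        # Unknown placeholders pass through unchanged.
        return PLACEHOLDER.sub(lambda m: vault.get(m.group(0), m.group(0)), text)

    buffer = ""
    for chunk in chunks:
        buffer += chunk
        cut = _hold_from(buffer)
        ready, buffer = buffer[:cut], buffer[cut:]
        if ready:
            yield substitute(ready)
    if buffer:
        yield substitute(buffer)  # flush whatever is left at end of stream
```

With chunks `["Send it to [[EM", "AIL_1a", "2b]] today"]`, the generator emits `"Send it to "` first, emits nothing for the middle chunk, and releases the substituted text only once the closing `]]` arrives.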
Regex + Presidio + Ollama = fully offline. No cloud, no key, no data leaves the building.
Point any OpenAI client at it. The provider behind it is your config, not the caller's.
# clone, set keys, and bring up postgres + redis + api + web
git clone https://github.com/ctmakc/redactgate && cd redactgate
cp .env.example .env   # add your provider key + generate the 3 vault keys
docker compose --profile web up -d

# your existing OpenAI client just changes its base_url:
curl http://localhost:8088/v1/chat/completions \
  -H "Authorization: Bearer $RG_KEY" -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user", "content":"Summarize: SIN 193 456 787, jane@acme.co"}]}'

# upstream sees [[SIN_xxxx]] / [[EMAIL_xxxx]]; you get the real answer back.