A safety product that is itself unsafe is a credibility failure. These are launch blockers, enforced in code and as a hard CI gate.

Verdicts are data, not prose

Block decisions return structured verdict objects, never advisory text an agent can ignore. RAIL’s own Guard Benchmark documented the Guardrail Adherence Paradox — agents that refuse in text but execute the unsafe call anyway. The MCP server surfaces machine-actionable verdicts so the agent’s control flow can honor them.
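As a minimal sketch of what honoring a verdict in control flow can look like on the agent side: the `Verdict` fields, the `"allow"`/`"block"` values, and the `guarded_call` helper below are illustrative assumptions, not the actual RAIL schema. The point is that the tool call sits behind a branch on structured data rather than on generated text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    # Hypothetical verdict shape; the real field names may differ.
    action: str        # "allow" or "block"
    reason_code: str   # machine-readable reason, not advisory prose
    score: float


class BlockedByVerdict(Exception):
    """Raised instead of executing a tool call that the verdict blocks."""


def guarded_call(verdict: Verdict, tool: Callable[[], str]) -> str:
    # Fail closed: anything other than an explicit "allow" stops the call.
    if verdict.action != "allow":
        raise BlockedByVerdict(verdict.reason_code)
    return tool()
```

Because the gate raises before `tool()` is invoked, a blocked verdict cannot be "acknowledged" in text while the call still executes.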

No reflection of analyzed content

Tools return verdicts, scores, spans, and masked excerpts — never the raw analyzed text. Echoing untrusted content back into the agent’s context is a second-order prompt-injection vector. rail_detect_injection and rail_scan_tool_result strip any upstream payload previews.
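One way to enforce this is an allow-list applied to every upstream response before it reaches the agent. The sketch below assumes hypothetical field names (`verdict`, `score`, `spans`, `masked_excerpt`, `payload_preview`); it illustrates the idea and is not the server's actual implementation.

```python
# Hypothetical allow-list of response fields that are safe to return.
ALLOWED_KEYS = frozenset({"verdict", "score", "spans", "masked_excerpt"})


def strip_reflection(upstream: dict) -> dict:
    """Keep only allow-listed fields from an upstream response.

    Anything not on the list (for example a raw payload preview) is
    dropped, so new upstream fields are excluded by default.
    """
    return {key: value for key, value in upstream.items() if key in ALLOWED_KEYS}
```

An allow-list is chosen over a deny-list here so that a field added upstream later cannot leak analyzed text until someone explicitly permits it.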

No raw PII

rail_dpdp_scan and rail_scan_tool_result return PII types, character offsets, and masked values, never the detected Aadhaar, PAN, or other raw value. rail_scan_tool_result additionally returns a locally redacted copy of the text, so the agent never needs the original.
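The shape of such a result can be sketched as follows. The regular expressions are deliberately simplified (a PAN as five letters, four digits, one letter; an Aadhaar number as twelve digits in optional groups of four), and `PiiFinding`, `mask`, and `scan` are hypothetical names used only for illustration; the real detectors are likely more thorough.

```python
import re
from dataclasses import dataclass

# Simplified patterns for illustration only.
PATTERNS = {
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
}


@dataclass(frozen=True)
class PiiFinding:
    pii_type: str
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    masked: str  # same length as the match, with no raw characters


def mask(value: str) -> str:
    # Replace every letter and digit; keep separators so the length is unchanged.
    return "".join("X" if ch.isalnum() else ch for ch in value)


def scan(text: str) -> tuple[list[PiiFinding], str]:
    """Return findings plus a redacted copy of the text."""
    findings = [
        PiiFinding(pii_type, m.start(), m.end(), mask(m.group()))
        for pii_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    findings.sort(key=lambda f: f.start)
    redacted = text
    for f in findings:
        # Masks are length-preserving, so offsets stay valid after each splice.
        redacted = redacted[:f.start] + f.masked + redacted[f.end:]
    return findings, redacted
```

Note that `PiiFinding` has no field for the raw match, so a caller serializing the findings has nothing sensitive to leak.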

Tenant isolation by construction

DPDP sessions, timers, and audit events are scoped to the authenticated key. The tenant identity comes from the auth middleware, never from a tool parameter, so cross-tenant access is impossible by construction.
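A minimal sketch of this pattern, assuming a hypothetical `AuthContext` produced by the auth middleware and an in-memory `SessionStore`: the tenant is part of every storage key and is only ever read from the context, so a tool parameter has no way to name another tenant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthContext:
    # Hypothetical: populated by the auth middleware from the validated key.
    tenant_id: str


class SessionStore:
    """Sessions keyed by (tenant, session_id); no tenant-less lookup exists."""

    def __init__(self) -> None:
        self._sessions: dict[tuple[str, str], dict] = {}

    def put(self, ctx: AuthContext, session_id: str, data: dict) -> None:
        self._sessions[(ctx.tenant_id, session_id)] = data

    def get(self, ctx: AuthContext, session_id: str) -> Optional[dict]:
        # The key always includes the authenticated tenant, so a session ID
        # that belongs to another tenant simply does not resolve.
        return self._sessions.get((ctx.tenant_id, session_id))
```

With this layout, guessing another tenant's session ID yields the same result as a nonexistent session.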

Token handling

In phase 1, the bearer rail_ key is your RAIL credential, so it is forwarded upstream; this is what ties usage to your org’s credits and isolation. In phase 2 (OAuth 2.1), client tokens are validated and then dropped; downstream calls use the gateway’s own service credential and audience-bound tokens (RFC 8707).
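The difference between the two phases can be sketched as a single decision about which credential goes on the upstream request. Everything here is an assumption for illustration: the `upstream_authorization` helper, the injected `validate` callback, and a `service_token` that is presumed to have been issued for the upstream audience already.

```python
from typing import Callable


def upstream_authorization(
    phase: int,
    client_token: str,
    validate: Callable[[str], bool],
    service_token: str,
) -> str:
    """Return the Authorization header value for the upstream call."""
    if phase == 1:
        # Phase 1: the rail_ key is the caller's own RAIL credential
        # and is forwarded as-is.
        return f"Bearer {client_token}"
    # Phase 2: the client token is validated and then dropped. Only the
    # gateway's own (audience-bound) service token travels upstream.
    if not validate(client_token):
        raise PermissionError("invalid client token")
    return f"Bearer {service_token}"
```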

Operational guards

  • Input caps and timeouts — content is capped at the gateway, with bounded upstream timeouts and structured timeout errors.
  • Rate limiting — per key, identical to the REST API.
  • Audit logging — every call is logged with tenant, tool, verdict, and credits. Content bodies are never logged by default.
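Two of these guards can be sketched briefly. The cap value, the choice to reject rather than truncate oversized input, and the names `enforce_cap`, `AuditEvent`, and `audit_line` are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass

MAX_CONTENT_CHARS = 10_000  # hypothetical cap; the real limit is not stated here


class ContentTooLarge(ValueError):
    """Raised when input exceeds the gateway cap."""


def enforce_cap(content: str, limit: int = MAX_CONTENT_CHARS) -> str:
    # This sketch rejects oversized input outright instead of truncating it.
    if len(content) > limit:
        raise ContentTooLarge(f"content exceeds {limit} characters")
    return content


@dataclass(frozen=True)
class AuditEvent:
    tenant: str
    tool: str
    verdict: str
    credits: int


def audit_line(event: AuditEvent) -> str:
    # The event type has no content field, so a body cannot be logged by accident.
    return json.dumps(asdict(event), sort_keys=True)
```

Keeping the content body out of the event type, rather than filtering it at write time, makes "never logged by default" a property of the data model.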