Case study: governing a support chatbot

A SaaS team runs an AI support assistant. Most replies are good, but occasionally the model produces something dismissive or inaccurate, and those replies damage trust. This case study walks through how the team uses RAIL to catch them. All scores and responses below are actual output from POST /railscore/v1/eval against a configured application.

1. The application and its policy

The team created an application in the dashboard and gave it one rule: a reply must score at least 7.5 overall before it reaches a customer. They can read the live policy at any time with GET /config:

{
  "application": { "id": "app_1a2b3c", "environment": "production", "plan": "business" },
  "policy": {
    "enforcement": "block",
    "evalMode": "basic",
    "overallThreshold": 7.5,
    "domain": "general"
  },
  "enforcement": { "active": true, "mode": "enforce" }
}

With enforcement.active: true and policy.enforcement: "block", any reply scoring below 7.5 is rejected before it is sent.
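As a minimal sketch of reading that policy in code, the snippet below parses the config response shown above and summarizes what happens to a below-threshold reply. The helper name `describe_policy` is hypothetical, and treating `enforcement.active: false` as "policy not applied" is an assumption, not documented behavior.

```python
import json

# The GET /config response shown above, pasted verbatim.
CONFIG_JSON = """
{
  "application": { "id": "app_1a2b3c", "environment": "production", "plan": "business" },
  "policy": {
    "enforcement": "block",
    "evalMode": "basic",
    "overallThreshold": 7.5,
    "domain": "general"
  },
  "enforcement": { "active": true, "mode": "enforce" }
}
"""


def describe_policy(config: dict) -> str:
    """Summarize the effective policy (hypothetical helper, not part of RAIL)."""
    policy = config["policy"]
    threshold = policy["overallThreshold"]
    # Assumption: an inactive enforcement block means the action is not applied.
    if not config["enforcement"]["active"]:
        return f"enforcement inactive; threshold {threshold} is not applied"
    return f"replies scoring below {threshold} are handled with '{policy['enforcement']}'"


config = json.loads(CONFIG_JSON)
print(describe_policy(config))
```

A check like this can run at service start-up so the team notices if the dashboard policy has changed since the last deploy.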

2. A good reply passes

The assistant drafts a careful, accountable reply to a duplicate-charge complaint:

“I understand the duplicate charge is frustrating. I have confirmed two charges of $49 on March 3 and issued a refund for the duplicate; it should appear in 3-5 business days. I have also added a note to your account so it does not recur.”

Scoring it:

{
  "result": { "rail_score": { "score": 7.5, "summary": "RAIL Score: 7.5/10 — Good" } },
  "policy_outcome": { "enforcement": "block", "threshold": 7.5, "score": 7.5, "passed": true }
}

The score equals the threshold, and because the rule is "at least 7.5", passed is true: the reply clears the bar and is sent as-is.
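The boundary case is worth pinning down in a test. The sketch below assumes that `passed` is simply `score >= threshold`, which is consistent with both outcomes in this case study but is an inference from the examples rather than a documented rule. The function name `clears_threshold` is hypothetical.

```python
def clears_threshold(policy_outcome: dict) -> bool:
    """Recompute the pass/fail decision locally (assumed rule: score >= threshold)."""
    return policy_outcome["score"] >= policy_outcome["threshold"]


# The policy_outcome from the good reply above: the score sits exactly on the bar.
good = {"enforcement": "block", "threshold": 7.5, "score": 7.5, "passed": True}

# The recomputed decision agrees with the value the API returned.
print(clears_threshold(good) == good["passed"])
```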

3. A dismissive reply is caught

Now a reply that brushes the customer off:

“That’s not our problem, the charge is final. You probably forgot you bought it. We don’t do refunds after 24 hours, so there’s nothing I can do. Maybe read the terms next time.”

In basic mode it scores 6.9, below the 7.5 threshold:

{
  "result": { "rail_score": { "score": 6.9 } },
  "policy_outcome": { "enforcement": "block", "threshold": 7.5, "score": 6.9, "passed": false }
}

passed: false. Because the policy's enforcement is block, the request returns 422 POLICY_BLOCKED and the reply never reaches the customer.
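On the caller's side, a blocked reply has to be replaced with something. The sketch below shows one way to do that, under stated assumptions: a passing reply comes back with a 2xx status, and the caller has already extracted the error code (`POLICY_BLOCKED`) from the 422 response. Where that code sits in the response body is not shown in this case study, so it is passed in as a plain argument. `Delivery`, `choose_delivery`, and the fallback wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical canned response; the wording is an illustration only.
FALLBACK_REPLY = "Thanks for flagging this. I am handing your case to a teammate who will follow up."


@dataclass
class Delivery:
    text: str          # what the customer actually receives
    needs_human: bool  # whether to open a human handoff


def choose_delivery(status_code: int, error_code: Optional[str], draft: str) -> Delivery:
    """Decide what the customer sees after the eval call returns."""
    if status_code == 422 and error_code == "POLICY_BLOCKED":
        # The draft failed the policy: never send it, fall back instead.
        return Delivery(text=FALLBACK_REPLY, needs_human=True)
    if 200 <= status_code < 300:
        return Delivery(text=draft, needs_human=False)
    # Anything else (auth, rate limit, outage) is not a policy decision.
    raise RuntimeError(f"unexpected eval response: {status_code} {error_code}")


blocked = choose_delivery(422, "POLICY_BLOCKED", "That's not our problem, the charge is final.")
print(blocked.needs_human)
```

Raising on unexpected statuses keeps infrastructure failures from being mistaken for policy decisions.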

4. Why it failed (deep mode)

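To understand the failure, the team re-runs the same reply in deep mode, which adds a score and an explanation for each dimension. Note that the overall score in deep mode (5.5) differs from the basic-mode score (6.9):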

{
  "result": {
    "rail_score": { "score": 5.5 },
    "dimension_scores": {
      "inclusivity":    { "score": 3.0, "explanation": "The tone may alienate customers, lacking inclusivity." },
      "fairness":       { "score": 4.0, "explanation": "The response lacks empathy and dismisses the customer's concern." },
      "accountability": { "score": 4.0, "explanation": "The response avoids taking responsibility for the issue." },
      "transparency":   { "score": 5.0, "explanation": "The policy is mentioned but its rationale is unclear." }
    },
    "issues": [
      { "dimension": "fairness", "description": "Dismissive tone" },
      { "dimension": "fairness", "description": "Lack of empathy" },
      { "dimension": "transparency", "description": "Unclear communication" }
    ]
  }
}

The low scores land exactly where a human reviewer would point: tone, empathy, and ownership. The issues array is ready to surface in a review queue.
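As a sketch of how that deep-mode result could feed a review queue, the snippet below sorts the dimensions from weakest to strongest and attaches the matching entries from `issues`. The input is the response shown above; the row layout and the name `build_review_rows` are hypothetical.

```python
from collections import defaultdict

# The "result" object from the deep-mode response above.
deep_result = {
    "rail_score": {"score": 5.5},
    "dimension_scores": {
        "inclusivity": {"score": 3.0, "explanation": "The tone may alienate customers, lacking inclusivity."},
        "fairness": {"score": 4.0, "explanation": "The response lacks empathy and dismisses the customer's concern."},
        "accountability": {"score": 4.0, "explanation": "The response avoids taking responsibility for the issue."},
        "transparency": {"score": 5.0, "explanation": "The policy is mentioned but its rationale is unclear."},
    },
    "issues": [
        {"dimension": "fairness", "description": "Dismissive tone"},
        {"dimension": "fairness", "description": "Lack of empathy"},
        {"dimension": "transparency", "description": "Unclear communication"},
    ],
}


def build_review_rows(result: dict) -> list:
    """Return one row per dimension, lowest score first, with its issues attached."""
    issues_by_dimension = defaultdict(list)
    for issue in result.get("issues", []):
        issues_by_dimension[issue["dimension"]].append(issue["description"])

    # sorted() is stable, so tied dimensions keep the order the API returned.
    ranked = sorted(result["dimension_scores"].items(), key=lambda item: item[1]["score"])
    return [
        {
            "dimension": name,
            "score": detail["score"],
            "explanation": detail["explanation"],
            "issues": issues_by_dimension.get(name, []),
        }
        for name, detail in ranked
    ]


for row in build_review_rows(deep_result):
    print(row["score"], row["dimension"], row["issues"])
```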

5. How the enforcement mode changes the outcome

The same below-threshold reply produces a different result depending on the application’s enforcement setting:

| Enforcement | What happens to the dismissive reply |
| --- | --- |
| `log_only` | Sent to the customer, but the `policy_outcome` (`passed: false`) is recorded so the team can review and tune before enforcing. |
| `block` | Returned as `422 POLICY_BLOCKED`; the team’s code falls back to a safe canned response or a human handoff. |
| `regenerate` | RAIL rewrites the reply and re-scores it before responding (see the next case study). |

Teams typically start in log_only to watch real policy_outcome data, then switch to block or regenerate once they trust the threshold.
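One way to build that trust is to replay the scores recorded in log_only mode against a few candidate thresholds and see how many replies each would have stopped. The sketch below is illustrative only: the first two scores are the ones from this case study, and the remaining values are made up to give the function something to work on. `block_rate` is a hypothetical helper.

```python
def block_rate(scores: list, threshold: float) -> float:
    """Fraction of logged replies that would fail an 'at least threshold' rule."""
    if not scores:
        raise ValueError("no logged scores to analyze")
    blocked = [score for score in scores if score < threshold]
    return len(blocked) / len(scores)


# 7.5 and 6.9 come from this case study; the other values are invented sample data.
logged_scores = [7.5, 6.9, 8.2, 9.0, 7.1, 8.8, 7.8, 6.4]

for candidate in (7.0, 7.5, 8.0):
    rate = block_rate(logged_scores, candidate)
    print(f"threshold {candidate}: {rate:.1%} of replies would be held back")
```

If a candidate threshold would hold back more replies than the fallback path or human queue can absorb, that is a signal to adjust it before switching to block.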

What this gives you

A single, consistent quality bar applied to every reply, configured once on the application rather than coded into each request.
A clear, per-dimension reason whenever something is held back, not just a number.
The freedom to observe first and enforce later: the request stays the same, and only the application's enforcement setting changes.

Auto-fixing replies

The regenerate path: turn a failing reply into a passing one automatically.

Policy Engine

Enforcement modes, thresholds, and per-dimension rules.
