POST /railscore/v1/eval against a configured application.
1. The application and its policy
The team created an application in the dashboard and gave it one rule of thumb: a reply should score at least 7.5 overall before it reaches a customer. They can read the live policy at any time with GET /config.
With enforcement.active: true and enforcement: "block", any reply scoring below 7.5 is rejected before it is sent.
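The rule in this section can be restated as a small client-side check. This is only a sketch for reasoning about the policy: the Policy class and the is_rejected function are hypothetical names, the real decision is made by the service, and the actual shape of the GET /config response is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical local mirror of the application's policy settings."""
    threshold: float  # minimum overall score (7.5 in this case study)
    active: bool      # whether enforcement is switched on
    mode: str         # "log_only", "block", or "regenerate"


def is_rejected(policy: Policy, overall_score: float) -> bool:
    """Return True when a reply would be rejected before it is sent."""
    below_threshold = overall_score < policy.threshold
    return policy.active and policy.mode == "block" and below_threshold


policy = Policy(threshold=7.5, active=True, mode="block")
print(is_rejected(policy, 6.9))  # True: 6.9 is below 7.5
print(is_rejected(policy, 7.5))  # False: "at least 7.5" includes 7.5 itself
```

The comparison is strict because the policy is phrased as "at least 7.5", so a reply scoring exactly 7.5 passes.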
2. A good reply passes
The assistant drafts a careful, accountable reply to a duplicate-charge complaint:

“I understand the duplicate charge is frustrating. I have confirmed two charges of $49 on March 3 and issued a refund for the duplicate; it should appear in 3-5 business days. I have also added a note to your account so it does not recur.”

Scoring it:
passed: true — the reply clears the bar and is sent as-is.
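A scoring call like this one might be assembled as follows. This is a minimal sketch under stated assumptions: only the POST /railscore/v1/eval path comes from this page, while the host, the bearer-token header, and the content and mode body fields are placeholders chosen for illustration. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions, not documented values: the host, the bearer-token header,
# and the "content" / "mode" body fields are placeholders for illustration.
BASE_URL = "https://api.example.com"


def build_eval_request(reply_text: str, api_key: str, mode: str = "basic") -> urllib.request.Request:
    """Build (but do not send) a POST /railscore/v1/eval request."""
    payload = json.dumps({"content": reply_text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/railscore/v1/eval",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_eval_request(
    "I have issued a refund for the duplicate charge.",
    api_key="YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would send it; check the real field names and authentication scheme against the API reference first.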
3. A dismissive reply is caught
Now a reply that brushes the customer off:

“That’s not our problem, the charge is final. You probably forgot you bought it. We don’t do refunds after 24 hours, so there’s nothing I can do. Maybe read the terms next time.”

In basic mode it scores 6.9, below the 7.5 threshold:
passed: false. Because the policy is enforcing block, this reply returns 422 POLICY_BLOCKED and never reaches the customer.
4. Why it failed (deep mode)
To understand the failure, the team re-runs it in deep mode, which returns a per-dimension explanation. The resulting issues array is ready to surface in a review queue.
5. How the enforcement mode changes the outcome
The same below-threshold reply produces a different result depending on the application’s enforcement setting:
| Enforcement | What happens to the dismissive reply |
|---|---|
| log_only | Sent to the customer, but the policy_outcome (passed: false) is recorded so the team can review and tune before enforcing. |
| block | Returned as 422 POLICY_BLOCKED; the team’s code falls back to a safe canned response or a human handoff. |
| regenerate | RAIL rewrites the reply and re-scores it before responding (see the next case study). |
Teams can start in log_only to watch real policy_outcome data, then switch to block or regenerate once they trust the threshold.
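On the calling side, the log_only and block rows above could be handled with one small function. This is a sketch, not the service's client library: choose_outgoing_reply and CANNED_REPLY are hypothetical names, every 422 is assumed to mean POLICY_BLOCKED, and policy_outcome is assumed to sit at the top level of the response body. The regenerate mode is left out because its response shape is not described here.

```python
import logging

logger = logging.getLogger("reply_gate")

# Hypothetical safe fallback text; a real system might hand off to a human instead.
CANNED_REPLY = "Thanks for your patience. A member of our team will follow up shortly."


def choose_outgoing_reply(status_code: int, body: dict, draft: str) -> str:
    """Decide what to send to the customer after an eval call."""
    if status_code == 422:
        # block mode: assumed here to always mean POLICY_BLOCKED.
        return CANNED_REPLY
    if not 200 <= status_code < 300:
        raise RuntimeError(f"unexpected eval status: {status_code}")
    # log_only mode: the draft still goes out, but a failed outcome is recorded.
    # Assumption: policy_outcome sits at the top level of the response body.
    outcome = body.get("policy_outcome") or {}
    if outcome.get("passed") is False:
        logger.warning("reply sent below threshold: %s", outcome)
    return draft
```

Because the branch depends only on the status code and the recorded outcome, the same function keeps working when the application moves from log_only to block.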
What this gives you
- A single, consistent quality bar applied to every reply, configured once on the application rather than coded into each request.
- A clear, per-dimension reason whenever something is held back, not just a number.
- The freedom to observe first and enforce later, with the same API and no code changes.
Auto-fixing replies
The regenerate path: turn a failing reply into a passing one automatically.
Policy Engine
Enforcement modes, thresholds, and per-dimension rules.