Python SDK: client.eval() with policy= | Sessions: RAILSession
Evaluation vs policy
| | Evaluation | Policy Engine |
|---|---|---|
| Returns | Scores, confidence, explanations | Action: block / warn / flag / allow |
| Role | Observation | Enforcement |
| When to use | You want scores and decide what to do | You want the SDK to enforce rules automatically |
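The difference is easiest to see side by side. The sketch below is SDK-independent: EvalResult, caller_decides, and policy_decides are hypothetical names, and scores are assumed to be on a 0 to 10 scale, consistent with thresholds such as safety < 5 used on this page. Both functions read the same scores; only the second turns them into an action without any caller logic.

```python
# Hypothetical sketch: these names are illustrative, not the Python SDK's API.
# Scores are assumed to be 0-10 per dimension.
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Assumed shape of an evaluation result: observation only."""
    scores: dict[str, float]
    explanations: dict[str, str]


def caller_decides(result: EvalResult) -> str:
    """Evaluation: your code inspects the scores and picks a behavior."""
    if result.scores["safety"] < 5:
        return "show a fallback message"
    return "show the response"


def policy_decides(result: EvalResult) -> str:
    """Policy: a declared rule maps the same scores straight to an action."""
    return "block" if result.scores["safety"] < 5 else "allow"


result = EvalResult(
    scores={"safety": 3.5, "reliability": 8.0},
    explanations={"safety": "response gives unsafe instructions"},
)
print(caller_decides(result))  # show a fallback message
print(policy_decides(result))  # block
```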
How it works
Rules are evaluated in priority order. The first matching rule determines the primary action. Lower-priority rules that also match append their actions as secondary actions, so no failure is silently dropped.
Policy actions
| Action | When to use | Example |
|---|---|---|
| block | Response must not reach the user | safety < 5 on a customer-facing chatbot |
| warn | Response can proceed; caller should be notified | reliability < 6: response may contain uncertainty |
| flag | Queue for async human review without blocking | fairness < 7: flag for bias review |
| allow | Explicitly pass (default for unmatched content) | Catch-all at the end of a rule list |
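The resolution order from How it works, applied to the four actions above, can be illustrated with a small self-contained model. Rule, Decision, and resolve are hypothetical stand-ins rather than the SDK's Policy and Rule classes, and two details are assumptions: a lower priority number runs first, and a catch-all allow is not recorded as a secondary action.

```python
# Minimal sketch of first-match resolution with secondary actions.
# Rule, Decision, resolve, and the priority convention are assumptions
# for illustration, not the SDK's API.
from dataclasses import dataclass, field
from typing import Callable

Scores = dict[str, float]


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                        # assumed: lower number = higher priority
    condition: Callable[[Scores], bool]
    action: str                          # "block" | "warn" | "flag" | "allow"


@dataclass
class Decision:
    primary: str
    secondary: list[str] = field(default_factory=list)


def resolve(rules: list[Rule], scores: Scores) -> Decision:
    ordered = sorted(rules, key=lambda r: r.priority)
    matches = [r for r in ordered if r.condition(scores)]
    if not matches:
        return Decision(primary="allow")  # unmatched content passes by default
    first, *rest = matches
    # Assumption: a trailing catch-all "allow" is not kept as a secondary action.
    return Decision(
        primary=first.action,
        secondary=[r.action for r in rest if r.action != "allow"],
    )


rules = [
    Rule("unsafe", 1, lambda s: s["safety"] < 5, "block"),
    Rule("uncertain", 2, lambda s: s["reliability"] < 6, "warn"),
    Rule("bias-review", 3, lambda s: s["fairness"] < 7, "flag"),
    Rule("default", 99, lambda s: True, "allow"),
]

decision = resolve(rules, {"safety": 4.0, "reliability": 8.0, "fairness": 6.5})
print(decision.primary, decision.secondary)  # block ['flag']
```

The fairness failure is not lost even though the safety rule wins: it survives as a secondary flag that a review queue could pick up.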
Declaring a policy
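The exact declaration syntax is documented in the Python Policy Engine reference. As a rough illustration only, here is one way a policy can be declared as data: a named set of threshold rules whose actions are validated when the policy is built. ThresholdRule, Policy, and actions_for are hypothetical names, not the SDK's classes, and treating a missing dimension as "no match" is an assumption.

```python
# Hypothetical declarative form: a policy as a named, validated set of
# threshold rules. These are illustrative names, not the SDK's API.
from dataclasses import dataclass

VALID_ACTIONS = {"block", "warn", "flag", "allow"}


@dataclass(frozen=True)
class ThresholdRule:
    dimension: str   # e.g. "safety", "reliability", "fairness"
    below: float     # rule matches when the score is strictly below this value
    action: str
    priority: int = 0

    def __post_init__(self) -> None:
        # Reject typos at declaration time instead of at evaluation time.
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action {self.action!r}")

    def matches(self, scores: dict[str, float]) -> bool:
        # Assumption: a missing dimension means the rule does not match.
        return self.dimension in scores and scores[self.dimension] < self.below


@dataclass(frozen=True)
class Policy:
    name: str
    rules: tuple[ThresholdRule, ...]
    default_action: str = "allow"

    def actions_for(self, scores: dict[str, float]) -> list[str]:
        """Actions of all matching rules, highest priority (lowest number) first."""
        ordered = sorted(self.rules, key=lambda r: r.priority)
        hits = [r.action for r in ordered if r.matches(scores)]
        return hits or [self.default_action]


customer_facing = Policy(
    name="customer-facing",
    rules=(
        ThresholdRule("safety", below=5.0, action="block", priority=1),
        ThresholdRule("reliability", below=6.0, action="warn", priority=2),
    ),
)
print(customer_facing.actions_for({"safety": 4.0, "reliability": 5.0}))  # ['block', 'warn']
print(customer_facing.actions_for({"safety": 9.0, "reliability": 9.0}))  # ['allow']
```

Validating at construction means a misspelled action such as "blok" fails immediately instead of producing a rule that silently never does anything useful.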
Reusable policies
Define a policy once and attach it to the client so that it applies to every eval() call automatically.
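A reusable policy amounts to a default stored on the client. The sketch models that with a hypothetical PolicyClient whose eval() applies the attached policy unless a policy is passed for that call. The per-call override mirrors the policy= argument to client.eval(), but how the real SDK combines the two is an assumption here, and fake_scorer is a toy stand-in for an actual evaluation.

```python
# Hypothetical client that stores a default policy and applies it on every
# eval() call, with an optional per-call override. PolicyClient and its
# behavior are assumptions, not the SDK's API.
from dataclasses import dataclass
from typing import Callable, Optional

Scores = dict[str, float]
PolicyFn = Callable[[Scores], str]  # assumed: a policy maps scores to an action


@dataclass
class Result:
    scores: Scores
    action: Optional[str]  # None when no policy is active


class PolicyClient:
    def __init__(self, scorer: Callable[[str], Scores], policy: Optional[PolicyFn] = None):
        self._scorer = scorer   # stands in for the real evaluation call
        self._policy = policy   # attached once, reused for every call

    def eval(self, content: str, policy: Optional[PolicyFn] = None) -> Result:
        scores = self._scorer(content)
        active = policy if policy is not None else self._policy
        return Result(scores=scores, action=active(scores) if active else None)


def fake_scorer(text: str) -> Scores:
    # Toy scorer for the example: anything mentioning "bleach" is unsafe.
    return {"safety": 2.0 if "bleach" in text else 9.0}


def strict(scores: Scores) -> str:
    return "block" if scores["safety"] < 5 else "allow"


def lenient(scores: Scores) -> str:
    return "warn" if scores["safety"] < 5 else "allow"


client = PolicyClient(fake_scorer, policy=strict)
print(client.eval("Mix bleach with vinegar").action)                  # block
print(client.eval("Try restarting the router").action)                # allow
print(client.eval("Mix bleach with vinegar", policy=lenient).action)  # warn
```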
Session-level policies
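Conceptually, a session-level rule judges a conversation's history rather than a single response. A minimal sketch, assuming the aggregate is a rolling average over the last few turns (the SDK may aggregate differently): Session, add_turn, window, and min_average are hypothetical names, not RAILSession's API.

```python
# Hypothetical session tracker: flags a conversation when the rolling average
# of recent turn scores drops below a threshold. Not RAILSession's API.
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Session:
    min_average: float = 7.0  # assumed aggregate threshold on a 0-10 scale
    window: int = 5           # assumed number of recent turns to average
    scores: list[float] = field(default_factory=list)

    def add_turn(self, score: float) -> str:
        self.scores.append(score)
        recent = self.scores[-self.window:]
        # Only judge once a full window of turns is available.
        if len(recent) == self.window and mean(recent) < self.min_average:
            return "flag"
        return "allow"


# Every turn stays above a per-turn "block below 5" rule, yet quality drifts down.
session = Session()
actions = [session.add_turn(s) for s in [9.0, 8.0, 7.0, 6.5, 6.0, 6.0]]
print(actions)  # ['allow', 'allow', 'allow', 'allow', 'allow', 'flag']
```

A per-turn rule such as "block below 5" never fires on this conversation; only the aggregate view catches the decline.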
A session tracks quality across an entire conversation. You can set a policy that triggers on aggregate conversation quality, which is useful for detecting gradual drift across many turns.
Real-world policy examples
Healthcare chatbot
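The rule set below is an illustrative sketch, not a configuration from the SDK documentation: the thresholds are assumptions chosen to be stricter than a general chatbot, rules are plain tuples listed in priority order, and decide is a hypothetical helper that applies the first-match-plus-secondary logic from How it works.

```python
# Illustrative healthcare rule set; thresholds are assumptions.
# Each rule: (dimension, matches when score < limit, action), highest priority first.
HEALTHCARE_RULES = [
    ("safety", 7.0, "block"),      # potentially harmful medical guidance never reaches the user
    ("reliability", 7.0, "warn"),  # uncertain medical information proceeds with a notice
    ("fairness", 7.0, "flag"),     # possible bias goes to async human review
]


def decide(rules, scores):
    """Hypothetical helper: first match is primary, later matches are secondary."""
    hits = [action for dimension, limit, action in rules if scores[dimension] < limit]
    return (hits[0], hits[1:]) if hits else ("allow", [])


print(decide(HEALTHCARE_RULES, {"safety": 6.0, "reliability": 6.5, "fairness": 9.0}))
# ('block', ['warn'])
print(decide(HEALTHCARE_RULES, {"safety": 8.0, "reliability": 6.5, "fairness": 9.0}))
# ('warn', [])
```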
Hiring assistant
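For a hiring assistant, fairness is the dimension that matters most. In this hypothetical rule set, the fairness < 7 review rule comes from the actions table, while the block threshold of 4 and the reliability rule are assumptions; decide is an illustrative helper modeling first-match resolution, not an SDK function.

```python
# Illustrative hiring-assistant rule set. fairness < 7 -> flag mirrors the
# actions table; the other thresholds are assumptions.
# Each rule: (dimension, matches when score < limit, action), highest priority first.
HIRING_RULES = [
    ("fairness", 4.0, "block"),    # clearly biased output about candidates is stopped
    ("fairness", 7.0, "flag"),     # borderline output is queued for bias review
    ("reliability", 6.0, "warn"),  # uncertain claims about a candidate are surfaced
]


def decide(rules, scores):
    """Hypothetical helper: first match is primary, later matches are secondary."""
    hits = [action for dimension, limit, action in rules if scores[dimension] < limit]
    return (hits[0], hits[1:]) if hits else ("allow", [])


print(decide(HIRING_RULES, {"fairness": 3.0, "reliability": 8.0}))  # ('block', ['flag'])
print(decide(HIRING_RULES, {"fairness": 6.0, "reliability": 8.0}))  # ('flag', [])
```

Ordering the stricter fairness rule first means a badly biased response is blocked while the review flag is still recorded as a secondary action.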
Customer support bot
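A customer support bot can reuse the two thresholds given in the actions table directly. This sketch encodes them as a hypothetical priority-ordered rule list; decide is an illustrative helper, not part of the SDK.

```python
# Illustrative customer-support rule set using the thresholds from the actions table.
# Each rule: (dimension, matches when score < limit, action), highest priority first.
SUPPORT_RULES = [
    ("safety", 5.0, "block"),      # unsafe replies never reach customers
    ("reliability", 6.0, "warn"),  # possibly uncertain answers proceed with a notice
]


def decide(rules, scores):
    """Hypothetical helper: first match is primary, later matches are secondary."""
    hits = [action for dimension, limit, action in rules if scores[dimension] < limit]
    return (hits[0], hits[1:]) if hits else ("allow", [])


print(decide(SUPPORT_RULES, {"safety": 9.0, "reliability": 5.5}))  # ('warn', [])
print(decide(SUPPORT_RULES, {"safety": 9.0, "reliability": 9.0}))  # ('allow', [])
```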
What’s next
Python: Policy Engine
Full API for Policy, Rule, and policy callbacks.
Python: Sessions
RAILSession lifecycle and aggregate policies.
Concepts: Middleware
Combine policies with provider wrappers for zero-boilerplate enforcement.
Concepts: Evaluation
Understanding scores before applying policy rules.