API endpoint:
POST /railscore/v1/eval | Python: client.eval() | JavaScript: client.eval()
The 8 RAIL dimensions
| Dimension | What it measures |
|---|---|
| Fairness | Equitable treatment across demographics. No bias or stereotyping. |
| Safety | Absence of harmful, toxic, or dangerous content. |
| Reliability | Factual accuracy, internal consistency, appropriate calibration. |
| Transparency | Clear communication of limitations, reasoning, and uncertainty. |
| Privacy | Protection of personal information and data minimization. |
| Accountability | Traceable reasoning, stated assumptions, error acknowledgment. |
| Inclusivity | Inclusive language, accessibility, cultural awareness. |
| User Impact | Positive value delivered at the right detail level and tone. |
Basic vs deep mode
- Basic mode (1.0 credit)
- Deep mode (3.0 credits)
Uses a hybrid ML classifier pipeline. Fast (under 1 second), cost-effective, and suitable for real-time scoring in production.
Returns: overall score, per-dimension scores, confidence values. No explanations.
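To make the credit costs listed above concrete, here is a minimal sketch that estimates spend for a planned mix of evaluations. The helper `estimate_credits` and the mode keys `"basic"` and `"deep"` are illustrative assumptions, not part of the SDK. Only the per-call prices (1.0 and 3.0 credits) come from this page.

```python
from typing import Dict

# Per-call credit prices as stated in the mode list above.
# The keys "basic" and "deep" are assumed names for the two modes.
CREDITS_PER_CALL: Dict[str, float] = {"basic": 1.0, "deep": 3.0}


def estimate_credits(calls_by_mode: Dict[str, int]) -> float:
    """Hypothetical helper: total credits for a planned number of calls per mode."""
    total = 0.0
    for mode, calls in calls_by_mode.items():
        if mode not in CREDITS_PER_CALL:
            raise ValueError(f"unknown mode: {mode!r}")
        if calls < 0:
            raise ValueError(f"call count must be non-negative, got {calls}")
        total += CREDITS_PER_CALL[mode] * calls
    return total


if __name__ == "__main__":
    # 1,000 basic evaluations plus 50 deep evaluations.
    print(estimate_credits({"basic": 1000, "deep": 50}))
```

The estimate is an upper bound, since it ignores any requests that end up being served from the cache.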
Selective dimensions
Custom weights
Weights must sum to 100.
Score tiers
| Range | Label | Meaning |
|---|---|---|
| 9.0–10.0 | Excellent | Meets the highest responsible AI standards |
| 7.0–8.9 | Good | Responsible with minor improvements possible |
| 5.0–6.9 | Needs Improvement | Notable issues that should be addressed |
| 3.0–4.9 | Poor | Significant responsibility failures |
| 0.0–2.9 | Critical | Severe issues, should not be served |
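The weight rule and the tier table above can be checked locally before and after a request. The sketch below is a client-side illustration under several assumptions: the snake_case dimension keys, the helper names, weights being allowed to cover a subset of dimensions, the overall score being a weighted mean, and scores being rounded to one decimal before tier lookup. None of these are confirmed by this page. Only the sum-to-100 rule and the tier boundaries are taken from it.

```python
import math
from typing import Dict

# The eight RAIL dimensions from the table above.
# Assumption: snake_case keys; the API may spell these differently.
DIMENSIONS = (
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
)

# (lower bound, label) pairs from the score tier table, highest tier first.
TIERS = (
    (9.0, "Excellent"),
    (7.0, "Good"),
    (5.0, "Needs Improvement"),
    (3.0, "Poor"),
    (0.0, "Critical"),
)


def validate_weights(weights: Dict[str, float]) -> None:
    """Raise ValueError unless weights use known dimensions and sum to 100."""
    unknown = sorted(set(weights) - set(DIMENSIONS))
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    total = sum(weights.values())
    if not math.isclose(total, 100.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 100, got {total}")


def weighted_overall(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumption: the overall score is the weighted mean of dimension scores."""
    validate_weights(weights)
    return sum(scores[dim] * w for dim, w in weights.items()) / 100


def tier_label(score: float) -> str:
    """Map a 0-10 score to its tier label, rounding to one decimal first."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    rounded = round(score, 1)
    for lower_bound, label in TIERS:
        if rounded >= lower_bound:
            return label
    raise AssertionError("unreachable: the lowest tier starts at 0.0")


if __name__ == "__main__":
    weights = {"safety": 60, "reliability": 40}
    scores = {"safety": 9.0, "reliability": 7.0}
    overall = weighted_overall(scores, weights)
    print(overall, tier_label(overall))
```

Validating weights on the client avoids spending a request on input that breaks the sum-to-100 rule.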
Caching
Identical requests return cached results at zero credit cost. Basic mode: 5 min TTL. Deep mode: 3 min TTL.
API Reference: Evaluation
Full endpoint specification
Python SDK: Evaluation
Python code examples