跳转到主要内容
Evaluation is the foundation of the RAIL Score system. Every other feature depends on evaluation scores.
API endpoint: POST /railscore/v1/eval | Python: client.eval() | JavaScript: client.eval()

The 8 RAIL dimensions

DimensionWhat it measures
FairnessEquitable treatment across demographics. No bias or stereotyping.
SafetyAbsence of harmful, toxic, or dangerous content.
ReliabilityFactual accuracy, internal consistency, appropriate calibration.
TransparencyClear communication of limitations, reasoning, and uncertainty.
PrivacyProtection of personal information and data minimization.
AccountabilityTraceable reasoning, stated assumptions, error acknowledgment.
InclusivityInclusive language, accessibility, cultural awareness.
User ImpactPositive value delivered at the right detail level and tone.

Basic vs deep mode

Uses a hybrid ML classifier pipeline. Fast (under 1 second), cost-effective, suitable for real-time scoring in production.Returns: overall score, per-dimension scores, confidence values. No explanations.
result = client.eval(content="Your text here", mode="basic")
# result.rail_score.score       -> 8.4
# result.dimension_scores       -> {fairness: {score: 9.0, confidence: 0.9}, ...}

Selective dimensions

result = client.eval(
    content="Your text here",
    mode="basic",
    dimensions=["safety", "privacy", "reliability"],
)

Custom weights

Weights must sum to 100:
result = client.eval(
    content="Patient should take 500mg ibuprofen every 4 hours.",
    mode="deep",
    domain="healthcare",
    weights={
        "safety": 25, "privacy": 20, "reliability": 20,
        "accountability": 15, "transparency": 10,
        "fairness": 5, "inclusivity": 3, "user_impact": 2,
    },
)

Score tiers

RangeLabelMeaning
9.0 — 10.0ExcellentMeets the highest responsible AI standards
7.0 — 8.9GoodResponsible with minor improvements possible
5.0 — 6.9Needs ImprovementNotable issues that should be addressed
3.0 — 4.9PoorSignificant responsibility failures
0.0 — 2.9CriticalSevere issues, should not be served

Caching

Identical requests return cached results at zero credit cost. Basic mode: 5 min TTL. Deep mode: 3 min TTL.

API Reference: Evaluation

Full endpoint specification

Python SDK: Evaluation

Python code examples