Middleware is the pattern of intercepting every LLM response and attaching a RAIL score before it reaches the rest of your application. You replace your LLM client with a RAIL wrapper. The wrapper calls the LLM, evaluates the response, and returns both the content and the scores in a single object.

The problem it solves

Without middleware, adding responsible-AI checks to every LLM call means writing evaluation code in every place you call the LLM, duplicating logic, risking coverage gaps, and cluttering your application code:
# Eval code scattered everywhere
async def get_response(user_message):
    response = await openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": user_message}]
    )
    content = response.choices[0].message.content

    # Must remember to eval in every function
    score = rail_client.eval(content=content, mode="basic")
    if score.rail_score.score < 7.0:
        raise ValueError("Response below quality threshold")

    return content

How it works

When you call a method on the RAIL wrapper, three things happen transparently:
  1. Your messages are forwarded to the underlying LLM API as a normal API call.
  2. The LLM response is submitted to the RAIL evaluation endpoint in the mode you configured.
  3. A wrapped response object is returned containing the original content, RAIL score, per-dimension scores, and a threshold_met boolean, all in one return value.

Supported providers

Wrapper          Wraps                     Python   JavaScript
RAILOpenAI       OpenAI chat completions   Yes      Yes
RAILGemini       Google Gemini             Yes      Yes
RAILAnthropic    Anthropic Claude          Yes      Yes
RAILLangChain    Any LangChain LLM         Yes
Custom wrapper   Any HTTP-based LLM        Yes      Yes

Observe mode vs enforce mode

Observe mode: score every response, never block. Use this to measure quality without interrupting the response flow.
client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    # No threshold — always returns response
)

response = await client.chat(messages=[...])
print(response.content)        # The LLM's response
print(response.rail_score)     # RAIL score (always present)
print(response.threshold_met)  # None — no threshold configured
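Enforce mode, by contrast, blocks responses that score below a configured threshold instead of passing them through. The exact SDK parameter and exception names are not shown in this page, so the sketch below uses hypothetical names (`score_threshold`, `RailThresholdError`) and a stub client to illustrate the behavior offline:

```python
import asyncio

class RailThresholdError(Exception):
    """Hypothetical exception name; raised when a response is blocked."""

class StubRAILOpenAI:
    """Stands in for RAILOpenAI to illustrate enforce-mode behavior."""

    def __init__(self, score_threshold=None):
        self.score_threshold = score_threshold

    async def chat(self, messages):
        content = "stub response"  # would come from the OpenAI API
        rail_score = 6.1           # would come from the RAIL evaluation
        # Enforce mode: block instead of returning a low-scoring response.
        if self.score_threshold is not None and rail_score < self.score_threshold:
            raise RailThresholdError(
                f"Score {rail_score} below threshold {self.score_threshold}"
            )
        return content, rail_score

client = StubRAILOpenAI(score_threshold=7.0)
try:
    asyncio.run(client.chat([{"role": "user", "content": "hi"}]))
except RailThresholdError as e:
    print(e)  # the blocked response never reaches the application
```

With no `score_threshold` set, the same client behaves like observe mode and always returns the response.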

Writing custom middleware

If you use an LLM provider without a built-in wrapper, build your own middleware using the core eval() call:
from rail_score_sdk import RailScoreClient

rail = RailScoreClient(api_key="...")

async def rail_middleware(llm_call, messages, threshold=7.0):
    """Generic RAIL middleware for any async LLM call."""
    content = await llm_call(messages)

    result = rail.eval(content=content, mode="basic")

    if result.rail_score.score < threshold:
        raise ValueError(
            f"Response scored {result.rail_score.score:.1f} — below threshold {threshold}. "
            f"Failed: {[d for d, s in result.dimension_scores.items() if s.score < threshold]}"
        )

    return content, result

# Use with any LLM:
content, score = await rail_middleware(my_llm_call, messages, threshold=7.5)
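To exercise this middleware end to end without network access, both the LLM call and the RAIL client can be stubbed. The result classes below only mimic the shape the middleware reads (`rail_score.score`, `dimension_scores`); they are illustrative, not the real SDK types.

```python
import asyncio
from dataclasses import dataclass

# Stubs mimicking the SDK result shape the middleware reads.
@dataclass
class Score:
    score: float

@dataclass
class EvalResult:
    rail_score: Score
    dimension_scores: dict

class StubRail:
    def eval(self, content, mode="basic"):
        return EvalResult(Score(8.4), {"safety": Score(8.1), "fairness": Score(8.7)})

rail = StubRail()

async def rail_middleware(llm_call, messages, threshold=7.0):
    """Same middleware as above, run against the stubbed client."""
    content = await llm_call(messages)
    result = rail.eval(content=content, mode="basic")
    if result.rail_score.score < threshold:
        raise ValueError(f"Response scored {result.rail_score.score:.1f}")
    return content, result

async def my_llm_call(messages):
    return "stub completion"  # swap in your provider's client here

content, score = asyncio.run(
    rail_middleware(my_llm_call, [{"role": "user", "content": "hi"}])
)
print(score.rail_score.score)  # 8.4
```

The only contract your `llm_call` must satisfy is `async (messages) -> str`, so any HTTP-based provider can be adapted with a small wrapper function like `my_llm_call`.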

What’s next

Concepts: Policy Engine

Declarative rules to act on scores across a session.

Python: Integrations

Full provider wrapper documentation and options.

JavaScript: Providers

TypeScript wrappers for OpenAI, Gemini, Anthropic.

Python: Middleware SDK

RAILMiddleware - wrap any LLM function.