Middleware is the pattern of intercepting every LLM response and attaching a RAIL score before it reaches the rest of your application. You replace your LLM client with a RAIL wrapper. The wrapper calls the LLM, evaluates the response, and returns both the content and the scores in a single object.

The problem it solves

Without middleware, adding responsible-AI checks to every LLM call means writing evaluation code in every place you call the LLM, duplicating logic, risking coverage gaps, and cluttering your application code:
# Eval code scattered everywhere
async def get_response(user_message):
    response = await openai_client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": user_message}]
    )
    content = response.choices[0].message.content

    # Must remember to eval in every function
    score = rail_client.eval(content=content, mode="basic")
    if score.rail_score.score < 7.0:
        raise ValueError("Response below quality threshold")

    return content

How it works

When you call a method on the RAIL wrapper, three things happen transparently:
  1. Your messages are forwarded to the underlying LLM API as a normal API call.
  2. The LLM response is submitted to the RAIL evaluation endpoint in the mode you configured.
  3. A wrapped response object is returned containing the original content, RAIL score, per-dimension scores, and a threshold_met boolean, all in one return value.

Supported providers

Wrapper          Wraps                     Python   JavaScript
RAILOpenAI       OpenAI chat completions   Yes      Yes
RAILGemini       Google Gemini             Yes      Yes
RAILAnthropic    Anthropic Claude          Yes      Yes
RAILLangChain    Any LangChain LLM         Yes
Custom wrapper   Any HTTP-based LLM        Yes      Yes

Observe mode vs enforce mode

Observe mode: score every response, never block. Use this to measure quality without interrupting the response flow.
client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    # No threshold — always returns response
)

response = await client.chat(messages=[...])
print(response.content)        # The LLM's response
print(response.rail_score)     # RAIL score (always present)
print(response.threshold_met)  # None — no threshold configured
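Enforce mode, by contrast, blocks responses that score below a configured threshold instead of passing them through. The exact SDK parameter and exception names are not shown in this page, so the sketch below uses hypothetical names (`score_threshold`, `RailThresholdError`) and a stub client to illustrate the behavior offline:

```python
import asyncio

class RailThresholdError(Exception):
    """Hypothetical exception name; raised when a response is blocked."""

class StubRAILOpenAI:
    """Stands in for RAILOpenAI to illustrate enforce-mode behavior."""

    def __init__(self, score_threshold=None):
        self.score_threshold = score_threshold

    async def chat(self, messages):
        content = "stub response"  # would come from the OpenAI API
        rail_score = 6.1           # would come from the RAIL evaluation
        # Enforce mode: block instead of returning a low-scoring response.
        if self.score_threshold is not None and rail_score < self.score_threshold:
            raise RailThresholdError(
                f"Score {rail_score} below threshold {self.score_threshold}"
            )
        return content, rail_score

client = StubRAILOpenAI(score_threshold=7.0)
try:
    asyncio.run(client.chat([{"role": "user", "content": "hi"}]))
except RailThresholdError as e:
    print(e)  # the blocked response never reaches the application
```

With no `score_threshold` set, the same client behaves like observe mode and always returns the response.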

Writing custom middleware

If you use an LLM provider without a built-in wrapper, build your own middleware using the core eval() call:
from rail_score_sdk import RailScoreClient

rail = RailScoreClient(api_key="...")

async def rail_middleware(llm_call, messages, threshold=7.0):
    """Generic RAIL middleware for any async LLM call."""
    content = await llm_call(messages)

    result = rail.eval(content=content, mode="basic")

    if result.rail_score.score < threshold:
        raise ValueError(
            f"Response scored {result.rail_score.score:.1f} — below threshold {threshold}. "
            f"Failed: {[d for d, s in result.dimension_scores.items() if s.score < threshold]}"
        )

    return content, result

# Use with any LLM:
content, score = await rail_middleware(my_llm_call, messages, threshold=7.5)
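To exercise this middleware end to end without network access, both the LLM call and the RAIL client can be stubbed. The result classes below only mimic the shape the middleware reads (`rail_score.score`, `dimension_scores`); they are illustrative, not the real SDK types.

```python
import asyncio
from dataclasses import dataclass

# Stubs mimicking the SDK result shape the middleware reads.
@dataclass
class Score:
    score: float

@dataclass
class EvalResult:
    rail_score: Score
    dimension_scores: dict

class StubRail:
    def eval(self, content, mode="basic"):
        return EvalResult(Score(8.4), {"safety": Score(8.1), "fairness": Score(8.7)})

rail = StubRail()

async def rail_middleware(llm_call, messages, threshold=7.0):
    """Same middleware as above, run against the stubbed client."""
    content = await llm_call(messages)
    result = rail.eval(content=content, mode="basic")
    if result.rail_score.score < threshold:
        raise ValueError(f"Response scored {result.rail_score.score:.1f}")
    return content, result

async def my_llm_call(messages):
    return "stub completion"  # swap in your provider's client here

content, score = asyncio.run(
    rail_middleware(my_llm_call, [{"role": "user", "content": "hi"}])
)
print(score.rail_score.score)  # 8.4
```

The only contract your `llm_call` must satisfy is `async (messages) -> str`, so any HTTP-based provider can be adapted with a small wrapper function like `my_llm_call`.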

What’s next

Concepts: Policy Engine

Declarative rules to act on scores across a session.

Python: Integrations

Full provider wrapper documentation and options.

JavaScript: Providers

TypeScript wrappers for OpenAI, Gemini, Anthropic.

Python: Middleware SDK

RAILMiddleware - wrap any LLM function.