# Middleware

> Drop-in provider wrappers that intercept every LLM response and attach a RAIL score automatically.

Middleware is the pattern of intercepting every LLM response and attaching a RAIL score before it reaches the rest of your application. You replace your LLM client with a RAIL wrapper. The wrapper calls the LLM, evaluates the response, and returns both the content and the scores in a single object.

<Info>
  **Python SDK:** [Integrations reference](/integrations/overview) | **API:** [Evaluation endpoint](/api-reference/evaluation)
</Info>

## The problem it solves

Without middleware, adding responsible-AI checks to every LLM call means writing evaluation code in every place you call the LLM, duplicating logic, risking coverage gaps, and cluttering your application code:

<CodeGroup>
  ```python Without middleware theme={null}
  # Eval code scattered everywhere
  async def get_response(user_message):
      response = await openai_client.chat.completions.create(
          model="gpt-4o", messages=[{"role": "user", "content": user_message}]
      )
      content = response.choices[0].message.content

      # Must remember to eval in every function
      score = rail_client.eval(content=content, mode="basic")
      if score.rail_score.score < 7.0:
          raise ValueError("Response below quality threshold")

      return content
  ```

  ```python With middleware theme={null}
  # Scoring is automatic - set up once, forget about it
  from rail_score_sdk.integrations import RAILOpenAI

  client = RAILOpenAI(
      openai_api_key="...",
      rail_api_key="YOUR_RAIL_API_KEY",
      eval_mode="basic",
      threshold=7.0,
  )

  async def get_response(user_message):
      response = await client.chat(messages=[{"role": "user", "content": user_message}])
      return response.content  # .rail_score and .threshold_met are also available
  ```
</CodeGroup>

## How it works

```mermaid theme={null}
flowchart LR
    App["Your Application"] --> Wrapper["RAIL Wrapper"]
    Wrapper -->|"1. Forward request"| LLM["LLM API\n(OpenAI / Gemini / Anthropic)"]
    LLM -->|"2. Response content"| Wrapper
    Wrapper -->|"3. Evaluate"| RAIL["RAIL Score API"]
    RAIL -->|"Score + dimensions"| Wrapper
    Wrapper -->|"content + rail_score\n+ threshold_met"| App
```

When you call a method on the RAIL wrapper, three things happen transparently:

1. Your messages are forwarded to the underlying LLM API as a normal API call.
2. The LLM response is submitted to the RAIL evaluation endpoint in the mode you configured.
3. A wrapped response object is returned containing the original content, RAIL score, per-dimension scores, and a `threshold_met` boolean, all in one return value.
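The three steps above can be sketched in plain Python. This is a minimal, provider-agnostic illustration, not the SDK's implementation. `WrappedResponse`, `wrapped_chat`, `llm_call`, and `evaluate` are hypothetical names. The scorer is a stub standing in for a request to the RAIL evaluation endpoint, and the dimension names and values are made up for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class WrappedResponse:
    """Hypothetical stand-in for the object a RAIL wrapper returns."""
    content: str
    rail_score: float
    dimension_scores: dict[str, float] = field(default_factory=dict)
    threshold_met: Optional[bool] = None  # None when no threshold is configured


async def wrapped_chat(
    messages: list[dict],
    llm_call: Callable[[list[dict]], Awaitable[str]],
    evaluate: Callable[[str], tuple[float, dict[str, float]]],
    threshold: Optional[float] = None,
) -> WrappedResponse:
    # 1. Forward the messages to the underlying LLM unchanged.
    content = await llm_call(messages)
    # 2. Submit the LLM output for evaluation (a stub in this sketch).
    score, dimensions = evaluate(content)
    # 3. Return content, overall score, per-dimension scores, and threshold_met together.
    met = None if threshold is None else score >= threshold
    return WrappedResponse(content, score, dimensions, met)


# Usage with hypothetical stubs in place of a real LLM and the RAIL API:
async def fake_llm(messages: list[dict]) -> str:
    return "Echo: " + messages[-1]["content"]


def fake_evaluate(content: str) -> tuple[float, dict[str, float]]:
    return 8.2, {"safety": 9.0, "fairness": 7.4}


result = asyncio.run(
    wrapped_chat([{"role": "user", "content": "Hi"}], fake_llm, fake_evaluate, threshold=7.0)
)
print(result.content, result.rail_score, result.threshold_met)  # Echo: Hi 8.2 True
```

Injecting `llm_call` and `evaluate` as parameters keeps the wrapper independent of any one provider, which is the same property that lets one wrapper pattern cover OpenAI, Gemini, and Anthropic.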

## Supported providers

| Wrapper         | Wraps                   | Python | JavaScript |
| --------------- | ----------------------- | ------ | ---------- |
| `RAILOpenAI`    | OpenAI chat completions | Yes    | Yes        |
| `RAILGemini`    | Google Gemini           | Yes    | Yes        |
| `RAILAnthropic` | Anthropic Claude        | Yes    | Yes        |
| `RAILLangChain` | Any LangChain LLM       | Yes    | No         |
| Custom wrapper  | Any HTTP-based LLM      | Yes    | Yes        |

## Observe mode vs enforce mode

<Tabs>
  <Tab title="Observe only">
    Score every response, never block. Use this to measure quality without interrupting the response flow.

    ```python theme={null}
    client = RAILOpenAI(
        openai_api_key="...",
        rail_api_key="...",
        eval_mode="basic",
        # No threshold — always returns response
    )

    response = await client.chat(messages=[...])
    print(response.content)        # The LLM's response
    print(response.rail_score)     # RAIL score (always present)
    print(response.threshold_met)  # None — no threshold configured
    ```
  </Tab>

  <Tab title="Enforce threshold">
    Raise `ThresholdError` when a response doesn't meet the bar.

    ```python theme={null}
    client = RAILOpenAI(
        openai_api_key="...",
        rail_api_key="...",
        eval_mode="basic",
        threshold=7.0,
    )

    async def get_response(messages):
        try:
            response = await client.chat(messages=messages)
            return response.content
        except ThresholdError as e:
            # e.rail_score and e.failed_dimensions are available
            return fallback_response()  # your own fallback logic
    ```
  </Tab>

  <Tab title="Auto-regenerate">
    Automatically trigger Safe Regeneration when a response falls below threshold.

    ```python theme={null}
    client = RAILOpenAI(
        openai_api_key="...",
        rail_api_key="...",
        eval_mode="basic",
        threshold=7.0,
        on_fail="regenerate",
        max_iterations=3,
    )

    # Returns the best content — original or regenerated
    response = await client.chat(messages=[...])
    print(response.content)
    print(response.iterations_taken)  # 1 if original passed
    ```
  </Tab>
</Tabs>
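The decision logic behind the three modes can be sketched as a single loop. This is an assumption-laden illustration, not the SDK's code: `chat_with_policy`, `BelowThresholdError`, `generate`, and `score` are hypothetical names, and the sketch does not model how Safe Regeneration builds its retry prompts. Here `generate(attempt)` simply produces a new draft. It also assumes "best content" means the first passing draft, or the highest-scoring draft if none pass.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class BelowThresholdError(Exception):
    """Hypothetical stand-in for the SDK's ThresholdError."""

    def __init__(self, score: float):
        super().__init__(f"score {score:.1f} below threshold")
        self.score = score


async def chat_with_policy(
    generate: Callable[[int], Awaitable[str]],
    score: Callable[[str], float],
    threshold: Optional[float] = None,
    on_fail: str = "raise",
    max_iterations: int = 3,
) -> tuple[str, float, int]:
    """Return (content, score, iterations_taken) under an observe/enforce/regenerate policy."""
    best_content, best_score = "", float("-inf")
    for attempt in range(1, max_iterations + 1):
        content = await generate(attempt)
        s = score(content)
        if s > best_score:
            best_content, best_score = content, s
        # Observe mode (no threshold) or a passing draft: return immediately.
        if threshold is None or s >= threshold:
            return content, s, attempt
        # Enforce mode: fail on the first response below threshold.
        if on_fail != "regenerate":
            raise BelowThresholdError(s)
    # Regenerate mode and nothing passed: return the best draft seen.
    return best_content, best_score, max_iterations


# Usage with hypothetical drafts and scores:
drafts = {1: "draft one", 2: "draft two", 3: "draft three"}
scores = {"draft one": 5.0, "draft two": 7.5, "draft three": 9.0}


async def fake_generate(attempt: int) -> str:
    return drafts[attempt]


content, final_score, taken = asyncio.run(
    chat_with_policy(fake_generate, lambda c: scores[c], threshold=7.0, on_fail="regenerate")
)
print(content, final_score, taken)  # draft two 7.5 2
```

Capping the loop at `max_iterations` bounds the extra LLM and evaluation calls a single request can trigger, which is the main cost of the regenerate mode.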

## Writing custom middleware

If you use an LLM provider without a built-in wrapper, build your own middleware using the core `eval()` call:

```python theme={null}
import asyncio

from rail_score_sdk import RailScoreClient

rail = RailScoreClient(api_key="...")

async def rail_middleware(llm_call, messages, threshold=7.0):
    """Generic RAIL middleware for any async LLM call."""
    content = await llm_call(messages)

    # eval() is synchronous; run it in a worker thread so it doesn't block the event loop
    result = await asyncio.to_thread(rail.eval, content=content, mode="basic")

    if result.rail_score.score < threshold:
        failed = [d for d, s in result.dimension_scores.items() if s.score < threshold]
        raise ValueError(
            f"Response scored {result.rail_score.score:.1f}, below threshold {threshold}. "
            f"Failed dimensions: {failed}"
        )

    return content, result

# Use with any async LLM call (from inside an async function):
content, result = await rail_middleware(my_llm_call, messages, threshold=7.5)
```
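Because the middleware above takes the LLM call as an argument, the same idea can be packaged as a decorator so each LLM function is wrapped once, where it is defined. In this sketch `rail_scored` and `score_fn` are hypothetical names: `score_fn` stands in for a call such as `rail.eval(...)` and returns a plain float so the example runs without the SDK. The scoring lambda and the decorated `my_llm_call` are illustrative stubs.

```python
import asyncio
import functools
from typing import Awaitable, Callable


def rail_scored(score_fn: Callable[[str], float], threshold: float = 7.0):
    """Hypothetical decorator: score every result of an async LLM function."""

    def decorator(llm_call: Callable[..., Awaitable[str]]):
        @functools.wraps(llm_call)
        async def wrapper(*args, **kwargs) -> tuple[str, float]:
            content = await llm_call(*args, **kwargs)
            # Run the synchronous scorer in a worker thread so the event loop stays free.
            score = await asyncio.to_thread(score_fn, content)
            if score < threshold:
                raise ValueError(f"Response scored {score:.1f}, below threshold {threshold}")
            return content, score

        return wrapper

    return decorator


@rail_scored(score_fn=lambda text: 8.0 if "thanks" in text else 4.0, threshold=7.0)
async def my_llm_call(messages: list[dict]) -> str:
    # Stand-in for a real provider call.
    return "thanks for asking: " + messages[-1]["content"]


content, score = asyncio.run(my_llm_call([{"role": "user", "content": "hello"}]))
print(content, score)  # thanks for asking: hello 8.0
```

Callers keep invoking `my_llm_call` as before; the scoring and threshold check happen inside the decorator, which is the "set up once" property the built-in wrappers provide.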

## What's next

<CardGroup cols={2}>
  <Card title="Concepts: Policy Engine" icon="gavel" href="/concepts/policy-engine">
    Declarative rules to act on scores across a session.
  </Card>

  <Card title="Python: Integrations" icon="python" href="/integrations/overview">
    Full provider wrapper documentation and options.
  </Card>

  <Card title="JavaScript: Providers" icon="js" href="/sdk/javascript/overview">
    TypeScript wrappers for OpenAI, Gemini, and Anthropic.
  </Card>

  <Card title="Python: Middleware SDK" icon="layer-group" href="/sdk/python/middleware">
    `RAILMiddleware`: wrap any LLM function.
  </Card>
</CardGroup>
