Middleware is the pattern of intercepting every LLM response and attaching a RAIL score before it reaches the rest of your application. You replace your LLM client with a RAIL wrapper. The wrapper calls the LLM, evaluates the response, and returns both the content and the scores in a single object.
Without middleware, adding responsible-AI checks to every LLM call means writing evaluation code in every place you call the LLM, duplicating logic, risking coverage gaps, and cluttering your application code:
```python
# Eval code scattered everywhere
async def get_response(user_message):
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}],
    )
    content = response.choices[0].message.content
    # Must remember to eval in every function
    score = rail_client.eval(content=content, mode="basic")
    if score.rail_score.score < 7.0:
        raise ValueError("Response below quality threshold")
    return content
```
When you call a method on the RAIL wrapper, three things happen transparently:
1. Your messages are forwarded to the underlying LLM API as a normal API call.
2. The LLM response is submitted to the RAIL evaluation endpoint in the mode you configured.
3. A wrapped response object is returned containing the original content, the RAIL score, per-dimension scores, and a `threshold_met` boolean, all in one return value.
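The three steps above can be sketched as a minimal wrapper. This is an illustrative sketch only: `RAILWrapper`, `WrappedResponse`, and the `llm_call`/`rail_eval` callables are stand-ins for the flow, not the library's actual classes or API.

```python
from dataclasses import dataclass

@dataclass
class WrappedResponse:
    # Mirrors the fields described above: content plus scores in one object
    content: str
    rail_score: float
    dimension_scores: dict
    threshold_met: bool

class RAILWrapper:
    """Hypothetical sketch; a real RAIL wrapper binds to a specific LLM SDK."""

    def __init__(self, llm_call, rail_eval, threshold=7.0):
        self.llm_call = llm_call    # callable that hits the LLM API
        self.rail_eval = rail_eval  # callable returning (score, per-dimension dict)
        self.threshold = threshold

    def chat(self, messages):
        content = self.llm_call(messages)      # 1. forward to the underlying LLM
        score, dims = self.rail_eval(content)  # 2. submit response for evaluation
        return WrappedResponse(                # 3. one object with content and scores
            content=content,
            rail_score=score,
            dimension_scores=dims,
            threshold_met=score >= self.threshold,
        )

# Stub LLM and evaluator to show the flow end to end
wrapper = RAILWrapper(
    llm_call=lambda messages: "Hello!",
    rail_eval=lambda content: (8.2, {"safety": 9.0, "fairness": 7.5}),
)
resp = wrapper.chat([{"role": "user", "content": "hi"}])
```

Because the scores travel with the content in one object, callers never have to remember a separate evaluation step.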
Raise ThresholdError when a response doesn’t meet the bar.
```python
client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    threshold=7.0,
)

async def get_response(messages):
    try:
        response = await client.chat(messages=messages)
        return response.content
    except ThresholdError as e:
        # e.rail_score and e.failed_dimensions are available
        return fallback_response()
```
Automatically trigger Safe Regeneration when a response falls below threshold.
```python
client = RAILOpenAI(
    openai_api_key="...",
    rail_api_key="...",
    eval_mode="basic",
    threshold=7.0,
    on_fail="regenerate",
    max_iterations=3,
)

# Returns the best content, original or regenerated
response = await client.chat(messages=[...])
print(response.content)
print(response.iterations_taken)  # 1 if original passed
```
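The regenerate-on-fail behavior can be approximated with a loop like the following sketch. The function and parameter names here are assumptions for illustration, not the library's internals: it keeps the best-scoring attempt and stops early once one meets the threshold.

```python
def chat_with_regeneration(generate, evaluate, threshold=7.0, max_iterations=3):
    """Hypothetical sketch: return the best-scoring response, stopping early
    once one passes the threshold."""
    best_content, best_score = None, float("-inf")
    iterations_taken = 0
    for _ in range(max_iterations):
        iterations_taken += 1
        content = generate()       # call the underlying LLM (again)
        score = evaluate(content)  # re-score with the configured eval mode
        if score > best_score:
            best_content, best_score = content, score
        if score >= threshold:     # passed; no further regeneration needed
            break
    return best_content, best_score, iterations_taken

# Stubbed run: first attempt scores 5.0, second passes at 8.0
attempts = [("draft", 5.0), ("final", 8.0)]
scores = dict(attempts)
gen = iter(content for content, _ in attempts)
content, score, iterations = chat_with_regeneration(
    generate=lambda: next(gen),
    evaluate=lambda c: scores[c],
)
```

Even if no attempt passes within `max_iterations`, the loop still returns the highest-scoring content seen, which matches the "best content" behavior described above.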