
# Evaluation

> Score AI-generated content across 8 dimensions of responsible AI.

Evaluation is the foundation of the RAIL Score system. Every other feature depends on evaluation scores.

<Info>
  **API endpoint:** `POST /railscore/v1/eval` | **Python:** `client.eval()` | **JavaScript:** `client.eval()`
</Info>

## The 8 RAIL dimensions

| Dimension          | What it measures                                                  |
| ------------------ | ----------------------------------------------------------------- |
| **Fairness**       | Equitable treatment across demographics. No bias or stereotyping. |
| **Safety**         | Absence of harmful, toxic, or dangerous content.                  |
| **Reliability**    | Factual accuracy, internal consistency, appropriate calibration.  |
| **Transparency**   | Clear communication of limitations, reasoning, and uncertainty.   |
| **Privacy**        | Protection of personal information and data minimization.         |
| **Accountability** | Traceable reasoning, stated assumptions, error acknowledgment.    |
| **Inclusivity**    | Inclusive language, accessibility, cultural awareness.            |
| **User Impact**    | Positive value delivered at the right detail level and tone.      |

For the full definition of each dimension, its score anchors, and worked examples, see [The RAIL Framework](/concepts/rail-framework).

## Basic, deep, and auto modes

```mermaid theme={null}
flowchart TD
    Input["Content + Parameters"] --> Router{"Mode?"}
    Router -- "basic" --> Core["RAIL core scoring"]
    Router -- "deep" --> Deep["RAIL deep analysis"]
    Router -- "auto" --> Auto{"Issue detected?"}
    Auto -- "no" --> Core
    Auto -- "yes" --> Deep
    Core --> Out1["Score + Confidence"]
    Deep --> Out2["Score + Confidence + Explanations + Issues"]
```

All modes score the same 8 dimensions and return the same overall RAIL score. They differ in depth and what detail comes back.

<Tabs>
  <Tab title="Basic mode">
    Uses RAIL's core scoring models. It is fast (typically under a second) and built for real-time scoring in production.

    Returns: overall score, per-dimension scores, and confidence values.

    ```python theme={null}
    result = client.eval(content="Your text here", mode="basic")
    # result.rail_score.score       -> 7.6
    # result.dimension_scores       -> {fairness: {score: 7.7, confidence: 0.84}, ...}
    ```
  </Tab>

  <Tab title="Deep mode">
    A deeper, more detailed analysis of the content. Takes a few seconds and, on top of the scores, can return a per-dimension explanation, issue tags, and improvement suggestions.

    ```python theme={null}
    result = client.eval(
        content="Your text here",
        mode="deep",
        include_explanations=True,
        include_issues=True,
        include_suggestions=True,
    )
    # result.dimension_scores["transparency"].explanation -> "The process is mostly clear, but..."
    # result.dimension_scores["safety"].issues            -> ["Potential phishing risks"]
    ```
  </Tab>

  <Tab title="Auto mode">
    Runs **basic** on every request and escalates to **deep** only when an issue is detected: a low-scoring or low-confidence dimension, or a flagged signal. Clean content stays fast and cheap; content that needs scrutiny automatically gets the deeper analysis.

    ```python theme={null}
    result = client.eval(content="Your text here", mode="auto")
    # result.resolved_mode -> "basic"  (clean content, stayed fast)
    #                      -> "deep"   (issue detected, escalated)
    # result.escalated      -> False / True
    ```

    The `resolved_mode` and `escalated` fields on `result` tell you which tier ran. You're billed for the tier that actually ran.
  </Tab>
</Tabs>
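To make the auto-mode routing concrete, here is a minimal sketch of the escalation decision as described above: escalate when any dimension scores low, has low confidence, or a signal is flagged. The thresholds (`LOW_SCORE`, `LOW_CONFIDENCE`) and the helper names are hypothetical; the service's actual escalation rules are not documented on this page.

```python
from dataclasses import dataclass


@dataclass
class DimensionResult:
    score: float       # 0-10, as in basic-mode output
    confidence: float  # 0-1


# Hypothetical thresholds for illustration only; not the service's real values.
LOW_SCORE = 5.0
LOW_CONFIDENCE = 0.6


def should_escalate(dimensions: dict[str, DimensionResult], flagged: bool = False) -> bool:
    """Return True if a basic-mode result looks like it needs deep analysis."""
    if flagged:
        return True
    return any(
        d.score < LOW_SCORE or d.confidence < LOW_CONFIDENCE
        for d in dimensions.values()
    )


def resolve_mode(dimensions: dict[str, DimensionResult], flagged: bool = False) -> tuple[str, bool]:
    """Return a (resolved_mode, escalated) pair like the one auto mode reports."""
    escalated = should_escalate(dimensions, flagged)
    return ("deep" if escalated else "basic"), escalated


clean = {"fairness": DimensionResult(7.7, 0.84)}
risky = {"safety": DimensionResult(3.2, 0.88)}
print(resolve_mode(clean))  # ('basic', False)
print(resolve_mode(risky))  # ('deep', True)
```

Checking every dimension with `any` mirrors the "any low-scoring or low-confidence dimension" wording; a real policy might weigh dimensions differently.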

**Which to use:**

* **Basic** when you score on the hot path of a production request and want a fast verdict.
* **Deep** when you need to show a reviewer *why* something scored low, or when you are debugging and tuning a policy, because it returns explanations and issue tags.
* **Auto** when you want basic's speed on most traffic but automatic deep analysis on the content that needs it, without deciding upfront.

### The response

Every evaluation returns:

* `rail_score` — the overall score (0–10), its `confidence`, and a one-line `summary`.
* `dimension_scores` — a `score` and `confidence` for each of the 8 dimensions. In deep mode each also carries an `explanation` and `issues` (and `suggestions` when requested).
* `policy_outcome` — how your [application's policy](/concepts/policy-engine) judged the result.
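As an illustration of consuming these fields, the sketch below treats the response as a plain JSON-style dict nested the way the list above describes (`rail_score` and `dimension_scores`, with optional `issues` in deep mode). The exact nesting, the helper names, and the sample values are assumptions for illustration; `policy_outcome` is omitted because its shape isn't described here.

```python
from typing import Any


def weakest_dimensions(response: dict[str, Any], n: int = 3) -> list[tuple[str, float]]:
    """Return the n lowest-scoring dimensions as (name, score) pairs."""
    dims = response["dimension_scores"]
    ranked = sorted(dims.items(), key=lambda item: item[1]["score"])
    return [(name, d["score"]) for name, d in ranked[:n]]


def collect_issues(response: dict[str, Any]) -> dict[str, list[str]]:
    """Gather issue tags per dimension; basic-mode results carry none."""
    return {
        name: d["issues"]
        for name, d in response["dimension_scores"].items()
        if d.get("issues")
    }


# Hypothetical deep-mode response, trimmed to three dimensions; values are made up.
response = {
    "rail_score": {"score": 7.6, "confidence": 0.82, "summary": "Mostly sound."},
    "dimension_scores": {
        "fairness": {"score": 7.7, "confidence": 0.84},
        "safety": {"score": 5.1, "confidence": 0.78, "issues": ["Potential phishing risks"]},
        "transparency": {"score": 6.4, "confidence": 0.80},
    },
}

print(weakest_dimensions(response, n=2))  # [('safety', 5.1), ('transparency', 6.4)]
print(collect_issues(response))           # {'safety': ['Potential phishing risks']}
```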

## Selective dimensions

Pass `dimensions` to score only a subset of the 8 dimensions:

```python theme={null}
result = client.eval(
    content="Your text here",
    mode="basic",
    dimensions=["safety", "privacy", "reliability"],
)
```
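Because dimension names are plain strings, a typo such as `"user impact"` for `user_impact` is easy to make. A minimal client-side check, assuming the eight keys are spelled as in the custom-weights example on this page, could look like this; `check_dimensions` and `RAIL_DIMENSIONS` are hypothetical helpers, not part of the SDK.

```python
# The eight dimension keys, spelled as in this page's custom-weights example.
RAIL_DIMENSIONS = frozenset({
    "fairness", "safety", "reliability", "transparency",
    "privacy", "accountability", "inclusivity", "user_impact",
})


def check_dimensions(dimensions: list[str]) -> list[str]:
    """Raise ValueError if any name is not a RAIL dimension; otherwise return the list."""
    unknown = sorted(set(dimensions) - RAIL_DIMENSIONS)
    if unknown:
        raise ValueError(f"Unknown RAIL dimensions: {unknown}")
    return dimensions


check_dimensions(["safety", "privacy", "reliability"])  # passes
try:
    check_dimensions(["safety", "user impact"])  # space instead of underscore
except ValueError as err:
    print(err)  # Unknown RAIL dimensions: ['user impact']
```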

## Custom weights

Pass `weights` to control how much each dimension contributes to the overall score. Weights must sum to 100:

```python theme={null}
result = client.eval(
    content="Patient should take 500mg ibuprofen every 4 hours.",
    mode="deep",
    domain="healthcare",
    weights={
        "safety": 25, "privacy": 20, "reliability": 20,
        "accountability": 15, "transparency": 10,
        "fairness": 5, "inclusivity": 3, "user_impact": 2,
    },
)
```
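This page doesn't spell out how weights combine into the overall score. One natural reading, used here purely as an assumption, is a weighted average: multiply each dimension's 0–10 score by its weight and divide by 100. The sketch validates the sum-to-100 rule and shows how the healthcare weights above pull the result toward safety; `weighted_score` and `validate_weights` are hypothetical helpers.

```python
import math

# The healthcare weights from the example above (they sum to 100).
HEALTHCARE_WEIGHTS = {
    "safety": 25, "privacy": 20, "reliability": 20,
    "accountability": 15, "transparency": 10,
    "fairness": 5, "inclusivity": 3, "user_impact": 2,
}


def validate_weights(weights: dict[str, float]) -> None:
    """Enforce the documented rule that weights sum to 100."""
    total = sum(weights.values())
    if not math.isclose(total, 100):
        raise ValueError(f"Weights must sum to 100, got {total}")


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed combination rule: weighted average of per-dimension scores (0-10)."""
    validate_weights(weights)
    return sum(scores[d] * w for d, w in weights.items()) / 100


scores = {d: 9.0 for d in HEALTHCARE_WEIGHTS}
scores["safety"] = 4.0  # one weak, heavily weighted dimension
print(weighted_score(scores, HEALTHCARE_WEIGHTS))  # 7.75
```

Under equal weights (12.5 each) the same scores would average 8.375, so the healthcare weighting penalizes the weak safety score more heavily.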

## Score tiers

A score maps to one of five bands, from **Excellent** (9.0–10.0) down to **Critical** (0.0–2.9). See [The RAIL Framework](/concepts/rail-framework#score-tiers) for the full table and what each band means.

## Caching

Identical requests return cached results, so repeated scoring of the same content is fast and not re-charged. Basic mode caches for 5 minutes, deep mode for 3 minutes.
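If you also want to avoid network round trips for repeated content, you might mirror the same idea on the client. The sketch below is a hypothetical client-side cache, not the service's implementation: it keys on a hash of the canonicalized request parameters and expires entries after the TTLs stated above. How auto-mode results are cached isn't documented here, so the sketch assumes the TTL of the tier that actually ran.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional

# TTLs from this page; auto mode is assumed to use the tier that actually ran.
TTL_SECONDS = {"basic": 5 * 60, "deep": 3 * 60}


def request_key(params: dict[str, Any]) -> str:
    """Hash canonical JSON so identical requests share a key regardless of key order."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """Minimal in-memory cache whose entries expire after a per-mode TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, params: dict[str, Any]) -> Optional[Any]:
        key = request_key(params)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, params: dict[str, Any], value: Any, resolved_mode: str) -> None:
        ttl = TTL_SECONDS[resolved_mode]
        self._entries[request_key(params)] = (self._clock() + ttl, value)


# Usage with a fake clock instead of real waiting.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
params = {"content": "Your text here", "mode": "deep"}
cache.put(params, {"rail_score": {"score": 7.6}}, resolved_mode="deep")
print(cache.get({"mode": "deep", "content": "Your text here"}))  # {'rail_score': {'score': 7.6}}
now[0] = 181.0  # just past the 3-minute deep-mode TTL
print(cache.get(params))  # None
```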

<CardGroup cols={2}>
  <Card title="API Reference: Evaluation" icon="code" href="/api-reference/evaluation">
    Full endpoint specification
  </Card>

  <Card title="Python SDK: Evaluation" icon="python" href="/sdk/python/evaluation">
    Python code examples
  </Card>
</CardGroup>
