Everything you need to know about RawEval.
From chatting with AI models and flagging failures, to expert evaluation and data delivery — here's how the platform works and how to get started.
How RawEval works — in three steps
Chat with AI models
Open the RawEval Chat and start a conversation with any supported model — GPT-4o, Claude, Gemini, and more. Chat naturally, test edge cases, or explore topics in your domain.
Open Chat →
Flag what goes wrong
When the AI gives an incorrect, incomplete, or harmful response, click the flag button. Add context about what went wrong — factual error, bias, hallucination, or format issue. Your flag enters the evaluation queue.
Experts verify & correct
Domain-verified experts review flagged interactions using structured rubrics. They score the response, write a corrected version, and tag the failure type. All with full provenance metadata.
Learn about experts →
Connect with every part of RawEval
Chat Platform
Multi-model AI chat with built-in failure flagging. Talk to all top frontier models from one interface.
Expert Workbench
Where verified domain experts evaluate flagged AI failures. Structured rubrics, correction workflows, and quality scoring.
Data Delivery
Audit-ready evaluation data delivered to AI labs and enterprises. Every data point carries full provenance — who evaluated it, their credentials, behavioral verification, and quality scores.
Programmatic access to evaluation data
For teams that want to integrate evaluation data directly into their ML pipeline. All endpoints require a Bearer token — API keys are scoped per organization.
Request
curl -X GET https://api.raweval.com/v1/delivery/batch_42a \
  -H "Authorization: Bearer reval_sk_live_..."
Response
{
"prompt_id": "fail_0x7f3ac9",
"model": "gpt-4o",
"domain": "geography",
"original_response": "Sydney is the capital...",
"corrected_response": "Canberra is the capital...",
"rubric_scores": {
"accuracy": 9.8,
"completeness": 9.1,
"format": 9.5
},
"annotator_tier": "T2",
"improvement_delta": 8.3,
"provenance": {
"expert_verified": true,
"behavioral_check": "passed",
"eu_ai_act_artifact": "audit_847.json"
}
}
Frequently asked questions
Do I need an API key to use RawEval?
No. You can use the Chat and Expert Workbench without an API key. API access is available on Pro and Enterprise plans for programmatic data delivery.
What AI models are supported?
We support all top frontier AI models and add new ones regularly. You can switch between models within the same conversation to compare responses.
How are experts verified?
Experts go through credential verification, a domain skills assessment with identity verification, and ongoing quality monitoring to ensure every evaluation meets our standards.
What data formats do you deliver?
RLHF preference pairs, SFT-ready corrected completions, rubric-scored evaluations, and raw JSONL with full provenance metadata. Custom formats available on Enterprise plans.
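As a rough sketch of how a delivered record maps onto the RLHF preference-pair format: the expert correction becomes the "chosen" completion and the flagged model output the "rejected" one. The field names below follow the sample response shown earlier; the `to_preference_pair` helper itself is hypothetical, not part of the RawEval API.

```python
import json

def to_preference_pair(record: dict) -> dict:
    """Map a rubric-scored evaluation record onto an RLHF preference
    pair: the expert correction is 'chosen', the flagged model output
    is 'rejected'. (Hypothetical helper; field names follow the
    sample delivery payload.)"""
    return {
        "prompt_id": record["prompt_id"],
        "chosen": record["corrected_response"],
        "rejected": record["original_response"],
        "meta": {
            "model": record["model"],
            "domain": record["domain"],
            "improvement_delta": record["improvement_delta"],
        },
    }

# One JSONL line as it might appear in a delivered batch
line = ('{"prompt_id": "fail_0x7f3ac9", "model": "gpt-4o", '
        '"domain": "geography", '
        '"original_response": "Sydney is the capital...", '
        '"corrected_response": "Canberra is the capital...", '
        '"improvement_delta": 8.3}')
pair = to_preference_pair(json.loads(line))
print(pair["chosen"])  # Canberra is the capital...
```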
How does the Expert Workbench work?
Experts receive task assignments matched to their domain. They evaluate flagged AI responses using structured rubrics, write corrections, and score responses across multiple dimensions. Everything is tracked with full provenance.
Can I integrate RawEval into my existing pipeline?
Yes. Use our REST API to submit prompts, retrieve evaluations, and download completed batches. Webhooks notify you when batches complete. Enterprise plans include custom integration support.
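The batch-retrieval step can be sketched in Python with only the standard library. The endpoint path and Bearer scheme are taken from the curl example earlier on this page; webhook handling and error handling are omitted, so treat this as a minimal starting point rather than a full client.

```python
import json
import urllib.request

API_BASE = "https://api.raweval.com/v1"

def fetch_batch(batch_id: str, api_key: str) -> dict:
    """Download a completed evaluation batch as parsed JSON.
    Mirrors the curl example: GET /v1/delivery/{batch_id} with a
    Bearer token. Minimal sketch; no retries or error handling."""
    req = urllib.request.Request(
        f"{API_BASE}/delivery/{batch_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Build the request without sending it (a real key is required
# to actually fetch a batch):
req = urllib.request.Request(
    f"{API_BASE}/delivery/batch_42a",
    headers={"Authorization": "Bearer reval_sk_live_..."},
)
print(req.full_url)  # https://api.raweval.com/v1/delivery/batch_42a
```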
Ready to try it?
Chat with AI models, flag what goes wrong, and help build better AI — or bring evaluation data into your ML pipeline.