Documentation

Everything you need to know about RawEval.

From chatting with AI models and flagging failures, to expert evaluation and data delivery — here's how the platform works and how to get started.

Try the Chat →
Expert Workbench →
Getting started

How RawEval works — in three steps

01

Chat with AI models

Open the RawEval Chat and start a conversation with any supported model — GPT-4o, Claude, Gemini, and more. Chat naturally, test edge cases, or explore topics in your domain.

Open Chat →
02

Flag what goes wrong

When the AI gives an incorrect, incomplete, or harmful response, click the flag button. Add context about what went wrong — factual error, bias, hallucination, or format issue. Your flag enters the evaluation queue.

03

Experts verify & correct

Domain-verified experts review flagged interactions using structured rubrics. They score the response, write a corrected version, and tag the failure type. All with full provenance metadata.

Learn about experts →
Platform guides

Connect with every part of RawEval

Chat Platform

Multi-model AI chat with built-in failure flagging. Talk to all top frontier models from one interface.

Switch models mid-conversation
One-click failure flagging
Earn payouts for valid flags
See expert corrections in real time
Open Chat

Expert Workbench

Where verified domain experts evaluate flagged AI failures. Structured rubrics, correction workflows, and quality scoring.

Domain-matched task assignments
Rubric-based evaluation scoring
Correction with full provenance
Tiered earning ($18–$120/task)
Expert Portal

Data Delivery

Audit-ready evaluation data delivered to AI labs and enterprises. Every data point carries full provenance — who evaluated it, their credentials, behavioral verification, and quality scores.

RLHF-ready preference pairs
SFT-ready corrected completions
Rubric scores & improvement deltas
EU AI Act compliance artifacts
Enterprise plans →
API Reference

Programmatic access to evaluation data

For teams that want to integrate evaluation data directly into their ML pipeline. All endpoints require a Bearer token — API keys are scoped per organization.

Authorization: Bearer reval_sk_live_...
GET /v1/batches - List all annotation batches for your organization
GET /v1/batches/:id - Get batch details with all annotations and metadata
POST /v1/prompts - Submit prompts for expert evaluation
GET /v1/prompts/:id - Check evaluation status and retrieve results
GET /v1/delivery/:batch_id - Download completed batch in JSONL format
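As a sketch, these endpoints can also be called from Python using only the standard library. The base URL and key prefix mirror the examples on this page; the helper below is illustrative, not an official client.

```python
import json
import urllib.request

API_BASE = "https://api.raweval.com/v1"

def build_request(path, api_key, method="GET", body=None):
    """Build an authenticated request for the RawEval API.

    Every endpoint expects a Bearer token in the Authorization header;
    API keys are scoped per organization.
    """
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        API_BASE + path,
        data=data,
        method=method,
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
    )

# Sending the request requires a real key:
# with urllib.request.urlopen(build_request("/batches", key)) as resp:
#     batches = json.load(resp)
```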

Request

curl -X GET https://api.raweval.com/v1/delivery/batch_42a \
  -H "Authorization: Bearer reval_sk_live_..."

Response

{
  "prompt_id": "fail_0x7f3ac9",
  "model": "gpt-4o",
  "domain": "geography",
  "original_response": "Sydney is the capital...",
  "corrected_response": "Canberra is the capital...",
  "rubric_scores": {
    "accuracy": 9.8,
    "completeness": 9.1,
    "format": 9.5
  },
  "annotator_tier": "T2",
  "improvement_delta": 8.3,
  "provenance": {
    "expert_verified": true,
    "behavioral_check": "passed",
    "eu_ai_act_artifact": "audit_847.json"
  }
}
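Each JSONL line in a delivered batch maps cleanly onto training-ready structures. A minimal sketch using the sample record above; the preference-pair shape (chosen/rejected) is a common RLHF convention, not a RawEval-defined schema:

```python
import json

# One line from a delivered JSONL batch (the sample record from this page)
line = json.dumps({
    "prompt_id": "fail_0x7f3ac9",
    "model": "gpt-4o",
    "domain": "geography",
    "original_response": "Sydney is the capital...",
    "corrected_response": "Canberra is the capital...",
    "rubric_scores": {"accuracy": 9.8, "completeness": 9.1, "format": 9.5},
    "annotator_tier": "T2",
    "improvement_delta": 8.3,
})

def to_preference_pair(record):
    """Map a delivery record to an RLHF-style preference pair:
    the expert correction is 'chosen', the flagged output 'rejected'."""
    return {
        "prompt_id": record["prompt_id"],
        "chosen": record["corrected_response"],
        "rejected": record["original_response"],
        "domain": record["domain"],
    }

pair = to_preference_pair(json.loads(line))
# pair["chosen"] -> "Canberra is the capital..."
```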

Frequently asked questions

Do I need an API key to use RawEval?

No. You can use the Chat and Expert Workbench without an API key. API access is available on Pro and Enterprise plans for programmatic data delivery.

What AI models are supported?

We support all top frontier AI models and add new ones regularly. You can switch between models within the same conversation to compare responses.

How are experts verified?

Experts go through credential verification, a domain skills assessment with identity verification, and ongoing quality monitoring to ensure every evaluation meets our standards.

What data formats do you deliver?

RLHF preference pairs, SFT-ready corrected completions, rubric-scored evaluations, and raw JSONL with full provenance metadata. Custom formats available on Enterprise plans.

How does the Expert Workbench work?

Experts receive task assignments matched to their domain. They evaluate flagged AI responses using structured rubrics, write corrections, and score responses across multiple dimensions. Everything is tracked with full provenance.

Can I integrate RawEval into my existing pipeline?

Yes. Use our REST API to submit prompts, retrieve evaluations, and download completed batches. Webhooks notify you when batches complete. Enterprise plans include custom integration support.

Ready to try it?

Chat with AI models, flag what goes wrong, and help build better AI — or bring evaluation data into your ML pipeline.

Try the Chat →
Enterprise plans
Contact us