Step 1 of 4

Data Capture

Identifying where AI models fail in real-world usage

Real-time capture

Every failed response is a training opportunity

Traditional AI training relies on static datasets. RawEval captures live user interactions where models actually fail, creating a continuous feedback loop that identifies exactly where AI needs improvement.

• Zero-latency capture (no slowdown for users)
• Automatic PII removal before storage
• Failure detection via user behavior signals
User Query

"Explain quantum tunneling in simple terms"

Model Response

Generic answer; the user spends 15 seconds editing it...

Captured for Evaluation

Queued for expert correction

How capture works

Dual-path execution

When a user submits a query, we create two parallel paths:

Path A: Fast response to the user (standard flow)
Path B: Silent copy to the evaluation buffer
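The dual-path flow can be sketched as follows. This is a minimal illustration, not RawEval's actual implementation; `generate_response` and `evaluation_buffer` are hypothetical names.

```python
import asyncio

# Path B target: a silent in-process buffer (illustrative stand-in for
# whatever queue or log the evaluation pipeline actually reads from).
evaluation_buffer: asyncio.Queue = asyncio.Queue()

async def generate_response(query: str) -> str:
    # Stand-in for the real model call.
    return f"answer to: {query}"

async def handle_query(query: str) -> str:
    # Path A: produce the user-facing response on the standard flow.
    response = await generate_response(query)
    # Path B: copy the interaction into the evaluation buffer.
    # put_nowait never blocks, so this adds no user-visible latency.
    evaluation_buffer.put_nowait({"query": query, "response": response})
    return response
```

Because Path B only enqueues a copy, the user's response is never delayed waiting on the evaluation pipeline.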

Failure detection

We detect model failures through user behavior signals:

• User spends >10s editing response
• Clicks "Wrong" or "Try again"
• Abandons without accepting answer
• Submits follow-up clarification
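The signals above can be combined into a simple predicate. The thresholds and field names below are assumptions for illustration, not RawEval's production values.

```python
# Returns True when any behavioral failure signal fires for an
# interaction event. Field names and the 10-second edit threshold
# are illustrative assumptions.
def looks_like_failure(event: dict) -> bool:
    return (
        event.get("edit_seconds", 0) > 10              # long manual edit
        or event.get("clicked_wrong", False)           # "Wrong" / "Try again"
        or event.get("abandoned", False)               # left without accepting
        or event.get("followup_clarification", False)  # re-asked for clarity
    )
```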

Priority queueing

Failed prompts are automatically prioritized based on:

• Domain complexity
• Confidence score of failure
• Enterprise client demand
• Model improvement potential
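One way to combine these factors is a weighted score over a max-priority heap. The weights and signal names here are hypothetical; the real prioritization model is internal to RawEval.

```python
import heapq

# Hypothetical weights over the four factors listed above,
# each signal assumed normalized to [0, 1].
WEIGHTS = {
    "domain_complexity": 0.3,
    "failure_confidence": 0.3,
    "client_demand": 0.2,
    "improvement_potential": 0.2,
}

def priority_score(signals: dict) -> float:
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

queue: list = []

def enqueue(prompt_id: str, signals: dict) -> None:
    # heapq is a min-heap, so push the negated score to pop
    # the highest-priority prompt first.
    heapq.heappush(queue, (-priority_score(signals), prompt_id))
```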

Technical implementation

What we capture

Original Prompt: Full user query with context
Model Response: Complete AI-generated answer
Web Context: RAG sources and citations
User Edits: Changes made by the user
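The four fields above map naturally onto a record type. The concrete schema is an assumption mirroring the list, not RawEval's actual data model.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class CaptureRecord:
    original_prompt: str                 # full user query with context
    model_response: str                  # complete AI-generated answer
    web_context: list[str] = field(default_factory=list)  # RAG sources, citations
    user_edits: str | None = None        # changes made by the user, if any
```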

Privacy & security

Automatic PII removal
Email addresses, phone numbers, and personal identifiers stripped before storage
End-to-end encryption
All captured data encrypted at rest and in transit
Zero user impact
Capture happens asynchronously with <5ms overhead
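The PII-stripping step can be illustrated with a small regex scrubber. This is a minimal sketch covering only emails and phone numbers; a real redaction pipeline handles far more patterns (names, addresses, account IDs, etc.).

```python
import re

# Illustrative pre-storage redaction patterns; not exhaustive.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]

def scrub(text: str) -> str:
    # Replace each matched identifier with a placeholder before storage.
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```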
• 47K+ queries captured daily
• 4.3% average failure rate
• <5ms capture overhead
• 100% privacy compliant

Why real-world capture matters

Traditional approach

  • ✗ Training on static benchmark datasets
  • ✗ No visibility into real user pain points
  • ✗ Weeks/months between failure and fix
  • ✗ Biased toward academic test cases

RawEval approach

  • ✓ Live capture of actual model failures
  • ✓ Identify problems as they happen
  • ✓ Hours from failure to training data
  • ✓ Reflects real-world use cases