Real-time capture
Every failed response is a training opportunity
Traditional AI training relies on static datasets. RawEval captures live user interactions where models actually fail, creating a continuous feedback loop that identifies exactly where AI needs improvement.
Low-overhead capture (under 5 ms, no noticeable slowdown for users)
Automatic PII removal before storage
Failure detection via user behavior signals
User Query
"Explain quantum tunneling in simple terms"
Model Response
Generic answer, user edits for 15 seconds...
Captured for Evaluation
Queued for expert correction
How capture works
Dual-path execution
When a user submits a query, we create two parallel paths:
Path A: Fast response to user (standard flow)
Path B: Silent copy to evaluation buffer
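The two paths can be sketched as follows. This is a minimal illustration and not RawEval's actual implementation: `call_model`, `copy_to_buffer`, `handle_query`, and the in-memory `asyncio.Queue` buffer are all hypothetical names chosen for the example. The point it shows is that the copy to the evaluation buffer is non-blocking, so a full buffer drops the record instead of delaying the user.

```python
import asyncio


async def call_model(query: str) -> str:
    """Stand-in for the real model call (an assumption for illustration)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"answer to: {query}"


def copy_to_buffer(buffer: asyncio.Queue, query: str, response: str) -> bool:
    """Path B: non-blocking copy; returns False if the record was dropped."""
    try:
        buffer.put_nowait({"query": query, "response": response})
        return True
    except asyncio.QueueFull:
        # Never make the user wait for capture: drop the record instead.
        return False


async def handle_query(query: str, buffer: asyncio.Queue) -> str:
    response = await call_model(query)       # Path A: standard flow
    copy_to_buffer(buffer, query, response)  # Path B: silent copy, no await
    return response
```

In this sketch a separate background consumer (not shown) would drain the buffer and hand records to the evaluation pipeline.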
Failure detection
We detect model failures through user behavior signals. A response is flagged when the user:
• Spends more than 10 seconds editing the response
• Clicks "Wrong" or "Try again"
• Abandons the session without accepting the answer
• Submits a follow-up clarification
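As a rough sketch, the signals above could be combined into a simple rule-based check. The `InteractionSignals` fields and the reason labels are hypothetical; only the 10-second editing threshold comes from the list above, and a production detector may weigh signals differently.

```python
from dataclasses import dataclass


@dataclass
class InteractionSignals:
    """Hypothetical per-response behavior signals; field names are illustrative."""
    edit_seconds: float = 0.0
    clicked_wrong_or_retry: bool = False
    abandoned: bool = False
    sent_follow_up: bool = False


EDIT_THRESHOLD_SECONDS = 10.0  # from the "more than 10 seconds editing" signal


def failure_reasons(signals: InteractionSignals) -> list[str]:
    """Return the list of signals that fired for one interaction."""
    reasons = []
    if signals.edit_seconds > EDIT_THRESHOLD_SECONDS:
        reasons.append("long_edit")
    if signals.clicked_wrong_or_retry:
        reasons.append("explicit_rejection")
    if signals.abandoned:
        reasons.append("abandoned")
    if signals.sent_follow_up:
        reasons.append("follow_up_clarification")
    return reasons


def is_failure(signals: InteractionSignals) -> bool:
    """Treat any single fired signal as a failure (an assumed policy)."""
    return bool(failure_reasons(signals))
```

Returning the reasons rather than a bare boolean keeps the evidence available for later prioritization.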
Priority queueing
Failed prompts are automatically prioritized based on:
• Domain complexity
• Confidence that the response actually failed
• Enterprise client demand
• Model improvement potential
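One way to picture this prioritization is a weighted score feeding a priority queue. The sketch below is an assumption, not the production scoring: the weights, the factor names, and the `FailureQueue` class are invented for illustration, and each factor is assumed to be normalized to the range 0 to 1.

```python
import heapq
import itertools

# Illustrative weights (assumptions, not RawEval's real values).
WEIGHTS = {
    "domain_complexity": 0.3,
    "failure_confidence": 0.3,
    "client_demand": 0.2,
    "improvement_potential": 0.2,
}


def priority_score(factors: dict[str, float]) -> float:
    """Weighted sum of the four factors; missing factors count as 0."""
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())


class FailureQueue:
    """Pops the highest-scoring failed prompt first."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, prompt_id: str, factors: dict[str, float]) -> None:
        score = priority_score(factors)
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(self._heap, (-score, next(self._counter), prompt_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The counter in each heap entry means prompts with equal scores come out in the order they were queued.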
Technical implementation
What we capture
Original Prompt: Full user query with context
Model Response: Complete AI-generated answer
Web Context: RAG sources and citations
User Edits: Changes made by the user
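The four captured fields map naturally onto a small record type. The following is a hypothetical shape for such a record; the class name, field names, and JSON serialization are assumptions for illustration and do not describe the actual storage schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CapturedInteraction:
    """Illustrative record holding the four captured fields."""
    original_prompt: str                                  # full user query with context
    model_response: str                                   # complete AI-generated answer
    web_context: list[str] = field(default_factory=list)  # RAG sources and citations
    user_edits: list[str] = field(default_factory=list)   # changes made by the user

    def to_json(self) -> str:
        """Serialize the record for the evaluation buffer."""
        return json.dumps(asdict(self))


record = CapturedInteraction(
    original_prompt="Explain quantum tunneling in simple terms",
    model_response="Generic answer",
)
```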
Privacy & security
Automatic PII removal
Email addresses, phone numbers, and personal identifiers stripped before storage
Encryption at rest and in transit
All captured data is encrypted both in storage and in transit
Minimal user impact
Capture happens asynchronously with <5 ms overhead
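To make the PII removal step concrete, here is a minimal regex-based sketch covering email addresses and phone numbers. The patterns and the `redact_pii` function are assumptions for illustration; simple regexes like these miss many formats, and other personal identifiers such as names or account numbers would need additional handling.

```python
import re

# Simplified patterns (assumptions); real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A run of at least nine digits/separators that starts and ends with a digit,
# optionally preceded by "+".
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text
```

Running redaction before a record is written to the buffer keeps raw identifiers out of storage entirely, which matches the "before storage" ordering described above.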
47K+
Queries captured daily
4.3%
Average failure rate
<5ms
Capture overhead
100%
Privacy compliant
Why real-world capture matters
Traditional approach
- ✗ Training on static benchmark datasets
- ✗ No visibility into real user pain points
- ✗ Weeks/months between failure and fix
- ✗ Biased toward academic test cases
RawEval approach
- ✓ Live capture of actual model failures
- ✓ Identify problems as they happen
- ✓ Hours from failure to training data
- ✓ Reflects real-world use cases