
EVALUATION RESULTS

DeepEval Test Suite

67 tests covering 3 LLM skills, plus a comparison of naive vs. HyQ RAG retrieval. Claude (via TritonAI) is available as the judge LLM.

67 tests passing
3 skills evaluated
8/8 HyQ retrievals correct
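This page does not show how the suite scores a retrieval as "correct". The sketch below is only a toy illustration of what a naive vs. HyQ retrieval count could measure. It assumes that HyQ means indexing LLM-generated hypothetical questions for each chunk and matching the user query against those questions instead of the raw chunk text. The corpus, questions, eval cases and every function name (`naive_retrieve`, `hyq_retrieve`, `retrievals_correct`) are hypothetical. A bag-of-words cosine similarity stands in for a real embedding model. None of this is the suite's actual code.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: chunk id -> chunk text.
CHUNKS = {
    "refunds": "Purchases can be returned within 30 days for a full refund "
               "to the original payment method.",
    "shipping": "Orders ship within 2 business days. Delivery takes 5 to 7 days.",
}

# Hypothetical HyQ index: chunk id -> questions an LLM might generate for that chunk.
HYQ_INDEX = {
    "refunds": ["How do I get my money back?", "What is the return policy?"],
    "shipping": ["How long until my order arrives?", "When will my package be delivered?"],
}

# Hypothetical eval cases: (user query, id of the chunk that should be retrieved).
CASES = [
    ("How can I get my money back?", "refunds"),
    ("How long until my package arrives?", "shipping"),
]


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def naive_retrieve(query: str) -> str:
    """Naive RAG: score the query directly against each chunk's text."""
    q = bag_of_words(query)
    # On a tie, max() returns the first chunk id in dict order.
    return max(CHUNKS, key=lambda cid: cosine(q, bag_of_words(CHUNKS[cid])))


def hyq_retrieve(query: str) -> str:
    """HyQ: score the query against each chunk's best-matching hypothetical question."""
    q = bag_of_words(query)
    return max(
        HYQ_INDEX,
        key=lambda cid: max(cosine(q, bag_of_words(h)) for h in HYQ_INDEX[cid]),
    )


def retrievals_correct(retrieve) -> tuple[int, int]:
    """Return (correct, total): how many cases retrieve the expected chunk."""
    correct = sum(retrieve(query) == expected for query, expected in CASES)
    return correct, len(CASES)


if __name__ == "__main__":
    for name, fn in [("naive", naive_retrieve), ("HyQ", hyq_retrieve)]:
        correct, total = retrievals_correct(fn)
        print(f"{name}: {correct}/{total} retrievals correct")
```

Run as a script, this prints `naive: 1/2` and `HyQ: 2/2`. The shipping query shares no words with either chunk, so naive retrieval falls back to the first chunk. That query is close to one of the shipping chunk's hypothetical questions, so HyQ retrieval finds the right chunk. A real setup would swap in an embedding model and a vector store, but the correct/total count would work the same way.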
Run locally: uv sync --group dev && uv run pytest evals/ -v