EVALUATION RESULTS
DeepEval Test Suite
67 tests covering three LLM skills, plus a comparison of naive vs. HyQ RAG retrieval. Claude, accessed via TritonAI, is available as the judge LLM.
Tests passing: 67
Skills evaluated: 3
HyQ retrievals correct: 8/8
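The "HyQ retrievals correct" figure counts queries for which HyQ retrieval returned the expected chunk. The sketch below shows one way such a naive vs. HyQ comparison could be scored. It assumes that HyQ means hypothetical-question indexing, where each chunk is indexed by questions it answers and the user query is matched against those questions rather than against the chunk text. It also assumes that "correct" means the expected chunk is ranked first. A bag-of-words cosine similarity stands in for a real embedding model. The chunks, questions, and queries are hypothetical toy data, not the suite's test cases, and every function name is illustrative.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def naive_retrieve(query: str, chunks: dict[str, str]) -> str:
    # Naive RAG: match the query directly against each chunk's own text.
    q = embed(query)
    return max(chunks, key=lambda cid: cosine(q, embed(chunks[cid])))


def hyq_retrieve(query: str, hyq_index: dict[str, list[str]]) -> str:
    # HyQ (assumed): match the query against the hypothetical questions stored
    # for each chunk, and return the chunk whose best question scores highest.
    q = embed(query)
    return max(hyq_index, key=lambda cid: max(cosine(q, embed(h)) for h in hyq_index[cid]))


def score(retrieve, index, cases) -> tuple[int, int]:
    # Count the queries whose top-ranked chunk is the expected one.
    correct = sum(retrieve(query, index) == expected for query, expected in cases)
    return correct, len(cases)


# Hypothetical toy corpus; in a real pipeline an LLM would generate the questions.
chunks = {
    "shipping": "Orders ship within 2 business days and arrive in 5 to 7 days via standard delivery.",
    "refunds": "Purchases can be returned within 30 days for a full refund to the original payment method.",
}
hyq = {
    "shipping": ["How long does delivery take?", "When will my order arrive?"],
    "refunds": ["How do I get my money back?", "Can I return an item I bought?"],
}
cases = [
    ("How do I get my money back?", "refunds"),
    ("When will my order arrive?", "shipping"),
]

print("naive:", score(naive_retrieve, chunks, cases))  # naive: (1, 2)
print("hyq:  ", score(hyq_retrieve, hyq, cases))       # hyq:   (2, 2)
```

In this toy run, naive retrieval misses the refund question because the query shares no words with the refund chunk, which is the vocabulary mismatch that question-based indexing is meant to reduce. With a real embedding model and the suite's own data, the numbers would of course differ.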