Purpose
Benchmarks provide deterministic, single-turn testing for your chatbot, complementing the multi-turn testing offered by Blast’s simulation engine. While simulations explore conversational flows and uncover previously unknown issues, benchmarks test specific, predetermined inputs to catch regressions.
Single-Turn Testing Concept
Benchmarks test individual question-answer pairs in isolation. Each test case consists of:
- Input: A specific user query or message
- Evaluation: Automated scoring against defined criteria
This approach allows you to validate specific functionality, edge cases, and compliance requirements with consistent, repeatable results.
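To make the input-plus-evaluation structure concrete, here is a minimal sketch in Python. It does not use Blast's API: `BenchmarkCase`, `run_benchmark`, and `fake_chatbot` are hypothetical names invented for illustration, and the evaluation is a simple string check standing in for whatever scoring criteria you define.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BenchmarkCase:
    """Hypothetical single-turn test case: one input, one automated check."""
    name: str
    user_input: str                   # the specific user query or message
    evaluate: Callable[[str], bool]   # automated scoring of the reply


def run_benchmark(chatbot: Callable[[str], str],
                  cases: List[BenchmarkCase]) -> Dict[str, bool]:
    """Send each input once, in isolation, and score the reply."""
    return {case.name: case.evaluate(chatbot(case.user_input)) for case in cases}


def fake_chatbot(message: str) -> str:
    """Stand-in for a real chatbot so the sketch runs on its own."""
    if "refund" in message.lower():
        return "Refunds are available within 30 days of purchase."
    return "Sorry, I can't help with that."


cases = [
    BenchmarkCase("refund_policy",
                  "What is your refund policy?",
                  lambda reply: "30 days" in reply),
    BenchmarkCase("off_topic_refusal",
                  "Write me a poem about cats.",
                  lambda reply: reply.startswith("Sorry")),
]

results = run_benchmark(fake_chatbot, cases)
print(results)
```

Because each case carries no conversation history, running the same suite against an unchanged chatbot gives the same results, which is what makes a later failure a useful regression signal.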
Integration Strategy
The most effective testing approach combines both methods, supported by manual checks:
- Simulations for discovery and comprehensive testing
- Benchmarks for validation and regression testing
- Playground for manual verification and test case creation
Use simulation results to identify specific scenarios worth adding to your benchmark suite, creating a comprehensive testing strategy that covers both exploration and validation.
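The hand-off from simulations to benchmarks might look like the following sketch. The transcript format is an assumption made for illustration: `Turn`, its `flagged` field, and `promote_flagged_turns` are hypothetical and do not reflect how Blast stores or exports simulation results.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Turn:
    """Hypothetical record of one exchange from a simulated conversation."""
    user: str
    bot: str
    flagged: bool  # assumed: True if a reviewer or scorer marked the reply as a problem


def promote_flagged_turns(transcript: List[Turn], suite: List[str]) -> List[str]:
    """Add each flagged user message to the benchmark suite, skipping duplicates."""
    for turn in transcript:
        if turn.flagged and turn.user not in suite:
            suite.append(turn.user)
    return suite


transcript = [
    Turn("Hi", "Hello! How can I help?", False),
    Turn("Can I get a refund after 45 days?", "Yes, anytime.", True),
    Turn("Thanks", "You're welcome!", False),
]

suite = ["What is your refund policy?"]
promote_flagged_turns(transcript, suite)
print(suite)
```

A flagged message that only makes sense with earlier conversation context may need rewording before it works as a standalone single-turn input.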
Next Steps