Creating and Managing Benchmarks

Benchmarks in Blast are collections of single-turn test cases that validate your chatbot’s performance on specific prompts. Each benchmark serves as a focused test suite that you can run repeatedly to track improvements and catch regressions.

Creating New Benchmarks

Benchmark Setup

Create a new benchmark from the benchmarks dashboard:
  1. New Benchmark: Click the “New Benchmark” button
  2. Basic Information:
    • Name: Descriptive benchmark name (e.g., “Customer Service Basics”, “Product Search Tests”)
    • Description: Purpose and scope of the benchmark
    • Tags: Optional tags for organization and filtering
  3. Add Test Cases: Add initial test cases using either:
    • Bulk Add Rows: Type multiple test prompts directly
    • CSV Upload: Import test cases from a CSV file
  4. Create: Save the benchmark with your test cases

Benchmark Organization

Use clear naming and tagging strategies to keep benchmarks organized:
  • Functional Names: “Payment Processing”, “Return Policy”, “Product Recommendations”
  • Priority Tags: “critical”, “regression”, “new-features”
  • Domain Tags: “customer-service”, “e-commerce”, “technical-support”

Adding Test Cases

CSV File Upload

Upload multiple test cases at once using a CSV file:

CSV Format Requirements

Your CSV file must contain a single column:
  • prompt: The user input or query to test

Example CSV Format

prompt
"What are your store hours?"
"How do I return a defective item?"
"Do you ship internationally?"
"What payment methods do you accept?"
"I can't find my order confirmation"
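If you assemble prompts programmatically, Python's built-in csv module takes care of quoting and escaping for you. A minimal sketch of producing the single-column file above; the prompt list and filename are just examples:

```python
import csv

# Hypothetical example prompts; replace with your own test inputs.
prompts = [
    "What are your store hours?",
    "How do I return a defective item?",
    "Do you ship internationally?",
]

# Write the single-column format Blast expects:
# a "prompt" header followed by one prompt per row.
with open("benchmark_prompts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerow(["prompt"])
    for p in prompts:
        writer.writerow([p])
```

Letting the csv module do the writing avoids hand-rolled quoting bugs when prompts contain commas or quotation marks.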

Upload Process

  1. File Upload Tab: Select the “File Upload” option when creating or editing a benchmark
  2. Choose File: Select your prepared CSV file
  3. Validation: Blast automatically validates the format
  4. Preview: Review the parsed test cases
  5. Import: Confirm to add all test cases to your benchmark
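Blast validates the file during upload, but a quick local pre-flight check can catch format problems before you get there. A sketch, assuming the single-column format described above; validate_prompt_csv is a hypothetical helper, not part of Blast:

```python
import csv

def validate_prompt_csv(path):
    """Return a list of problems found in a prompt CSV (empty list = OK).

    Checks for the single 'prompt' header, one column per row,
    and no blank prompts.
    """
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ["prompt"]:
            problems.append(f"expected a single 'prompt' header, got {header}")
        for lineno, row in enumerate(reader, start=2):
            if len(row) != 1:
                problems.append(f"line {lineno}: expected 1 column, got {len(row)}")
            elif not row[0].strip():
                problems.append(f"line {lineno}: empty prompt")
    return problems
```

Run it on the file you are about to upload; an empty result means the file matches the expected shape.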

Manual Test Addition (Bulk Add Rows)

Add multiple test cases manually using the bulk interface:
  1. Bulk Add Rows: Select this option when creating or editing a benchmark
  2. Add Prompts: Type each test prompt in the provided fields
  3. Add More Rows: Click “Add Row” to include additional test cases
  4. Save: Add all prompts to your benchmark

Example Test Cases

Create prompts that represent real user interactions:
prompt
"What's your return policy for electronics?"
"Can I change my shipping address after ordering?"
"Do you price match competitor offers?"
"How long does standard shipping take?"
"What if I receive a damaged product?"
"Can I cancel my order?"
"Do you have a loyalty program?"
"What are your customer service hours?"

Adding Tests from Playground

Database Icon Workflow

Convert playground testing sessions into benchmark test cases:
  1. Playground Testing: Test scenarios manually in the playground
  2. Database Icon: Click the database icon next to any query you want to save
  3. Add to Benchmark Flow:
    • Edit Prompt: Modify the query text if needed
    • Select Benchmark: Choose an existing benchmark or create a new one
    • Add Test: Save the prompt as a new test case

Workflow Example

Playground Session → Identify Valuable Tests → Click Database Icon → 
Select/Create Benchmark → Save as Test Case → Run Benchmark

Adding Tests from Simulation Conversations

Database Icon in Conversations

Convert simulation interactions into focused benchmark tests:
  1. Review Simulations: Analyze completed simulation conversations
  2. Identify Key Interactions: Find specific user inputs worth testing individually
  3. Database Icon: Click the database icon next to the relevant user input
  4. Add to Benchmark:
    • Edit Prompt: Adjust the input text if needed (e.g., if the question alone is missing context that must be added)
    • Select Benchmark: Choose target benchmark or create new one
    • Save Test: Add as a single-turn test case

Managing Existing Benchmarks

Benchmark Dashboard

Navigate and manage your benchmark collection:
  • Benchmark List: View all benchmarks with key metrics
  • Columns: Name, Description, Number of Rows, Tags, Last Updated, Created

Individual Benchmark Management

Once inside a specific benchmark:
  • Test List: View all test cases with their recent results
  • Add Tests: Add new test cases using any of the methods above
  • Run Tests: Execute individual tests or the entire benchmark
  • Delete Tests: Remove outdated or irrelevant test cases
  • Export: Export test cases

Next Steps

  • [Run your benchmarks](/benchmarks/running-tests) to validate chatbot performance
  • [Analyze results](/benchmarks/results-analysis) to identify improvement opportunities
  • [Compare performance over time](/benchmarks/results-analysis) to track progress