Overview

After running simulations, you can view and analyze results, identify issues, and rerun simulations as needed.

Results Dashboard

The dashboard shows your simulations in a table format with key information:
  • Simulation type and parameters used
  • Number of conversation turns
  • Pass/fail status for each evaluator
  • Timestamps and duration
You can select multiple simulations to rerun them in bulk or delete them all at once.

Understanding Results

Conversation Review

Click on any simulation to see:
  • The full conversation transcript
  • Which evaluators passed or failed
  • Specific feedback from evaluators explaining why they marked a given conversation turn as a success or failure

Evaluation Scores

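Each simulation is evaluated across multiple dimensions, and the score shows how many evaluators passed. For example, with six evaluators:
  • 6/6 pass: All evaluators passed
  • 4/6 pass: 4 evaluators passed, 2 failed
  • 2/6 pass: 2 evaluators passed, 4 failed, indicating significant issues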
When evaluators fail, they provide specific feedback about what went wrong. For example:
Product Relevance Evaluator: FAIL
Feedback: "Recommended shelving units instead of hammers"
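
The scoring summary above can be illustrated with a small sketch. This is not the product's API: the `EvaluatorResult` class, its field names, and the placeholder evaluator names ("Evaluator 2" and so on) are assumptions made for illustration. Only the "Product Relevance Evaluator" and its feedback come from the example above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EvaluatorResult:
    """Hypothetical record of one evaluator's verdict on a simulation."""
    name: str
    passed: bool
    feedback: str = ""


def summarize(results: List[EvaluatorResult]) -> str:
    """Return a score string such as '5/6 pass'."""
    passed = sum(1 for r in results if r.passed)
    return f"{passed}/{len(results)} pass"


def failures(results: List[EvaluatorResult]) -> List[Tuple[str, str]]:
    """Return (evaluator name, feedback) pairs for every failed evaluator."""
    return [(r.name, r.feedback) for r in results if not r.passed]


# Sample data: one real example from the text, five placeholder evaluators.
results = [
    EvaluatorResult(
        "Product Relevance Evaluator",
        passed=False,
        feedback="Recommended shelving units instead of hammers",
    ),
] + [EvaluatorResult(f"Evaluator {i}", passed=True) for i in range(2, 7)]

print(summarize(results))
for name, feedback in failures(results):
    print(f"{name}: FAIL\nFeedback: {feedback!r}")
```

Separating the count from the list of failures mirrors how the results are read: the score gives a quick signal, and the feedback explains what to fix.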

Rerunning Simulations

How Reruns Work

You can rerun any simulation or set of simulations. When you rerun:
  • The same configuration parameters are used
  • A completely new conversation is generated
  • Results may differ from the original run because each conversation is generated dynamically
Example:
Original simulation:
Parameters:
- Topic: "Kitchen renovation"
- Persona: "DIY beginner"
- Max Turns: 5

Conversation: Questions about cabinet installation
After rerun (same parameters, different conversation):
Parameters:
- Topic: "Kitchen renovation"
- Persona: "DIY beginner"
- Max Turns: 5

Conversation: Questions about countertop materials
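
The rerun behavior can be modeled with a minimal sketch: the configuration is carried over unchanged, and only the conversation is regenerated. All names here (`SimulationConfig`, `SimulationRun`, `rerun`, `stub_generate`) are hypothetical and not part of the product. The stub generator simply cycles through the two conversation subjects from the example above in place of a real conversation generator.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, List


@dataclass(frozen=True)
class SimulationConfig:
    """Hypothetical container for the parameters shown in the example."""
    topic: str
    persona: str
    max_turns: int


@dataclass
class SimulationRun:
    config: SimulationConfig
    conversation: List[str]


def rerun(
    original: SimulationRun,
    generate: Callable[[SimulationConfig], List[str]],
) -> SimulationRun:
    """Reuse the original configuration, but generate a new conversation."""
    return SimulationRun(config=original.config,
                         conversation=generate(original.config))


# Stand-in for a real generator: alternates between two canned subjects.
_subjects = cycle(["cabinet installation", "countertop materials"])


def stub_generate(config: SimulationConfig) -> List[str]:
    return [f"Questions about {next(_subjects)}"]


config = SimulationConfig("Kitchen renovation", "DIY beginner", 5)
original = SimulationRun(config, stub_generate(config))
second = rerun(original, stub_generate)

print(original.conversation)
print(second.conversation)
```

Because the configuration is frozen and shared, any difference between the two runs comes from the conversation itself, which is why evaluator results can change between reruns.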

Finding Specific Results

You can filter and search results by:
  • Simulation type
  • Pass/fail status
  • Date range
  • Specific parameters used
This helps you focus on particular areas of interest or track down specific issues you’re investigating.
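
As a rough sketch of how these filters combine, the example below applies every supplied criterion together, so a result must match all of them. The `SimulationRecord` class, the `filter_results` function, and the sample values (simulation type names and dates) are assumptions for illustration only and do not describe the product's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class SimulationRecord:
    """Hypothetical row in the results table."""
    sim_type: str
    passed: bool
    run_date: date
    parameters: Dict[str, object] = field(default_factory=dict)


def filter_results(
    records: List[SimulationRecord],
    sim_type: Optional[str] = None,
    passed: Optional[bool] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
    parameters: Optional[Dict[str, object]] = None,
) -> List[SimulationRecord]:
    """Keep records matching every criterion that is not None."""
    matches = []
    for r in records:
        if sim_type is not None and r.sim_type != sim_type:
            continue
        if passed is not None and r.passed != passed:
            continue
        if start is not None and r.run_date < start:
            continue
        if end is not None and r.run_date > end:
            continue
        if parameters and any(r.parameters.get(k) != v
                              for k, v in parameters.items()):
            continue
        matches.append(r)
    return matches


# Made-up sample data.
records = [
    SimulationRecord("type_a", True, date(2024, 1, 5),
                     {"Topic": "Kitchen renovation"}),
    SimulationRecord("type_a", False, date(2024, 1, 20),
                     {"Topic": "Kitchen renovation"}),
    SimulationRecord("type_b", False, date(2024, 2, 1),
                     {"Topic": "Garden tools"}),
]

failed_kitchen = filter_results(
    records, passed=False, parameters={"Topic": "Kitchen renovation"}
)
print(len(failed_kitchen))
```

Combining a pass/fail filter with a parameter filter, as in the usage above, is one way to narrow down failures tied to a particular configuration.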