Overview

Purpose

Simulations provide automated, multi-turn testing of your chatbot. Six simulator types are available, each targeting a different aspect of chatbot behavior.

Available Simulators

Goal Simulator

Tests goal-oriented conversations. Configure specific goals or goal topics, plus user personas.

Topic Simulator

Explores query space around topics. Configure topics, max turns, and user personas.

Safety Simulator

Tests for unsafe responses. Configure max turns to control conversation length.

Follow-up Follower Simulator

Tests follow-up question pathways. Start with initial questions or question topics.

Product Recommendations Simulator

Tests product finding capabilities. Configure categories, deep search, and max turns.

Product Category Simulator

Tests category identification. Configure initial queries/topics and personas.

Launching Simulations

New Simulator Page Workflow

When you click “New Simulation,” you’ll be setting up a batch of automated conversations. Each batch can run up to 100 simulations in parallel, with each simulation taking approximately 40 seconds to complete. Here’s how to set it up:

Select the Simulator Type: Choose from one of the available simulator types (e.g., Goal Simulator, Topic Simulator, etc.).
Configure Parameters: Set up the parameters for your simulation. For example, you can specify product categories, personas, max turns, and other relevant settings.
Set Execution Runs: Determine the number of runs for each parameter combination. The total number of simulations is the number of values for each parameter multiplied together, then multiplied by the number of execution runs.
Launch Simulation: Once configured, launch your batch of simulations. They will run in parallel for efficiency.
Monitor on Dashboard: As the simulations run, you can monitor their progress on a dashboard, which provides real-time updates on the status of each simulation.
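The parallelism and timing figures above allow a rough estimate of how long a batch takes. The following is a minimal sketch, not part of the product. It assumes that a batch larger than 100 simulations is processed in successive waves of 100, and that every simulation takes the approximate 40 seconds quoted above. The function name `estimate_batch_seconds` and the wave model are illustrative assumptions; actual scheduling may differ.

```python
import math

MAX_PARALLEL = 100    # from the text: up to 100 simulations run in parallel
SECONDS_PER_SIM = 40  # from the text: approximate duration of one simulation


def estimate_batch_seconds(total_simulations: int) -> int:
    """Rough wall-clock estimate for a batch.

    ASSUMPTION: simulations beyond MAX_PARALLEL wait for the next wave.
    """
    if total_simulations < 0:
        raise ValueError("total_simulations must be non-negative")
    waves = math.ceil(total_simulations / MAX_PARALLEL)
    return waves * SECONDS_PER_SIM


print(estimate_batch_seconds(27))   # 40: a single wave
print(estimate_batch_seconds(101))  # 80: two waves under the wave assumption
```

Treat the result as a lower bound for planning, since the 40-second figure is itself approximate.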

Parameter Combinations

The system will automatically create simulations for every possible combination of your parameters. Each simulator type has its own specific parameters, but there are several common parameters shared across most simulators.

Common Parameters:

Max Turns: Maximum number of conversation turns (default: 10). This limits how long each simulated conversation can go.
Personas: Simulated personalities that affect conversation style. Personas help test how your chatbot handles different types of users:
- Different conversation styles and communication preferences
- Varied expertise levels (e.g., beginner vs professional)
- Language preferences (you can specify language in persona descriptions)
Execution Runs: Number of times to run each parameter combination

Here are two examples showing both shared and simulator-specific parameters.

Product Recommendations Simulator Example:

Simulator: Product Recommendations Simulator
Parameters:
- Product Categories: ["Power Tools", "Paint & Stain"]     // 2 values
- Deep Search: False                                       // 1 value (only 1 value allowed for entire batch)
- Max Turns: 4                                            // 1 value (only 1 value allowed for entire batch)
- Execution Runs: 2                                       // 2 runs each
Total Simulations: 2 × 1 × 1 × 2 = 4 simulations

Topic Simulator Example:

Simulator: Topic Simulator
Parameters:
- Topics: ["troubleshooting", "product_info", "billing"]   // 3 values
- Personas: ["technical", "casual", "frustrated"]          // 3 values
- Max Turns: [5]                                          // 1 value (only 1 value allowed for entire batch)
- Execution Runs: 3                                       // 3 runs each
Total Simulations: 3 × 3 × 1 × 3 = 27 simulations
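The totals in both examples follow from a Cartesian product over the parameter values, repeated once per execution run. The sketch below reproduces that arithmetic with the standard library. The function `expand_batch`, the dictionary keys, and the `run` field are hypothetical names chosen for illustration; they are not the product's API or configuration format.

```python
from itertools import product
from typing import Any


def expand_batch(parameters: dict[str, list[Any]], execution_runs: int) -> list[dict[str, Any]]:
    """Return one entry per (parameter combination, run) pair."""
    names = list(parameters)
    simulations = []
    # product() yields every combination of one value per parameter.
    for values in product(*(parameters[name] for name in names)):
        for run in range(1, execution_runs + 1):
            simulation = dict(zip(names, values))
            simulation["run"] = run  # hypothetical field marking the repeat index
            simulations.append(simulation)
    return simulations


# Product Recommendations Simulator example: 2 x 1 x 1 x 2 = 4
product_batch = expand_batch(
    {
        "product_categories": ["Power Tools", "Paint & Stain"],
        "deep_search": [False],
        "max_turns": [4],
    },
    execution_runs=2,
)

# Topic Simulator example: 3 x 3 x 1 x 3 = 27
topic_batch = expand_batch(
    {
        "topics": ["troubleshooting", "product_info", "billing"],
        "personas": ["technical", "casual", "frustrated"],
        "max_turns": [5],
    },
    execution_runs=3,
)

print(len(product_batch))  # 4
print(len(topic_batch))    # 27
```

Parameters limited to one value per batch, such as Max Turns, appear as single-element lists, so they contribute a factor of 1 to the total.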

Evaluation System

Available Evaluators

Select from 6 specialized evaluators to assess different aspects:

Follow-up Refusal Detector: Ensures appropriate follow-up suggestions
Language Detection: Validates that input and output languages are the same
Product Relevance: Checks that recommended products are relevant
Product Specs Contradiction: Detects information inconsistencies between chatbot’s
Response Style: Evaluates stylistic elements of the response
Search Term Relevance: Ensures recommended search terms/product categories align with user needs

Results Analysis

Understanding and using simulation results

Quick Start Guide

Get started with your first simulation
