Creating a Performance Benchmark
Start a New Benchmark
- Go to Benchmarking → Create.
- Choose Performance as the benchmark type.
- Select LLM as the model type.

General Information
- Benchmark Name — Give the run a clear, unique name.
- Select Deployments — Pick one or more deployments to benchmark.
- Region — Choose where the run executes (shown under Available Regions).

Supported Inputs (Chat Dataset Input)
- Dataset Name — Choose a dataset (currently only the Ultrachat dataset is supported).
- Num Samples — Limit how many rows/samples from the dataset to use.
- Sampling Temperature — Set generation randomness between 0 and 1.
- Maximum tokens to generate — Set the upper limit on response length in tokens.
- Whether to stream responses — Toggle on/off to stream model outputs during the run.
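The fields above roughly correspond to the parameters of a single chat request sent during the run. A minimal sketch, assuming the selected deployment exposes an OpenAI-compatible chat completions endpoint; the URL, deployment name, and prompt below are placeholders, not values from the product:

```python
import httpx

# Illustrative only: assumes an OpenAI-compatible chat completions endpoint.
payload = {
    "model": "my-deployment",                                    # placeholder deployment name
    "messages": [{"role": "user", "content": "Sample prompt"}],  # one dataset sample
    "temperature": 0.7,                                          # Sampling Temperature (0-1)
    "max_tokens": 256,                                           # Maximum tokens to generate
    "stream": False,                                             # Whether to stream responses
}

resp = httpx.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60.0)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```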
Execution Configuration
- Users — Number of virtual users to simulate during the benchmark.
- Duration — How long the benchmark should run (e.g., 30s).
- Load Type — Pattern for load generation (e.g., Constant).
- Metrics — Select the metrics to capture (e.g., throughput, ttft (time to first token), tpot (time per output token)).
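To make the load and metric terms concrete, here is a minimal, illustrative sketch of a constant-load run: each virtual user sends requests back-to-back for the configured duration, and ttft/tpot are derived from the timing of streamed tokens. The `stream_tokens` stub is a hypothetical placeholder for a real streaming client, not part of the product:

```python
import asyncio
import time

async def stream_tokens(prompt: str):
    """Hypothetical stand-in for a streaming call to the deployment under test."""
    for _ in range(64):            # pretend the model emits 64 tokens
        await asyncio.sleep(0.01)
        yield "tok"

async def one_request(prompt: str) -> dict:
    start = time.perf_counter()
    first = None
    count = 0
    async for _ in stream_tokens(prompt):
        now = time.perf_counter()
        if first is None:
            first = now            # timestamp of the first token
        count += 1
    end = time.perf_counter()
    ttft = first - start                       # time to first token
    tpot = (end - first) / max(count - 1, 1)   # time per output token
    return {"ttft": ttft, "tpot": tpot, "tokens": count}

async def constant_load(users: int, duration_s: float) -> list[dict]:
    """Each virtual user sends requests back-to-back until the duration elapses."""
    results: list[dict] = []
    deadline = time.perf_counter() + duration_s

    async def user_loop():
        while time.perf_counter() < deadline:
            results.append(await one_request("Hello"))

    await asyncio.gather(*(user_loop() for _ in range(users)))
    return results

if __name__ == "__main__":
    runs = asyncio.run(constant_load(users=2, duration_s=5))
    print(f"{len(runs)} requests completed")
```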

Search Space (Hyperparameter Sweep)
Search Space allows you to run the same benchmark multiple times with different parameter values, so you can compare how deployments behave under varied conditions. Instead of running separate benchmarks manually, you can provide multiple values and the suite will automatically generate runs for each combination.
- inputs[name=messages].preprocess.input_tokens — Vary the input size (number of tokens in the prompt). Example: 1000, 2000, 3000.
- inputs[name=max_tokens].value — Vary the maximum output length. Example: 100, 200, 300.
- execution.profile.users — Vary the number of concurrent users. Example: 1, 2, 3.
If you specify input_tokens: [1000, 2000] and users: [1, 2], the suite will generate 4 runs: 1000×1, 1000×2, 2000×1, and 2000×2.
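Conceptually, the expansion is a Cartesian product over the values you provide. An illustrative Python sketch of that expansion (the parameter paths are used here simply as dictionary keys):

```python
from itertools import product

# Values entered in the Search Space fields (matching the example above).
search_space = {
    "inputs[name=messages].preprocess.input_tokens": [1000, 2000],
    "execution.profile.users": [1, 2],
}

# Expand the search space into one run per combination of values.
keys = list(search_space)
runs = [dict(zip(keys, combo)) for combo in product(*search_space.values())]

for run in runs:
    print(run)
# 2 x 2 values -> 4 runs: 1000x1, 1000x2, 2000x1, 2000x2
```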
Type a value and press Enter or Add — values appear as tags. Leave blank to skip.