Last updated: June 6, 2024 9:21 PM

Key Metrics

TTFT (Time to First Token): The time from sending the initial request to receiving the first response in the stream. Lower is better.

TPS (Tokens Per Second): The number of tokens a model generates per second. Higher is better.

TRT (Total Response Time): The total time from the start of the request to its completion. Lower is better.

Models: The specific models tested from each provider.

Provider: The supported API providers.
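As a rough illustration of how the three timing metrics relate, here is a minimal Python sketch. It is not the benchmark's actual code. The names measure_stream, start_stream, and StreamMetrics are hypothetical, each streamed chunk is assumed to count as one token, and TPS is assumed to be the token count divided by the total response time (dividing by the time after the first token would be another reasonable definition).

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamMetrics(NamedTuple):
    ttft: float  # seconds from request start to the first streamed chunk
    tps: float   # tokens per second (assumption: token count / total time)
    trt: float   # seconds from request start to the end of the stream


def measure_stream(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Time a streamed response. `start_stream` is a hypothetical callable
    that sends the request and returns an iterable of streamed chunks."""
    start = clock()
    first_chunk_at = None
    token_count = 0
    for _chunk in start_stream():
        if first_chunk_at is None:
            first_chunk_at = clock()  # timestamp of the first chunk only
        token_count += 1              # assumption: one chunk == one token
    end = clock()

    trt = end - start
    # If nothing was streamed, fall back to the total time for TTFT.
    ttft = (first_chunk_at - start) if first_chunk_at is not None else trt
    tps = token_count / trt if trt > 0 else 0.0
    return StreamMetrics(ttft=ttft, tps=tps, trt=trt)
```

The clock is injectable so the function can be checked with fixed timestamps instead of real network calls.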

Methodology

We make 4 requests per model every hour, so the performance data stays up to date.

Connection Warmup:

Before the measured requests, a warmup connection is made so that HTTP connection setup latency does not affect the timings. This initial connection is excluded from the final calculations, which gives a more accurate measure of performance.
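The warmup idea can be sketched as follows. This is an illustrative sketch only: timed_requests and send_request are hypothetical names, and send_request is assumed to reuse an already-open connection on later calls (for example through a pooled HTTP client), which is what makes the warmup useful.

```python
import time
from typing import Callable, List


def timed_requests(
    send_request: Callable[[], None],
    runs: int = 1,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Return durations of `runs` requests, preceded by one untimed warmup."""
    # Warmup: pays the connection setup cost; its duration is never recorded.
    send_request()

    durations = []
    for _ in range(runs):
        start = clock()
        send_request()
        durations.append(clock() - start)
    return durations
```

Only the requests made after the warmup contribute to the returned durations.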

Request Types:

Short (S):

Input tokens are set to 20, and the maximum output tokens are set to 50.

Medium (M):

Input tokens are set to 30, and the maximum output tokens are set to 100.

Long (L):

Input tokens are set to 1200, and the maximum output tokens are set to 200.

Overall (O):

The overall metric is the average of the values measured for the short, medium, and long requests, giving a single summary of the model's performance across request sizes.
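The request types and the overall average can be written down as a small Python sketch. The token limits come from the descriptions above. The names REQUEST_TYPES and overall are hypothetical, and an unweighted arithmetic mean is assumed.

```python
from statistics import mean
from typing import Dict

# Token settings for each request type, as listed above.
REQUEST_TYPES = {
    "S": {"input_tokens": 20, "max_output_tokens": 50},
    "M": {"input_tokens": 30, "max_output_tokens": 100},
    "L": {"input_tokens": 1200, "max_output_tokens": 200},
}


def overall(values: Dict[str, float]) -> float:
    """Average one metric (e.g. TTFT in seconds) over the S, M and L requests.

    Assumption: a plain unweighted mean of the three values.
    """
    return mean(values[name] for name in REQUEST_TYPES)
```

For example, overall({"S": 0.2, "M": 0.3, "L": 0.7}) averages three TTFT readings into a single overall figure of about 0.4 seconds.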