Last updated: June 6, 2024 9:21 PM

Key Metrics

TTFT (Time to First Token): The time from sending the request to receiving the first chunk of the streamed response. A lower TTFT is better.

TPS (Tokens Per Second): The number of tokens the model generates per second. A higher TPS is better.

TRT (Total Response Time): The total time from the start of the request to the end of the response. A lower TRT is better.

Models: The specific models benchmarked from each provider.

Providers: The supported API providers.
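
As a rough illustration of how the three timing metrics relate, here is a minimal Python sketch. It assumes each received timestamp corresponds to one token and that TPS is computed as tokens divided by the total response time; the exact TPS formula is not stated here, so treat that as an assumption. The names `StreamMetrics` and `compute_metrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft: float  # seconds from request start to first token
    trt: float   # seconds from request start to last token
    tps: float   # tokens per second over the whole response (assumed formula)


def compute_metrics(request_start: float, token_times: Sequence[float]) -> StreamMetrics:
    """Derive TTFT, TRT and TPS from a request start time and per-token arrival times.

    All times are in seconds on the same clock (for example time.perf_counter()).
    Assumption: one timestamp per token, and TPS = token count / TRT.
    """
    if not token_times:
        raise ValueError("no tokens were received")
    ttft = token_times[0] - request_start
    trt = token_times[-1] - request_start
    tps = len(token_times) / trt if trt > 0 else float("inf")
    return StreamMetrics(ttft=ttft, trt=trt, tps=tps)
```

For example, a request started at t = 0.0 whose four tokens arrive at 0.5, 1.0, 1.5 and 2.0 seconds gives a TTFT of 0.5 s, a TRT of 2.0 s and a TPS of 2.0 under these assumptions.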

Methodology

We make 4 requests per model every hour to keep the performance data consistent and up to date.

Connection Warmup:

A warmup connection is made first so that HTTP connection setup latency does not affect the measurements. This initial connection is excluded from the final calculations, which gives a more accurate measure of performance.
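
The warmup step could be sketched roughly as follows. This is an illustration, not the actual benchmark code: `timed_requests` and `send_request` are hypothetical names, and it assumes a single warmup call precedes a batch of measured calls. In practice `send_request` would need to reuse the same underlying HTTP connection for the warmup to have any effect.

```python
import time
from typing import Callable, List


def timed_requests(send_request: Callable[[], object], n_measured: int = 4) -> List[float]:
    """Run one untimed warmup call, then time n_measured calls.

    send_request is a hypothetical zero-argument callable that performs one
    API request over an already configured (ideally persistent) connection.
    Returns the duration of each measured call in seconds.
    """
    # Warmup: pays the connection setup cost; its duration is deliberately discarded.
    send_request()

    durations: List[float] = []
    for _ in range(n_measured):
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return durations
```

Only the returned durations would feed into the reported metrics; the warmup call never appears in them.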

Request Types: