Last updated: June 6, 2024 9:21 PM
TTFT (Time to First Token): This metric measures the time from sending the request to receiving the first chunk of the streamed response. A lower TTFT is better.
TPS (Tokens Per Second): This metric measures the number of tokens the model generates per second. A higher TPS is better.
TRT (Total Response Time): This metric measures the total time from sending the request to receiving the complete response. A lower TRT is better.
Models: The specific models tested from each provider.
Provider: The supported API providers.
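The three timing metrics above can be derived from a few timestamps recorded for each streamed request. The following is a minimal sketch, assuming the client records when the request was sent, when the first and last chunks arrived, and how many output tokens were generated; the StreamTiming class and its field names are hypothetical. The description does not say which time window TPS is computed over, so this sketch assumes the generation window after the first chunk (TRT minus TTFT); dividing by TRT is an equally plausible reading.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Hypothetical per-request record; all times are in seconds."""
    request_sent: float   # when the request was sent
    first_chunk: float    # when the first streamed chunk arrived
    last_chunk: float     # when the final chunk arrived (response complete)
    output_tokens: int    # number of tokens the model generated


def ttft(t: StreamTiming) -> float:
    """Time to First Token: request sent -> first chunk received (lower is better)."""
    return t.first_chunk - t.request_sent


def trt(t: StreamTiming) -> float:
    """Total Response Time: request sent -> response complete (lower is better)."""
    return t.last_chunk - t.request_sent


def tps(t: StreamTiming) -> float:
    """Tokens Per Second (higher is better).

    Assumption: tokens are divided by the generation window after the
    first chunk (TRT - TTFT), not by the full TRT.
    """
    generation_time = t.last_chunk - t.first_chunk
    if generation_time <= 0:
        raise ValueError("need a positive generation window to compute TPS")
    return t.output_tokens / generation_time
```

For a request sent at 0.0 s whose first chunk arrives at 0.25 s and last chunk at 2.25 s, with 100 output tokens, this gives a TTFT of 0.25 s, a TRT of 2.25 s, and a TPS of 50.0.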
We send 4 requests per model every hour to keep the performance data consistent and up to date.
Before measuring, a warmup connection is made so that HTTP connection setup latency (such as the TCP and TLS handshakes) does not affect the results. This warmup connection is excluded from the final calculations, giving a more accurate measure of performance.
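The warmup-then-measure procedure can be sketched as follows. This assumes send_request is a hypothetical callable that sends one request over a persistent (keep-alive) connection and returns once the full response has been received, so only the first call pays the connection setup cost. Whether the warmup counts toward the four hourly requests is not stated; in this sketch it is sent in addition to them.

```python
import time
from typing import Callable, List


def measure_model(send_request: Callable[[], None], n_requests: int = 4) -> List[float]:
    """Time n_requests calls to send_request after one discarded warmup call.

    send_request is a hypothetical callable that reuses its HTTP connection
    and blocks until the full response has arrived.
    """
    # Warmup: assuming send_request reuses its connection, this call absorbs
    # the one-time setup cost. It is deliberately not timed or recorded.
    send_request()

    durations: List[float] = []
    for _ in range(n_requests):
        start = time.perf_counter()  # monotonic, high-resolution clock
        send_request()
        durations.append(time.perf_counter() - start)
    return durations
```

Calling measure_model once per model every hour, for example from a scheduler, would match the cadence described above.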
Short (S):
Each request has 20 input tokens and a maximum of 50 output tokens.
Medium (M):
Each request has 30 input tokens and a maximum of 100 output tokens.
Long (L):
Each request has 1200 input tokens and a maximum of 200 output tokens.
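The three request sizes can be captured as a small configuration. This is a sketch: RequestProfile, PROFILES, and request_params are hypothetical names, and the request fields prompt, max_tokens, and stream are illustrative placeholders, since each provider's API names these parameters differently. Building a prompt of exactly the stated number of input tokens depends on each model's tokenizer and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestProfile:
    name: str
    input_tokens: int       # prompt length, in tokens
    max_output_tokens: int  # cap on the number of generated tokens


# The S, M, and L sizes as described above.
PROFILES = {
    "S": RequestProfile("Short", input_tokens=20, max_output_tokens=50),
    "M": RequestProfile("Medium", input_tokens=30, max_output_tokens=100),
    "L": RequestProfile("Long", input_tokens=1200, max_output_tokens=200),
}


def request_params(profile_key: str, prompt: str) -> dict:
    """Build hypothetical request parameters for one profile.

    The field names are illustrative; real provider APIs differ.
    """
    profile = PROFILES[profile_key]
    return {"prompt": prompt, "max_tokens": profile.max_output_tokens, "stream": True}
```

Keeping the sizes in one table like this makes it easy to run the same measurement loop over every profile and model.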
Overall (O):
The overall value of each metric is the average of its short, medium, and long results, giving a single summary of the model's performance.
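The overall value can be sketched as a per-metric mean over the three request sizes. The overall function and the dictionary layout are hypothetical, and the sketch assumes a plain, unweighted mean, which the wording "average" suggests but does not spell out.

```python
from statistics import mean
from typing import Dict

SIZES = ("S", "M", "L")


def overall(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over the short, medium, and long results.

    results maps a size key ("S", "M", "L") to that size's metric values,
    e.g. {"S": {"ttft": 0.3, "tps": 80.0}, "M": {...}, "L": {...}}.
    Assumption: a plain (unweighted) mean across the three sizes.
    """
    metrics = results["S"].keys()
    return {m: mean(results[size][m] for size in SIZES) for m in metrics}
```

For example, TTFT values of 1.0, 2.0, and 3.0 s average to 2.0 s. Note that a plain mean of per-size TPS values is not the same as dividing total tokens by total time across all requests; the description does not say which is used.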