Rate limits are restrictions that OctoAI applies to the rate at which an individual account can submit inference requests to an API endpoint. They help keep platform performance predictable, so that all OctoAI customers experience consistent inference latencies. An inference request that is rejected because a rate limit has been reached returns an HTTP 429 response code and can be retried after an appropriate backoff period.
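As an illustration of retrying after a backoff period, here is a minimal Python sketch that uses exponential backoff with jitter. It is not an official OctoAI client: `call_with_backoff` and `send` are hypothetical names, `send` stands for any zero-argument function that issues your request and returns a `(status_code, body)` pair, and the retry count and delays are assumed defaults rather than values recommended by OctoAI.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical zero-argument callable that performs one
    inference request and returns (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            # Not rate limited: hand the result (success or other error) back.
            return status, body
        if attempt == max_retries:
            break
        # Exponential backoff with full jitter: wait a random time between
        # 0 and min(max_delay, base_delay * 2**attempt) seconds.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
    # Still rate limited after all retries; return the last 429 response.
    return status, body
```

The jitter spreads retries from concurrent clients so they do not all hit the endpoint again at the same moment. The caller still needs to handle the case where the final result is a 429.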

OctoAI API rate limits

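| API endpoint | Free tier              | Pro tier                | Enterprise tier |
|--------------|------------------------|-------------------------|-----------------|
| Text Gen     | 10 requests per minute | 240 requests per minute | Contact us      |
| Media Gen    | 10 requests per minute | 60 requests per minute  | Contact us      |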

Higher rate limits are available. Please reach out if you need an increase.
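To stay under a per-minute limit on the client side, one option is to space requests evenly. The sketch below is a simple pacing helper, not part of any OctoAI SDK: `RequestPacer` is a hypothetical name, and it assumes that evenly spaced requests are acceptable for your workload. How OctoAI measures the one-minute window is not stated here, so even spacing is a conservative choice rather than a guarantee.

```python
import time


class RequestPacer:
    """Space calls so that at most requests_per_minute start per minute."""

    def __init__(self, requests_per_minute, clock=time.monotonic,
                 sleep=time.sleep):
        # Minimum gap between requests, e.g. 10 per minute -> 6 seconds.
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve its slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Call `pacer.wait()` immediately before each request. With `RequestPacer(10)` for the Free tier, requests start at least 6 seconds apart; with `RequestPacer(240)` for Pro tier Text Gen, at least 0.25 seconds apart. Pacing reduces the chance of a 429 but does not remove the need to handle one.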