API & AI Rate Limiting
Understand per-tenant rate limits, token budgets, and API concurrency restrictions.
API & AI Rate Limiting Architecture
To maintain system stability, security, and fair resource allocation, Zelosify enforces rate limits on every tenant across several dimensions: request volume, AI token usage, and embedding operations. Limits are defined per tenant in the TenantRateLimit database profile.
Default Limit Pools
- Request Volume Limits:
  - Requests per Minute: 100 requests (bursts of up to 200 allowed).
  - Requests per Hour: 5,000 requests.
  - Requests per Day: 100,000 requests.
- AI Token and Embedding Budgets:
  - Input Tokens: 500,000 tokens per hour.
  - Output Tokens: 250,000 tokens per hour.
  - Embeddings: 10,000 embedding operations per day.
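One common way to combine a sustained rate with a burst allowance is a token bucket. The sketch below is illustrative only: this page does not state which algorithm Zelosify uses internally, and the class and field names are hypothetical. It shows how a 100 requests per minute rate with a burst size of 200 would behave under that assumption.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token bucket: refills at `rate_per_minute`, holds at most `burst`."""

    rate_per_minute: float = 100.0  # sustained per-minute limit from the defaults above
    burst: float = 200.0            # burst allowance from the defaults above
    tokens: float = 200.0           # the bucket starts full
    last_refill: float = 0.0        # timestamp (seconds) of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True if the request fits, else return False."""
        elapsed = max(0.0, now - self.last_refill)
        refill = elapsed * self.rate_per_minute / 60.0
        self.tokens = min(self.burst, self.tokens + refill)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket()
# 250 requests arriving at the same instant: only the burst allowance passes.
accepted = sum(bucket.allow(now=0.0) for _ in range(250))
# 30 seconds later, half a minute of refill is available again.
accepted_later = sum(bucket.allow(now=30.0) for _ in range(100))
print(accepted, accepted_later)
```

Under this model a client that stays at or below 100 requests per minute is never throttled by the per-minute limit, while short spikes are absorbed until the bucket is empty. The hourly and daily limits would still apply separately.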
Exceeding Limits (HTTP 429)
When a tenant exceeds any of these configured limits, the platform returns an HTTP 429 Too Many Requests status code with a JSON payload identifying the violation type (e.g., request rate, token rate, or cost threshold exceeded). System administrators receive alerts when a tenant reaches 80% and 100% of a limit budget.
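Clients typically handle a 429 by pausing and retrying with increasing delays. The following is a minimal sketch of that pattern, not an official Zelosify client: the `send` callable, the delay schedule, and the `violation` payload key are assumptions for illustration, since the exact payload schema is not documented here.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call `send` until it stops returning 429 or the attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return status, body


# Demo with a fake transport: two throttled responses, then success.
# The "violation" key is a hypothetical stand-in for the real payload.
responses = iter([
    (429, {"violation": "request rate"}),
    (429, {"violation": "request rate"}),
    (200, {"ok": True}),
])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the helper testable without real waiting. In production code it may also be worth adding random jitter to the delay so that many clients do not retry at the same moment.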
Requesting Limit Adjustments
For enterprise workloads requiring higher throughput or custom daily budgets, administrators can contact support to configure custom rate limits. These overrides are written at the database level and take effect immediately, without service interruption.
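Conceptually, a custom limit replaces only the defaults it names and leaves the rest unchanged. The sketch below illustrates that idea under stated assumptions: the key names and the `effective_limits` helper are hypothetical and do not reflect the actual TenantRateLimit schema, and only the numeric defaults come from this page.

```python
# Default values taken from the "Default Limit Pools" list; key names are hypothetical.
DEFAULT_LIMITS = {
    "requests_per_minute": 100,
    "burst_size": 200,
    "requests_per_hour": 5_000,
    "requests_per_day": 100_000,
    "input_tokens_per_hour": 500_000,
    "output_tokens_per_hour": 250_000,
    "embeddings_per_day": 10_000,
}


def effective_limits(overrides: dict) -> dict:
    """Return the defaults with tenant-specific overrides applied on top."""
    unknown = set(overrides) - set(DEFAULT_LIMITS)
    if unknown:
        # Reject typos rather than silently ignoring them.
        raise KeyError(f"unknown limit(s): {sorted(unknown)}")
    return {**DEFAULT_LIMITS, **overrides}


# Example: a tenant granted a higher daily request budget.
enterprise = effective_limits({"requests_per_day": 500_000})
print(enterprise["requests_per_day"], enterprise["requests_per_minute"])
```

Resolving limits at lookup time, rather than copying defaults into each tenant record, is one way an override could take effect immediately without a restart.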