Only pay for what you use
Get started today on OctoAI and receive $10 of free credit in your account.

Free
$10 of free credit upon sign up
Stable Diffusion XL (SDXL)
Stable Diffusion 1.5 (SD 1.5)
Access to 10 pre-loaded fine-tuning assets
Pro
$0.004
per base SDXL image
$0.008
per custom SDXL image
$0.0015
per SD 1.5 image (base or custom)
Everything in Free, plus:
Import and use custom fine-tuning assets through the Asset Library
Create your own fine-tuning assets
ControlNets, inpainting, outpainting, LCM-LoRA
Enterprise
Everything in Pro, plus:
Committed use discounts
No rate limits
Dedicated Customer Success Manager
Contractual SLAs
Option for private deployment
What is the OctoAI Image Gen Solution?
OctoAI Image Gen Solution is an end-to-end solution that allows app builders to easily customize (fine-tune) Stable Diffusion and Stable Diffusion XL (SDXL) to their needs and to seamlessly scale usage with no impact on image generation speed or quality.
What are default configurations?
The pricing listed here is based on the following default configurations for SDXL and Stable Diffusion 1.5. A default SDXL image is defined as a 30-step, 1024x1024 image with the K_DPMPP_2M sampler. A default Stable Diffusion 1.5 image is defined as a 30-step, 512x512 image with the DPMSolverMultistep scheduler. For other configurations, please refer to the detailed pricing in the OctoAI docs.
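The per-image rates above translate directly into a cost estimate. The sketch below uses the Pro-tier prices listed on this page (which may change); the function and dictionary names are illustrative, not part of any OctoAI SDK:

```python
# Illustrative cost estimate using the Pro-tier per-image rates
# listed on this page (assumes default configurations).
PRICE_PER_IMAGE = {
    "sdxl_base": 0.004,    # per base SDXL image
    "sdxl_custom": 0.008,  # per custom SDXL image
    "sd15": 0.0015,        # per SD 1.5 image, base or custom
}

def image_cost(image_type: str, count: int) -> float:
    """Return the estimated cost in USD for `count` images."""
    return round(PRICE_PER_IMAGE[image_type] * count, 4)

# e.g. 1,000 base SDXL images
print(image_cost("sdxl_base", 1000))  # 4.0
```

At these rates, 1,000 base SDXL images cost $4.00 and 1,000 SD 1.5 images cost $1.50.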
Text Gen Solution pricing
Receive $10 of free credit in your account upon sign up.

Free
$10 of free credit upon sign up
Web demo UI
Llama 2 Chat
Code Llama Instruct
Mistral Instruct
Pro
$0.00018
per 1K tokens* for 7B models
$0.00063
per 1K tokens* for Code Llama 34B
$0.00086
per 1K tokens* for Llama 2 70B
*Typical 4:1 input-to-output ratio, at full precision
Everything in Free, plus:
Quantized and full precision weights
Enterprise
Everything in Pro, plus:
Run your choice of checkpoints
Committed use discounts
Dedicated Customer Success Manager
Contractual SLAs
Performance/latency optimization options
Options for private deployments
What is the OctoAI Text Gen Solution?
OctoAI Text Gen Solution provides developers a unified API endpoint to build on their choice of open source large language models and variants. Current models supported include Llama 2 Chat, Code Llama Instruct, and Mistral Instruct. Customers can run inferences with 7B, 13B, 34B and 70B sizes, and quantized (INT4) and full precision (FP16) weights, all on a single unified API endpoint.
What are input and output tokens?
Tokens are units used to measure input and output text for LLMs. 1,000 tokens is about 750 words. Input tokens measure tokens in the input prompt (including contextual information). Output tokens are generated by the model. The input-to-output token ratio varies, and increases for use cases where more context is needed. A typical Retrieval Augmented Generation (RAG) implementation would see input-to-output ratios starting at ~4:1.
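Putting the word-to-token rule of thumb together with the per-1K-token Pro rates above gives a quick cost estimate. This is a sketch under the assumptions stated on this page (750 words per 1,000 tokens, 4:1 input-to-output ratio); the names are illustrative:

```python
# Rough per-request cost estimate using the per-1K-token Pro rates
# on this page, assuming ~750 words per 1,000 tokens and a 4:1
# input-to-output token ratio.
PRICE_PER_1K = {
    "7b": 0.00018,
    "code_llama_34b": 0.00063,
    "llama2_70b": 0.00086,
}

def words_to_tokens(words: int) -> int:
    """Approximate token count: 1,000 tokens is about 750 words."""
    return round(words * 1000 / 750)

def request_cost(model: str, output_tokens: int, ratio: float = 4.0) -> float:
    """Estimate USD cost of one request; input tokens = ratio * output tokens."""
    total_tokens = output_tokens * (1 + ratio)
    return PRICE_PER_1K[model] * total_tokens / 1000

# A RAG request generating ~200 output tokens (so ~1,000 total) on Llama 2 70B:
print(f"{request_cost('llama2_70b', 200):.6f}")  # 0.000860
```

Because input tokens dominate at a 4:1 ratio, a request that generates only 200 tokens is billed on roughly 1,000 tokens in total.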
Compute Service Pricing
Receive $10 in free credits to explore the possibilities of OctoAI. Bring your own model, or start building your AI app immediately with one of our ready-to-deploy models.
Frequently Asked Questions
Don't see the answer to your question here? Feel free to reach out so we can help.
OctoAI is a compute service to run, tune (or customize), and scale your generative AI models. OctoAI lets you easily get started with ready-to-use model templates, or convert any container or Python code into a production-grade endpoint, within minutes. You can then easily build against the endpoint within your web or mobile apps.
To get OctoAI's launch promotion* pricing for base SDXL images, first sign up for a free guided trial. Then, after successful completion, commit to a minimum of 10 million images generated per year.
At sign up, you get $10 of free credit, which can be used until the end of the first month. You can enter your credit card at any time, and your account will be automatically charged to keep your credit replenished. The reload amount is a minimum of $10, up to a maximum that you set. We will auto-reload your account when the balance reaches 10% of your reload amount. Your account must have a positive balance for you to use the service.
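The auto-reload rule above can be sketched in a few lines. This is an illustration of the billing behavior described on this page, not an OctoAI API:

```python
# Sketch of the auto-reload rule described above: the account is
# topped up once the balance falls to 10% of the reload amount.
# Function names are illustrative, not part of any OctoAI SDK.
def needs_reload(balance: float, reload_amount: float) -> bool:
    """True when the balance has reached the 10% reload threshold."""
    return balance <= 0.10 * reload_amount

def apply_reload(balance: float, reload_amount: float) -> float:
    """Replenish the balance by the reload amount if the threshold is hit."""
    if needs_reload(balance, reload_amount):
        return balance + reload_amount
    return balance

print(needs_reload(1.0, 10.0))   # True: $1.00 is 10% of a $10 reload
print(apply_reload(1.0, 10.0))   # 11.0
```

For example, with a $10 reload amount, the card is charged once the balance dips to $1.00 or below.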
OctoAI accelerates foundation models like Stable Diffusion 1.5 and SDXL, reducing latency and thus delivering a more engaging experience for your applications' users. Through the OctoAI Image Gen Solution, your inference calls benefit from the lower latencies and accelerated models even when you choose to create images with imported or bespoke fine-tuning assets.
At launch, OctoAI will support the NVIDIA T4, the NVIDIA A10G, and the NVIDIA A100. Additional GPU targets will be added in the future.
OctoML is SOC 2 Type II certified. Keeping our customers' data private and secure is a top priority, and we have internal systems to ensure appropriate handling of customer data. The SOC 2 Type II certification provides independent validation of these processes and safeguards. We do not persist the inputs, outputs, or intermediate computations of your inferences, except for runtime logs that you choose to expose in your container. For encryption in transit, we ensure that all connections from customer to the OctoAI compute service require TLS, without you having to manage TLS certificates yourself. We also use encryption at rest for any data that we write to disk.
When a new replica is started, it takes a certain amount of time for the compute infrastructure to be provisioned, data to be loaded, and the service to be ready for an inference query. This is known as a cold start. You are not billed for the cold start time.
When an endpoint does not receive any inference query for a preset timeout period, it will terminate running replicas until it reaches the count set for minimum replicas. If the minimum replicas is set to 0, all running replicas will be terminated and you will not be billed when the endpoint is not in use. You can set the timeout, minimum replicas, and maximum replicas for your endpoints to meet your application architecture and requirements.
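The scale-down behavior above can be sketched as a simple rule over the endpoint settings. The field names below are hypothetical stand-ins for the timeout, minimum-replica, and maximum-replica settings described on this page, not the actual OctoAI configuration schema:

```python
# Illustrative model of the idle scale-down behavior described above.
# Field names are hypothetical, not the OctoAI configuration schema.
from dataclasses import dataclass

@dataclass
class EndpointConfig:
    timeout_seconds: int  # idle period before replicas are terminated
    min_replicas: int     # 0 enables scale-to-zero (no billing while idle)
    max_replicas: int     # upper bound for scaling out

def replicas_after_idle(cfg: EndpointConfig, running: int, idle_seconds: int) -> int:
    """Replica count once the endpoint has been idle for `idle_seconds`."""
    if idle_seconds >= cfg.timeout_seconds:
        return cfg.min_replicas  # terminate down to the configured floor
    return running

cfg = EndpointConfig(timeout_seconds=300, min_replicas=0, max_replicas=4)
print(replicas_after_idle(cfg, running=3, idle_seconds=600))  # 0
```

With `min_replicas=0`, an endpoint idle past its timeout scales to zero replicas and incurs no charges until the next request arrives (which then pays a cold-start delay, though not a cold-start bill).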
You can choose to keep your endpoints public or private. Private endpoints can be accessed by using the API token(s) generated within OctoAI.