Launch Promo Pricing

Only pay for what you use

Get started today on OctoAI and receive $10 of free credit in your account.


  • $10 of free credit upon sign up

  • Stable Diffusion XL (SDXL)

  • Stable Diffusion 1.5 (SD 1.5)

  • Access to 10 pre-loaded fine tuning assets




per base SDXL image


Per custom SDXL image


Per SD 1.5 image (base or custom)

  • Import and use custom fine tuning assets through the Asset Library

  • Create your own fine tuning assets

  • ControlNets, inpainting, outpainting, LCM-LoRA



  • Everything in Pro, plus:

  • Committed use discounts

  • No rate limits

  • Dedicated Customer Success Manager

  • Contractual SLAs

  • Option for private deployment


What is the OctoAI Image Gen Solution?

OctoAI Image Gen Solution is an end-to-end solution that allows app builders to easily customize (fine tune) Stable Diffusion and Stable Diffusion XL (SDXL) to their needs, and to scale usage seamlessly with no impact on image generation speed or quality.


What are default configurations?

The pricing listed here is based on the following default configurations for SDXL and Stable Diffusion 1.5. A default SDXL image is defined as a 30-step, 1024x1024 image with the K_DPMPP_2M sampler. A default Stable Diffusion 1.5 image is defined as a 30-step, 512x512 image with the DPMSolverMultistep scheduler. For other configurations, please refer to the detailed pricing in the OctoAI docs.
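As a rough illustration of the default configurations described above, the sketch below checks whether a requested image matches a default. The `ImageConfig` class and its field names are hypothetical and chosen for this example only; they are not the parameter names of the OctoAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageConfig:
    """Hypothetical container for the settings that define a 'default' image."""
    steps: int
    width: int
    height: int
    sampler: str  # sampler or scheduler name, as given on this page


# Default configurations as stated on this page.
SDXL_DEFAULT = ImageConfig(steps=30, width=1024, height=1024, sampler="K_DPMPP_2M")
SD15_DEFAULT = ImageConfig(steps=30, width=512, height=512, sampler="DPMSolverMultistep")


def uses_default_pricing(config: ImageConfig, default: ImageConfig) -> bool:
    """Return True when the request matches the default configuration exactly.

    Any other configuration falls under the detailed pricing in the OctoAI docs.
    """
    return config == default
```

A request with, say, 50 steps would return `False` here, meaning the listed per-image price would not apply as-is.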

Text Gen Solution pricing

Receive $10 of free credit in your account upon sign up.


  • $10 of free credit upon sign up

  • Web demo UI

  • Llama 2 Chat

  • Code Llama Instruct

  • Mistral Instruct




per 1K tokens* for 7B models


per 1K tokens* for Code Llama 34B


per 1K tokens* for Llama 2 70B

*Assumes a typical 4:1 input-to-output token ratio and full-precision weights

  • Quantized and full precision weights



  • Everything in Pro, plus:

  • Run your choice of checkpoints

  • Committed use discounts

  • Dedicated Customer Success Manager

  • Contractual SLAs

  • Performance/latency optimization options

  • Options for private deployments


What is the OctoAI Text Gen Solution?

OctoAI Text Gen Solution gives developers a unified API endpoint to build on their choice of open source large language models and variants. Currently supported models include Llama 2 Chat, Code Llama Instruct, and Mistral Instruct. Customers can run inferences with 7B, 13B, 34B, and 70B model sizes, using quantized (INT4) or full precision (FP16) weights, all on a single unified API endpoint.


What are input and output tokens?

Tokens are units used to measure input and output text for LLMs. 1,000 tokens is about 750 words. Input tokens measure the tokens in the input prompt (including contextual information). Output tokens are the tokens generated by the model. The input-to-output token ratio varies, and it increases for use cases where more context is needed. A typical Retrieval Augmented Generation (RAG) implementation would see input-to-output ratios starting at about 4:1.
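The two rules of thumb above (about 750 words per 1,000 tokens, and a 4:1 input-to-output ratio) can be turned into a quick estimate. This is a minimal sketch: the function names are made up for illustration, and real token counts depend on the model's tokenizer.

```python
def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from a word count using the ~750 words per 1,000 tokens rule."""
    return round(word_count * 1000 / 750)


def split_tokens(total_tokens: int, input_parts: int = 4, output_parts: int = 1):
    """Split a token total into (input, output) using an input:output ratio.

    The default 4:1 ratio is the typical RAG starting point mentioned above.
    """
    input_tokens = total_tokens * input_parts // (input_parts + output_parts)
    return input_tokens, total_tokens - input_tokens


if __name__ == "__main__":
    total = estimate_tokens(750)
    print(total, split_tokens(total))
```

Because pricing is quoted per 1K tokens at a 4:1 ratio, a workload with a very different ratio may cost more or less than the headline figure suggests.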

Compute Service Pricing

Receive $10 in free credits to explore the possibilities of OctoAI. Bring your own model, or start building your AI app immediately with one of our ready-to-deploy models.

  • Small: $0.40 per hour (0.011¢ per second)

  • Medium: $1.15 per hour (0.032¢ per second)

  • Large 40: $4.10 per hour (0.114¢ per second)

  • Large 80: $5.20 per hour (0.145¢ per second)
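To see how the hourly rates listed above translate into a bill, the sketch below prorates an hourly rate over a number of running seconds. The function name is hypothetical, and the exact rounding OctoAI applies is not stated on this page, so treat the result as an estimate.

```python
SECONDS_PER_HOUR = 3600


def usage_cost_usd(hourly_rate_usd: float, running_seconds: float) -> float:
    """Prorate an hourly rate (in USD) over the given number of running seconds."""
    if running_seconds < 0:
        raise ValueError("running_seconds must be non-negative")
    return hourly_rate_usd * running_seconds / SECONDS_PER_HOUR


if __name__ == "__main__":
    # 30 minutes on the $1.15 per hour instance from the list above.
    print(f"${usage_cost_usd(1.15, 1800):.4f}")
```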



Frequently Asked Questions

Don't see the answer to your question here? Feel free to reach out so we can help.

What is the OctoAI compute service?

OctoAI is a compute service to run, tune (or customize), and scale your generative AI models. OctoAI lets you get started quickly with ready-to-use model templates, or convert any container or Python code into a production-grade endpoint within minutes. You can then build against the endpoint from your web or mobile apps.

How can I get the SDXL launch promotion* pricing?

To get OctoAI's launch promotion* pricing for base SDXL images, first sign up for a free guided trial. After successfully completing the trial, commit to a minimum of 10 million images generated per year.

How does billing work?

At sign up, you get $10 of free credit, which can be used until the end of the first month. You can enter your credit card at any time, and your account will then be charged automatically to keep your credit replenished. The reload amount is a minimum of $10, or a maximum amount that you set. We auto-reload your account when the balance reaches 10% of your reload amount. Your account must have a positive balance for you to use the service.
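The auto-reload rule above can be sketched as follows. This is an illustration of the stated rule only, not OctoAI's billing code: the function names are hypothetical, and amounts are kept in whole cents to avoid floating-point rounding.

```python
MIN_RELOAD_CENTS = 1000  # the $10 minimum reload stated above


def needs_reload(balance_cents: int, reload_cents: int) -> bool:
    """True once the balance has dropped to 10% of the reload amount (or below)."""
    if reload_cents < MIN_RELOAD_CENTS:
        raise ValueError("reload amount must be at least $10")
    return balance_cents * 10 <= reload_cents


def can_use_service(balance_cents: int) -> bool:
    """The account must have a positive balance to use the service."""
    return balance_cents > 0


def balance_after_check(balance_cents: int, reload_cents: int) -> int:
    """Apply one auto-reload if the threshold has been reached."""
    if needs_reload(balance_cents, reload_cents):
        return balance_cents + reload_cents
    return balance_cents
```

With a $10 reload amount, for example, the reload triggers when the balance falls to $1.00.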

What is model acceleration?

OctoAI accelerates foundation models like Stable Diffusion 1.5 and SDXL, reducing latency and delivering a more engaging experience for the users of your applications. Through the OctoAI Image Gen Solution, your inference calls benefit from the accelerated models and lower latencies even when you create images with imported or bespoke fine tuning assets.

What GPUs are available?

At launch, OctoAI will support the NVIDIA T4, the NVIDIA A10G, and the NVIDIA A100. Additional GPU targets will be added in the future.

What about privacy and security?

OctoML is SOC 2 Type II certified. Keeping our customers’ data private and secure is a top priority, and we have internal systems to ensure appropriate handling of customer data. The SOC 2 Type II certification provides independent validation of these processes and safeguards. We do not persist the inputs, outputs, or intermediate computations of your inferences, except for runtime logs that you choose to expose in your container. For encryption in transit, all connections from customers to the OctoAI compute service require TLS, without you having to manage TLS certificates yourself. We also use encryption at rest for any data that we write to disk.

Do you offer enterprise pricing? What are the additional features?

Contact us for this tier of pricing. Additional features include: inferences in your private environment, experts available to accelerate your model, reserved hardware, priority support, and additional users in your account.

What are cold starts?

When a new replica is started, it takes a certain amount of time for the compute infrastructure to be provisioned, data to be loaded, and the service to be ready for an inference query. This is known as a cold start. You are not billed for the cold start time.
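As a small illustration of the cold start rule above, the sketch below subtracts cold start time from a replica's total lifetime to get the time that would be billed. The function name and the idea of tracking both durations per replica are assumptions made for this example.

```python
def billable_seconds(replica_lifetime_seconds: float, cold_start_seconds: float) -> float:
    """Seconds that count toward billing: lifetime minus the unbilled cold start."""
    if replica_lifetime_seconds < 0 or cold_start_seconds < 0:
        raise ValueError("durations must be non-negative")
    # A replica that never finished starting up accrues no billable time.
    return max(0.0, replica_lifetime_seconds - cold_start_seconds)
```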

What happens when I’m not using my endpoint?

When an endpoint does not receive any inference queries for a preset timeout period, it terminates running replicas until it reaches the count set for minimum replicas. If minimum replicas is set to 0, all running replicas are terminated and you are not billed while the endpoint is not in use. You can set the timeout, minimum replicas, and maximum replicas for your endpoints to meet your application architecture and requirements.
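The scale-down behavior described above can be sketched as a single decision function. This is a simplified model under stated assumptions: the function and parameter names are hypothetical, and the real autoscaler may terminate replicas gradually rather than in one step.

```python
def target_replicas(
    running: int,
    min_replicas: int,
    max_replicas: int,
    idle_seconds: float,
    timeout_seconds: float,
) -> int:
    """Return the replica count the endpoint should settle at."""
    if not 0 <= min_replicas <= max_replicas:
        raise ValueError("require 0 <= min_replicas <= max_replicas")
    if idle_seconds >= timeout_seconds:
        # No queries for the whole timeout period: fall back to the minimum.
        return min_replicas
    # Still receiving traffic: keep the current count within the configured bounds.
    return max(min_replicas, min(running, max_replicas))


def is_billed(replica_count: int) -> bool:
    """With zero running replicas, the endpoint incurs no charges."""
    return replica_count > 0
```

Setting the minimum to 0 trades cost for latency: the endpoint costs nothing while idle, but the next query has to wait for a cold start.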

What kind of access control is available for accessing the endpoints?

You can choose to keep your endpoints public or private. Private endpoints can be accessed by using the API token(s) generated within OctoAI.
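The sketch below shows one common way to attach an API token to a request for a private endpoint, using only the Python standard library. The endpoint URL is a placeholder, and the `Authorization: Bearer` header scheme is an assumption for illustration; check the OctoAI docs for the scheme your endpoint expects.

```python
import json
import urllib.request


def build_private_request(endpoint_url: str, api_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the API token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            # Assumed header scheme; not confirmed by this page.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and token; replace with your own endpoint and token.
    request = build_private_request(
        "https://your-endpoint.example.com/predict",
        "YOUR_API_TOKEN",
        {"prompt": "hello"},
    )
    print(request.get_method(), request.full_url)
```

Sending the request would then be a call to `urllib.request.urlopen(request)`. Keep the token out of source control, for example by reading it from an environment variable.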