
Run, tune, and scale generative models in the cloud

OctoAI is world-class compute infrastructure for tuning and running models that wow your users.

Sign Up

Fast, efficient model endpoints and the freedom to run any model


Develop with any model

Leverage OctoAI’s accelerated models or bring your own from anywhere


Run with ease

Create ergonomic model endpoints in minutes, with only a few lines of code


Fine-tune freely

Customize your model to fit any use case that serves your users

Sign up to Fine-Tune Stable Diffusion

Scale efficiently

Go from zero to millions of users, never worrying about hardware, speed, or cost overruns



Develop with models optimized for speed and cost

Tap into our curated list of best-in-class open-source foundation models that we’ve made faster and cheaper to run using our deep experience in machine learning compilation, acceleration techniques, and proprietary model-hardware performance technology.


Stable Diffusion 1.5

A highly tuned model that generates photo-realistic images based on text input. World’s fastest Stable Diffusion 1.5.


Llama 2 Chat

An instruction-tuned large language model for chatbots and chat completions.


Whisper X

A general-purpose speech transcription model that turns spoken audio into text. It is trained on a large, diverse audio dataset.


More Models

  • Bring Your Own
  • Stable Diffusion XL
  • Llama 2 13B
  • CLIP
  • ControlNet

Your fine-tuned model running on self-optimizing compute

OctoAI automatically selects the optimal hardware target, applies the latest optimization technologies, and keeps your models running optimally at all times.

Our ML experts deliver the fastest, cheapest foundation models

The OctoML team includes recognized leaders in ML systems, ML compilation, and hardware intrinsics who have founded widely adopted open-source ML projects, including Apache TVM and XGBoost. Our accelerated models are in production at hyperscalers like Microsoft, where they process billions of images a month in services like Xbox.

curl -X POST '' \
-H 'content-type: application/json' \
-H 'Authorization: Bearer {apiKey}' \
--data '{"prompt":"an oil painting of an octopus playing chess", "width":512, "height":512, "guidance_scale":7.5, "num_images_per_prompt":1, "num_inference_steps":30, "seed":0, "negative_prompt":"frog", "solver":"DPMSolverMultistep"}' \
> test_curl_json.out
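
The same request can also be assembled from application code. The following is a minimal sketch using only the Python standard library, not an official client: `build_image_request` is a hypothetical helper name, the endpoint URL is left as a caller-supplied argument because the page does not show it, and `num_inference_steps` defaults to 30 here as an assumed value. The field names are the ones that appear in the curl command above.

```python
import json
import urllib.request


def build_image_request(endpoint_url, api_key, prompt, **overrides):
    """Build (but do not send) a POST request mirroring the curl example.

    endpoint_url is supplied by the caller; this sketch does not assume
    any particular OctoAI URL.
    """
    payload = {
        "prompt": prompt,
        "width": 512,
        "height": 512,
        "guidance_scale": 7.5,
        "num_images_per_prompt": 1,
        "num_inference_steps": 30,  # assumed value for illustration
        "seed": 0,
    }
    # Optional fields such as negative_prompt or solver go in as overrides.
    payload.update(overrides)
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder URL and key, for illustration only.
request = build_image_request(
    "https://example.invalid/predict",
    "YOUR_API_KEY",
    "an oil painting of an octopus playing chess",
    negative_prompt="frog",
    solver="DPMSolverMultistep",
)
```

Passing `request` to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the payload easy to inspect and test without network access.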

Stable Diffusion (accelerated)


faster than baseline


images generated for $1

curl "" \
-H "accept: text/event-stream" \
-H "authorization: Bearer $YOUR_TOKEN" \
-H "content-type: application/json" \
-d '{
  "model": "llama-2-70b-chat",
  "messages": [
    { "role": "assistant", "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request." },
    { "role": "user", "content": "write a poem about an octopus who lives in the sea" }
  ],
  "stream": true,
  "max_tokens": 850
}'
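
Because the request sets `"stream": true` and accepts `text/event-stream`, the response arrives as server-sent events rather than a single JSON body. The sketch below shows one way a client might reassemble the streamed text. `iter_sse_data` follows the standard `data:` line framing of server-sent events. `collect_text` is hypothetical: it assumes each event carries a JSON chunk with the text under `choices[0].delta.content` and that the stream ends with a `[DONE]` marker, neither of which is confirmed by this page.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in `lines`."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The event-stream format drops one optional leading space.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored.
    if buffer:
        # Lenient: also emit a final event that lacks a trailing blank line.
        yield "\n".join(buffer)


def collect_text(lines: Iterable[str], done_marker: str = "[DONE]") -> str:
    """Join streamed text chunks. The chunk schema here is an assumption."""
    parts = []
    for payload in iter_sse_data(lines):
        if payload == done_marker:
            break
        chunk = json.loads(payload)
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)


# Hand-written sample events in the assumed format, not real API output.
sample = [
    'data: {"choices": [{"delta": {"content": "Deep "}}]}\n',
    "\n",
    'data: {"choices": [{"delta": {"content": "below"}}]}\n',
    "\n",
    "data: [DONE]\n",
    "\n",
]
print(collect_text(sample))
```

If the actual chunk schema differs, only `collect_text` needs to change; the event framing in `iter_sse_data` stays the same.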

Llama 2 70B (accelerated)


performance gains on multi-GPU

over 3x

savings on cost

Start building with ease in minutes using OctoAI

Our mission is to empower developers to build AI applications that delight users by leveraging fast models running on the most efficient hardware. Sign up and start building in minutes.
