Run, tune, and scale generative models in the cloud
OctoAI is world-class compute infrastructure for tuning and running models that wow your users.
Fast, efficient model endpoints and the freedom to run any model
Develop with any model
Leverage OctoAI’s accelerated models or bring your own from anywhere
Run with ease
Create ergonomic model endpoints in minutes, with only a few lines of code
Fine-tune freely
Customize your model to fit any use case that serves your users
Scale efficiently
Go from zero to millions of users, never worrying about hardware, speed, or cost overruns
curl -X POST 'https://your-sd-endpoint.octoai.cloud/predict' \
-H 'content-type: application/json' \
-H 'Authorization: Bearer {apiKey}' \
--data '{"prompt":"an oil painting of an octopus playing chess",
"width":512, "height":512,
"guidance_scale":7.5, "num_images_per_prompt":1,
"num_inference_steps":30, "seed":0,
"negative_prompt":"frog",
"solver":"DPMSolverMultistep"}'
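
For readers who prefer Python, the same image request can be sketched with only the standard library. This is a minimal illustration, not an official client: the endpoint URL is the placeholder from the curl example, the `OCTOAI_API_KEY` environment variable and the `build_request` and `send` helpers are hypothetical names, the default of 30 inference steps is an assumption, and the response is returned as raw bytes because its format is not described here.

```python
import json
import os
import urllib.request

# Placeholder endpoint copied from the curl example; replace with your own.
ENDPOINT_URL = "https://your-sd-endpoint.octoai.cloud/predict"


def build_request(prompt, api_key, url=ENDPOINT_URL, **overrides):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "prompt": prompt,
        "width": 512,
        "height": 512,
        "guidance_scale": 7.5,
        "num_images_per_prompt": 1,
        "num_inference_steps": 30,  # assumed default for this sketch
        "seed": 0,
        "negative_prompt": "frog",
        "solver": "DPMSolverMultistep",
    }
    payload.update(overrides)  # let callers change any field by keyword
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # OCTOAI_API_KEY is a hypothetical environment variable name.
    api_key = os.environ.get("OCTOAI_API_KEY")
    if api_key:
        req = build_request("an oil painting of an octopus playing chess", api_key)
        print(len(send(req)), "bytes received")
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any network call is made.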

Develop with models optimized for speed and cost
Tap into our curated list of best-in-class open-source foundation models that we’ve made faster and cheaper to run using our deep experience in machine learning compilation, acceleration techniques, and proprietary model-hardware performance technology.

Stable Diffusion 1.5
A highly tuned model that generates photo-realistic images based on text input. World’s fastest Stable Diffusion 1.5.

Llama 2 Chat
An instruction-tuned large language model for chatbots and chat completions.

Whisper X
A general-purpose speech transcription model that turns spoken audio into text, trained on a large, diverse audio dataset.

More Models
- Bring Your Own
Stable Diffusion XL
Llama 2 13B
CLIP
ControlNet
Your fine-tuned model running on self-optimizing compute
OctoAI automatically selects the optimal hardware target, applies the latest optimization technologies, and keeps your models running optimally.

Bring any model, fine-tune for anything
Create your own endpoint with custom models you’ve created, our optimized models, or ones you’ve found on other platforms. You can use your data to tailor the model for your use case.

Self-optimizing compute for scale
OctoAI puts ML optimization in the hands of developers. The compute service optimizes your models programmatically using state-of-the-art acceleration and compilation techniques, then selects the best model-hardware combination.
Our ML experts deliver the fastest, cheapest foundation models
The OctoML team includes recognized leaders in ML systems, ML compilation, and hardware intrinsics who have founded widely adopted open-source ML projects, including Apache TVM and XGBoost. Our accelerated models are in production at hyperscalers like Microsoft, where they process billions of images a month in services like Xbox.
curl -X POST 'https://your-sd-endpoint.octoai.cloud/predict' \
-H 'content-type: application/json' \
-H 'Authorization: Bearer {apiKey}' \
--data '{"prompt":"an oil painting of an octopus playing chess",
"width":512, "height":512,
"guidance_scale":7.5, "num_images_per_prompt":1,
"num_inference_steps":30, "seed":0,
"negative_prompt":"frog",
"solver":"DPMSolverMultistep"}' > test_curl_json.out
Stable Diffusion (accelerated)
3x faster than baseline model
~1,000 images generated for $1
curl "https://my-llama-2-70b-chat-demo.octoai.run/chat/completions" \
-H "accept: text/event-stream" \
-H "authorization: Bearer $YOUR_TOKEN" \
-H "content-type: application/json" \
-d '{
  "model": "llama-2-70b-chat",
  "messages": [
    {"role": "assistant", "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request."},
    {"role": "user", "content": "write a poem about an octopus who lives in the sea"}
  ],
  "stream": true,
  "max_tokens": 850
}'
Llama 2 70B (accelerated)
2x performance gains on multi-GPU
Over 3x savings on cost
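
The Llama 2 chat request above asks for a streamed reply (`"stream": true` with `accept: text/event-stream`). One way to consume such a stream in Python is sketched below. It assumes the endpoint emits server-sent events whose `data:` lines carry OpenAI-style JSON chunks (`choices[0].delta.content`) and end with a `data: [DONE]` sentinel. That chunk layout is an assumption, not something confirmed here, and `iter_stream_text` is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of server-sent-event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Assumed chunk layout: {"choices": [{"delta": {"content": "..."}}]}
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed format, standing in for a live stream.
sample = [
    b'data: {"choices": [{"delta": {"content": "Eight "}}]}\n',
    b"\n",
    b'data: {"choices": [{"delta": {"content": "arms"}}]}\n',
    b"data: [DONE]\n",
]
print("".join(iter_stream_text(sample)))  # prints "Eight arms"
```

Because the function accepts any iterable of lines, it can be fed an HTTP response object that yields lines as they arrive, so text is surfaced to the user before the full reply has finished generating.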
Start building with ease in minutes using OctoAI
Our mission is to empower developers to build AI applications that delight users by running fast models on the most efficient hardware. Sign up and start building in minutes.
