Efficient
GenAI inference

Build and scale production applications on the latest optimized models and fine tunes

Try APIs Free

Self-hosted Demo

Text Gen API

LLMs for chat, summarization, and structured output

Media Gen API

Diffusion models for stunning image and video

OctoStack

Turnkey GenAI stack in your environment

Innovators Choose OctoAI

“For our performance and security-sensitive use case, it is imperative that the models that process call data run in an environment that offers flexibility, scale and security. OctoStack lets us easily and efficiently run the customized models we need, within environments that we choose, and deliver the scale our customers require.”

Dali Kaafar

CEO Apate AI

GenAI production stack: SaaS or in your environment

The foundation of OctoAI is systems and compilation technologies we’ve pioneered, like XG Boost, TVM, and MLC, giving you an enterprise system that runs in our SaaS or your private environment.

Diagram of OctoAI GenAI systems stack showing OctoAI's solutions, models, and AI serving stack powered by broad hardware

Enterprise-grade inference

Achieve AI Independence

Free yourself from any single model, model provider, cloud, or hardware setup.

Optimize Performance & Cost

Run GenAI inference at the lowest price and latency on our optimized serving layer.

Future Proof Applications

Rapidly iterate with new models and infrastructure without rearchitecting anything.

Customize Freely

Mix and match models, fine tunes, and AI assets at the model serving layer.

New Solution

OctoStack from OctoAI: GenAI in your environment

OctoStack is a turnkey GenAI serving stack to run your optimized models in your environment on your GPUs. Lower your total cost of ownership and deploy models with greater agility while ensuring data privacy.

Learn more

Overview diagram of how OctoStack by OctoAI would work in your infrastructure environment

What’s New at OctoAI

Customer & Product Updates

Latest Models

Mixtral 8x22B fine-tuned

Chat

Over the coming weeks we will be utilizing the newest and strongest fine-tunes from the community. Come back often to see what new fine-tune will be here for testing. After testing several fine-tuned versions of this model we will select the top performing to persistently host on OctoAI.

Mixtral 8x22B (base)

The most recent mixture-of-experts model from Mistral AI. This is a base model with no additional fine tunes, and is good for testing purposes. Please know that it may repeat or loop back to previous answers since it is not fine-tuned for chat or instruction.

Qwen 1.5

Chat

The beta version of Qwen2, and latest chat model from Alibaba Cloud. This open source model has a top 10 ranking on the ChatBot Arena Leaderboard and is great at conversational use cases in English and Chinese.

Nous Hermes 2 Pro Mistral

Chat

Hermes 2 Pro is an upgraded version of Nous Hermes 2, with newly introduced function calling and JSON mode. This model is great for general conversation, function calling, and JSON output.

See all models

Your choice of models and fine tunes

Start building in minutes. Gain the freedom to run on any model or checkpoint on our efficient API endpoints.

Try APIs Free

Self-hosted Demo

%shell octoai asset create --name checkpoint-panda --upload-from-hf-repo NeuralNovel/Panda-7B-v0.1 \
--engine text/mistral-7b-instruct \
--data-type fp16 \
--format safetensors \
--type checkpoint \
--transfer-api sts

EfficientGenAI inference

Text Gen API

Media Gen API

OctoStack

Innovators Choose OctoAI

GenAI production stack: SaaS or in your environment

Enterprise-grade inference

Achieve AI Independence

Optimize Performance & Cost

Future Proof Applications

Customize Freely

OctoStack from OctoAI: GenAI in your environment

What’s New at OctoAI

Customer & Product Updates

Supercharge RAG Performance Using OctoAI and Unstructured Embeddings

Mixtral 8x22B is now available on OctoAI

OctoAI and Google Cloud Unite to Accelerate Generative AI Innovation

NightCafe Studio now delivering over a million image generation inferences

Latest Models

Mixtral 8x22B fine-tuned

Mixtral 8x22B (base)

Qwen 1.5

Nous Hermes 2 Pro Mistral

Your choice of models and fine tunes

Efficient
GenAI inference