OctoML and AWS: Better Together

OctoAI is a compute service, built on top of AWS, for running, tuning, and scaling your generative AI models. It lets developers take generative AI applications to production on AWS quickly and cost-effectively.

Benefits of OctoAI, powered by AWS

OctoAI complements the AWS core infrastructure offerings to ensure models are run in a hardware configuration that is optimized for the model and for the application. OctoAI's model acceleration reduces latency and cost for popular foundation models including Stable Diffusion, Whisper, LLaMA and Falcon, as well as custom models built or trained by customers.

Ease of Use

  • Ready-to-use deployment templates for popular OSS models

  • Customize OSS models

  • Easily integrate with app dev and model dev workflows

  • Auto-selection of hardware


  • Fastest foundation models for generative AI made possible through our model acceleration technology

  • Accelerate and run your custom models

  • Flexibility to make price-performance tradeoffs

Make Accessible

  • Customers may select and run accelerated OSS foundation models, fine-tune models, upgrade to new models as they emerge, or bring their own custom models

  • No lock-in to a model or service

OctoAI powered by AWS

Performance Impact

OctoML Model Acceleration on AWS

Aug 30, 2023
5 minutes
Making the Llama 2 Herd Work for You on OctoAI

OpenAI deserves its flowers for creating the “iPhone opportunity” for the AI industry with ChatGPT. Thanks to the magnanimity of Meta, the incredible power of world-class LLMs will be accessible to everyone. Meta has catalyzed an infinite opportunity with its Android moment in releasing Llama 2. Sharing a high quality commercially viable open source Large Language Model (LLM) allows every company and developer to shine, not just one.

Jason Knight
May 2, 2023
4 minutes
How to Run Stable Diffusion 3X Faster for 5X Less: Available for Early Access on OctoML Compute Service on AWS

At OctoML, we are on a mission to deliver affordable AI compute services for those who want control over the business they are building. That’s why we built a new compute service, available now in early access. It delivers AI infrastructure and advanced machine learning optimization techniques otherwise found only in large-scale AI services like OpenAI, but gives you the power to control your own API, choose your own models, and work within your AI budget.

Andrew Luo
Apr 19, 2023
4 minutes
How 4X Speedup on Generative Video Model (FILM) Created Huge Cost Savings for WOMBO

Generative AI is the hottest workload on the planet, but it’s also the most compute intensive, and therefore expensive to run. This puts startups building generative AI businesses in a tricky position. Not only must they deliver killer product experiences that grab attention and market share – they need to make the economics work too. To lower compute costs, generative AI models need to run faster and more efficiently on a more diverse set of hardware.

Bassem Yacoube

Start building with ease in minutes using OctoAI

Our mission is empowering developers to build AI applications that delight users by leveraging fast models running on the most efficient hardware. Sign up and start building in minutes.