OctoML and AWS Better Together

OctoAI is a compute service for running, tuning, and scaling your generative AI models, built on top of AWS. It allows developers to take generative AI applications to production on AWS quickly and cost-effectively.


Benefits of OctoAI, powered by AWS

OctoAI complements the core AWS infrastructure offerings to ensure that models run in a hardware configuration optimized for both the model and the application. OctoAI's model acceleration reduces latency and cost for popular foundation models, including Stable Diffusion, Whisper, LLaMA, and Falcon, as well as for custom models built or trained by customers.


Ease of Use

  • Ready-to-use deployment templates for popular OSS models

  • Customize OSS models

  • Easily integrate with app dev and model dev workflows

  • Auto-selection of hardware



  • Fastest foundation models for generative AI made possible through our model acceleration technology

  • Accelerate and run your custom models

  • Flexibility to make price-performance tradeoffs


Make Accessible

  • Customers can select and run accelerated OSS foundation models, fine-tune models, upgrade to new models as they emerge, or bring their own custom models

  • No lock-in to a model or service

OctoAI powered by AWS

Performance Impact

OctoML Model Acceleration on AWS

