Mission & Vision
At OctoML, we believe in the power of AI to improve lives. But to get there, AI must become more sustainable and accessible. Our goal is to empower more creators to harness the transformative power of ML to build intelligent applications.
OctoML makes AI more sustainable through efficient model execution and automation that scales services and reduces engineering burden. We make AI more accessible by enabling models to run on a broad set of devices and by making them easier to deploy without specialized skills.

Our Story
OctoML was spun out of the University of Washington by the creators of Apache TVM, an open source stack for ML portability and performance. TVM enables ML models to run efficiently on any hardware backend, and has quickly become a key part of the architecture of popular consumer devices like Amazon Alexa.
Recognizing the potential for TVM and technologies like it to transform the full scope of the ML lifecycle, OctoML was born.

What's in a Name?
Thinking about the type of company we wanted to build, we took inspiration from the playful, clever, curious octopus. These unconventional thinkers have a unique, distributed intelligence that spans their entire body.
They are adaptive enough to camouflage at a moment’s notice, and creative enough to complete puzzles, build gardens, and use tools. Plus, like any good engineer, they love to take things apart.

Your data and intellectual property (IP) are paramount
The OctoML Compute Service is designed to address enterprise-grade data privacy and security needs. We continually invest in security capabilities and practices across our platform and processes. We recently received SOC 2 Type 1 certification, with a Type 2 audit underway. Learn more about our measures to keep your information safe.
We’re also working towards a version of the service that can meet advanced residency and compliance requirements. If you have questions about using OctoML and meeting your specific compliance needs, let’s set up a time to talk.

We are searching for bright, talented, curious Octonauts to join us!
Our Investors

Read about our work

OctoML drives down production AI inference costs at Microsoft through new integration with ONNX Runtime ecosystem
Today, we’re excited to announce the results from the second phase of our partnership with Microsoft to continue driving down AI inference compute costs by reducing inference latencies. Over the past year, OctoML engineers worked closely with Watch For to design and implement the TVM Execution Provider (EP) for ONNX Runtime, bringing the model optimization potential of Apache TVM to all ONNX Runtime users. This builds upon the collaboration we began in 2021 to bring the benefits of TVM’s code generation and flexible quantization support to production scale at Microsoft.

How 4X Speedup on Generative Video Model (FILM) Created Huge Cost Savings for WOMBO
Generative AI is the hottest workload on the planet, but it’s also the most compute intensive, and therefore the most expensive to run. This puts startups building generative AI businesses in a tricky position. Not only must they deliver killer product experiences that grab attention and market share; they also need to make the economics work. To lower compute costs, generative AI models need to run faster and more efficiently on a more diverse set of hardware.
Be the first to try OctoML’s compute service
Our mission at OctoML is to make AI sustainable and accessible so that developers are liberated to build the next generation of intelligent applications. We want you to join us on this journey by getting your hands on these capabilities first.