Blog
Explore the latest OctoML product updates, TVM updates, and overall machine learning news.

OctoML drives down production AI inference costs at Microsoft through new integration with ONNX Runtime ecosystem

Today, we’re excited to announce the results of the second phase of our partnership with Microsoft to drive down AI inference compute costs by reducing inference latencies. Over the past year, OctoML engineers worked closely with Watch For to design and implement the TVM Execution Provider (EP) for ONNX Runtime, bringing the model optimization capabilities of Apache TVM to all ONNX Runtime users. This builds on the collaboration we began in 2021 to bring the benefits of TVM’s code generation and flexible quantization support to production scale at Microsoft.

Sameer Farooqui, Matthai Philipose & Loc Huynh
Mar 2, 2023

DIY Gen AI: Evaluating GPT-J Deployment Options

Model optimizations can save you millions of dollars over the life of your application. Additional savings come from finding the lowest-cost hardware for your AI/ML workloads. Companies building AI-powered apps will need to do both to have a fighting chance at building a sustainable business.

Luis Ceze
Feb 15, 2023

PyTorch 2.0 + Apache TVM: Better Together

OctoML is investing in a PyTorch 2.0 + Apache TVM integration because we believe it will give users a low-code path to Apache TVM’s optimizations.

Denise Kutnick
Dec 12, 2022

All Posts
