
Machine Learning Accelerator

We maximize throughput and minimize latency by automatically tuning your model to the target hardware platform.

Contact Sales
Request model analysis

Increase model performance without sacrificing accuracy

OctoML automatically optimizes your model to your chosen hardware target with zero humans in the loop. No writing vendor kernels, calling external libraries, or re-architecting your model.

Open compilation and search technology.



Advanced, but entirely optional, quantization for accuracy-performance tradeoff configurations.
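To make the accuracy-performance tradeoff concrete, here is a toy sketch of post-training int8 quantization — this is an illustration of the general technique, not OctoML's implementation:

```python
# Toy illustration of symmetric int8 quantization: shrink float weights to
# 8-bit integers (faster, smaller) at the cost of a bounded rounding error.
def quantize_int8(weights):
    """Map floats to int8 using a single symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [x * scale for x in q]

weights = [0.02, -1.27, 0.5, 0.998, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error is bounded by half a quantization step,
# which is the accuracy cost being traded for speed and size.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-tensor scale here is the simplest possible scheme; real quantization pipelines choose scales per channel and calibrate them against representative data.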

Automated process that optimizes models inside your model development workflow.

Accelerate performance, decelerate cost

  • Lower latency: Reduce user-perceived lag with a hyper-responsive online application.
  • Higher throughput: Maximize hardware utilization with more predictions per second.
  • Energy efficient: Save critical battery reserves for prediction at the edge on IoT devices.
  • Lighter models: Run models efficiently on low-power, limited memory, edge devices with our lightweight runtime.
  • Long-term savings: Faster predictions reduce inference costs, increasing savings over the lifetime of the ML deployment cycle.

Using ML to make ML better

We use machine learning to explore all the possible optimizations available for your specific target hardware and pick the most performant ones.
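The core idea of search-based tuning can be sketched in a few lines — try several candidate "schedules" for the same computation, measure each on the target, and keep the fastest. This toy example searches over tile sizes for a matrix multiply; the real system explores a vastly larger space and uses ML to guide the search rather than brute force:

```python
# Toy sketch of search-based tuning (illustrative, not OctoML's code):
# every tile size computes the same result, but runs at a different speed
# depending on the hardware's cache behavior.
import time

def matmul_tiled(A, B, n, tile):
    """n x n matrix multiply with a configurable tiling 'schedule'."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for k in range(kk, min(kk + tile, n)):
                    a, row_b, row_c = A[i][k], B[k], C[i]
                    for j in range(n):
                        row_c[j] += a * row_b[j]
    return C

n = 64
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i - j) for j in range(n)] for i in range(n)]

timings = {}
for tile in (4, 8, 16, 32, 64):          # candidate schedules
    start = time.perf_counter()
    matmul_tiled(A, B, n, tile)
    timings[tile] = time.perf_counter() - start

best = min(timings, key=timings.get)      # keep the most performant
```

The winning tile size differs from machine to machine, which is exactly why tuning must be redone per hardware target.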

Graph level optimizations

Rewrites dataflow graphs (nodes and edges) to simplify the graph and reduce device peak memory usage.
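A minimal sketch of one such rewrite — constant folding — on a toy dataflow graph (an illustration of the technique, not TVM's actual pass infrastructure):

```python
# Toy graph-level optimization: fold subexpressions whose inputs are all
# constants, so fewer nodes need to execute (and hold memory) on device.
# Each node is (op, inputs); "const" nodes carry their value directly.
def fold_constants(graph):
    """Replace arithmetic nodes with all-constant inputs by const nodes."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    folded = {}
    for name, (op, args) in graph.items():
        if op in ops and all(a in folded and folded[a][0] == "const"
                             for a in args):
            vals = [folded[a][1] for a in args]
            folded[name] = ("const", ops[op](*vals))
        else:
            folded[name] = (op, args)
    return folded

graph = {                                  # computes y = x * (2 + 3)
    "c2": ("const", 2),
    "c3": ("const", 3),
    "scale": ("add", ["c2", "c3"]),        # constant subexpression
    "x": ("input", []),
    "y": ("mul", ["x", "scale"]),
}
opt = fold_constants(graph)
# "scale" is now precomputed at compile time; only "y" runs at inference.
```

Real graph rewrites also fuse chains of operators into single kernels and eliminate dead nodes, all with the same pattern: traverse the graph, match a structure, emit a simpler one.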

Operator level optimizations

Hardware target-specific low-level optimizations for individual operators/nodes in the graph.
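As a toy illustration of the idea (target names invented for the example): the same operator can be lowered differently per hardware target, as long as every variant computes the same result.

```python
# Two lowerings of one operator (dot product). A real compiler holds many
# such variants per operator and picks the best one for the target.
def dot_scalar(a, b):
    """Straightforward lowering: one multiply-add per iteration."""
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
    return s

def dot_unrolled4(a, b):
    """4-way unrolled lowering, mimicking what a low-level schedule
    might emit for a core with wide execution units."""
    s0 = s1 = s2 = s3 = 0.0
    n4 = len(a) - len(a) % 4
    for i in range(0, n4, 4):
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
    tail = sum(a[i] * b[i] for i in range(n4, len(a)))
    return s0 + s1 + s2 + s3 + tail

# Hypothetical dispatch table: target name -> chosen lowering
LOWERINGS = {"generic-cpu": dot_scalar, "wide-simd-cpu": dot_unrolled4}

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [5.0, 4.0, 3.0, 2.0, 1.0]
results = {target: f(a, b) for target, f in LOWERINGS.items()}
```

The invariant — every lowering agrees on the output — is what lets the optimizer swap implementations freely without changing model accuracy.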

Efficient runtime

TVM-optimized models run in the lightweight TVM runtime, which provides a minimal API for loading and executing the model from Python, C++, Rust, Go, Java, or JavaScript.
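A sketch of what that minimal API looks like in Python, using TVM's documented graph executor (the import is deferred inside the function, so `tvm` is only required when it is actually called; `lib` is assumed to be a module previously compiled with `relay.build`):

```python
def run_compiled(lib, input_name, data):
    """Execute a TVM-compiled module with the lightweight graph executor.

    Requires the `tvm` package at call time; `lib` is the factory module
    returned by tvm.relay.build().
    """
    import tvm
    from tvm.contrib import graph_executor

    dev = tvm.cpu(0)                              # or tvm.cuda(0), etc.
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input(input_name, data)
    module.run()
    return module.get_output(0).numpy()
```

The same load-set-run-get sequence exists in the other language bindings, which is what keeps the runtime small enough for edge deployment.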

ML based model optimization

We continuously collect data from similar models and hardware so our automated optimization pipeline delivers improved performance over time.

  • Model optimizations create a data store for future use.

  • As we implement these learnings, your model optimizations become faster and better.

  • The process is automated and self-learning.
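The data-store idea can be sketched as a lookup that warm-starts new tuning jobs from the closest previously measured workload — a toy illustration of the concept, not the production pipeline (operator names, sizes, and configs below are invented):

```python
# Hypothetical store of past tuning results:
# (operator, input size, hardware) -> best known config and its latency.
store = {
    ("conv2d", 224, "cpu-a"): {"tile": 16, "ms": 1.9},
    ("conv2d", 256, "cpu-a"): {"tile": 32, "ms": 2.4},
    ("dense",  512, "cpu-a"): {"tile": 8,  "ms": 0.7},
}

def warm_start(op, size, hw):
    """Return the recorded config for the closest matching prior workload,
    or None if nothing similar has been tuned yet (full search needed)."""
    candidates = [(abs(size - s), cfg)
                  for (o, s, h), cfg in store.items()
                  if o == op and h == hw]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]

# A new 236x236 conv2d starts from the config tuned for the 224 case,
# shrinking the search instead of starting from scratch.
hint = warm_start("conv2d", 236, "cpu-a")
```

Each finished tuning job adds a row to the store, which is what makes the loop self-improving: more history means better starting points and shorter searches.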

Accelerate model performance while simplifying deployment

Built on open-source Apache TVM, OctoML elevates performance, enables continuous deployment, and works seamlessly with PyTorch, TensorFlow, ONNX serialized models, and more.

Boost performance without losing accuracy

Our proprietary process searches for the best program to automatically tune your model to the target hardware platform. Our customers have seen performance improvements of up to 30x.

Access comprehensive benchmarking

Compare against the original model and similar public models, benchmark across CPU and GPU instance types, and evaluate device sizing for deployment on Arm mobile or embedded processors.
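The shape of such a comparison is simple: time each variant under identical conditions and report the ratio. This toy harness measures two stand-in "models" (real benchmarking would run actual compiled models on each target device):

```python
# Toy benchmarking harness: median latency of a callable, plus the
# speedup of an "optimized" variant over the original.
import time

def benchmark(fn, *args, repeat=20):
    """Median wall-clock latency of fn(*args) over several runs.
    The median resists one-off scheduling hiccups better than the mean."""
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]

def original(xs):          # stand-in for the unoptimized model
    out = []
    for x in xs:
        out.append(x * x)
    return out

def optimized(xs):         # stand-in for the optimized model
    return [x * x for x in xs]

xs = list(range(10000))
base = benchmark(original, xs)
fast = benchmark(optimized, xs)
speedup = base / fast      # e.g. "2.1x" in a benchmark report
```

Both variants must produce identical outputs before a speedup number means anything — the same check applies when comparing an optimized model against its original.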

Relax knowing we’re future-proofed

Our platform is built on open-source Apache TVM, which is quickly becoming the de facto standard for machine learning compilers.

Experience broad interoperability

OctoML works seamlessly with TensorFlow, PyTorch, TensorFlow Lite, and ONNX serialized models, and offers easy onboarding of new and emerging hardware.
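For example, importing and compiling an ONNX model takes only a few lines with TVM's Relay frontend, which underlies this interoperability. The path, input name, and shape below are placeholders, and the imports are deferred so the sketch only needs `tvm` and `onnx` installed when the function is invoked:

```python
def compile_onnx_model(path, input_name, input_shape, target="llvm"):
    """Compile an ONNX model with Apache TVM's Relay frontend.

    Requires the `onnx` and `tvm` packages at call time. Returns a
    compiled module runnable with TVM's graph executor.
    """
    import onnx
    import tvm
    from tvm import relay

    model = onnx.load(path)                       # e.g. "model.onnx"
    mod, params = relay.frontend.from_onnx(model, {input_name: input_shape})
    with tvm.transform.PassContext(opt_level=3):  # enable full optimization
        lib = relay.build(mod, target=target, params=params)
    return lib
```

Equivalent frontends exist for TensorFlow, PyTorch, and TensorFlow Lite (`relay.frontend.from_tensorflow`, `from_pytorch`, `from_tflite`), so the downstream optimization pipeline is the same regardless of the source framework.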

Enjoy automated deployment

Deploy with a few lines of code, skipping manual optimization and saving hours of engineering time on hand-tuning and performance testing.

Accelerate Your AI Innovation

Contact Sales
Learn More