
Machine Learning Accelerator

We maximize throughput and minimize latency by automatically tuning your model to the target hardware platform.

Contact Sales
Request model analysis

Increase model performance without sacrificing accuracy

OctoML automatically optimizes your model to your chosen hardware target with zero humans in the loop. No writing vendor kernels, calling external libraries, or re-architecting your model.

Open compilation and search technology.



Advanced, but entirely optional, quantization for accuracy-performance tradeoff configurations.
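To make the accuracy-performance tradeoff concrete, here is a toy sketch of post-training int8 quantization — this is an illustration of the general technique, not OctoML's implementation:

```python
# Toy illustration of symmetric int8 quantization: shrink float weights to
# 8-bit integers (faster, smaller) at the cost of a bounded rounding error.
def quantize_int8(weights):
    """Map floats to int8 using a single symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [x * scale for x in q]

weights = [0.02, -1.27, 0.5, 0.998, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error is bounded by half a quantization step,
# which is the accuracy cost being traded for speed and size.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-tensor scale here is the simplest possible scheme; real quantization pipelines choose scales per channel and calibrate them against representative data.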

Automated process that optimizes models inside your model development workflow.

Accelerate performance, decelerate cost

  • Lower latency: Reduce user-perceived lag with a hyper-responsive online application.
  • Higher throughput: Maximize hardware utilization with more predictions per second.
  • Energy efficient: Save critical battery reserves for prediction at the edge on IoT devices.
  • Lighter models: Run models efficiently on low-power, limited memory, edge devices with our lightweight runtime.
  • Long-term savings: Faster predictions reduce inference costs, increasing savings over the lifetime of the ML deployment cycle.

Using ML to make ML better

We use machine learning to explore all the possible optimizations available for your specific target hardware and pick the most performant ones.
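The core idea of search-based tuning can be sketched in a few lines — try several candidate "schedules" for the same computation, measure each on the target, and keep the fastest. This toy example searches over tile sizes for a matrix multiply; the real system explores a vastly larger space and uses ML to guide the search rather than brute force:

```python
# Toy sketch of search-based tuning (illustrative, not OctoML's code):
# every tile size computes the same result, but runs at a different speed
# depending on the hardware's cache behavior.
import time

def matmul_tiled(A, B, n, tile):
    """n x n matrix multiply with a configurable tiling 'schedule'."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for k in range(kk, min(kk + tile, n)):
                    a, row_b, row_c = A[i][k], B[k], C[i]
                    for j in range(n):
                        row_c[j] += a * row_b[j]
    return C

n = 64
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i - j) for j in range(n)] for i in range(n)]

timings = {}
for tile in (4, 8, 16, 32, 64):          # candidate schedules
    start = time.perf_counter()
    matmul_tiled(A, B, n, tile)
    timings[tile] = time.perf_counter() - start

best = min(timings, key=timings.get)      # keep the most performant
```

The winning tile size differs from machine to machine, which is exactly why tuning must be redone per hardware target.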

Graph level optimizations

Rewrites dataflow graphs (nodes and edges) to simplify the graph and reduce device peak memory usage.
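A minimal sketch of one such rewrite — constant folding — on a toy dataflow graph (an illustration of the technique, not TVM's actual pass infrastructure):

```python
# Toy graph-level optimization: fold subexpressions whose inputs are all
# constants, so fewer nodes need to execute (and hold memory) on device.
# Each node is (op, inputs); "const" nodes carry their value directly.
def fold_constants(graph):
    """Replace arithmetic nodes with all-constant inputs by const nodes."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    folded = {}
    for name, (op, args) in graph.items():
        if op in ops and all(a in folded and folded[a][0] == "const"
                             for a in args):
            vals = [folded[a][1] for a in args]
            folded[name] = ("const", ops[op](*vals))
        else:
            folded[name] = (op, args)
    return folded

graph = {                                  # computes y = x * (2 + 3)
    "c2": ("const", 2),
    "c3": ("const", 3),
    "scale": ("add", ["c2", "c3"]),        # constant subexpression
    "x": ("input", []),
    "y": ("mul", ["x", "scale"]),
}
opt = fold_constants(graph)
# "scale" is now precomputed at compile time; only "y" runs at inference.
```

Real graph rewrites also fuse chains of operators into single kernels and eliminate dead nodes, all with the same pattern: traverse the graph, match a structure, emit a simpler one.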

Operator level optimizations

Hardware target-specific low-level optimizations for individual operators/nodes in the graph.
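As a toy illustration of the idea (target names invented for the example): the same operator can be lowered differently per hardware target, as long as every variant computes the same result.

```python
# Two lowerings of one operator (dot product). A real compiler holds many
# such variants per operator and picks the best one for the target.
def dot_scalar(a, b):
    """Straightforward lowering: one multiply-add per iteration."""
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
    return s

def dot_unrolled4(a, b):
    """4-way unrolled lowering, mimicking what a low-level schedule
    might emit for a core with wide execution units."""
    s0 = s1 = s2 = s3 = 0.0
    n4 = len(a) - len(a) % 4
    for i in range(0, n4, 4):
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
    tail = sum(a[i] * b[i] for i in range(n4, len(a)))
    return s0 + s1 + s2 + s3 + tail

# Hypothetical dispatch table: target name -> chosen lowering
LOWERINGS = {"generic-cpu": dot_scalar, "wide-simd-cpu": dot_unrolled4}

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [5.0, 4.0, 3.0, 2.0, 1.0]
results = {target: f(a, b) for target, f in LOWERINGS.items()}
```

The invariant — every lowering agrees on the output — is what lets the optimizer swap implementations freely without changing model accuracy.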

Efficient runtime

TVM-optimized models run in the lightweight TVM runtime, which provides a minimal API for loading and executing the model from Python, C++, Rust, Go, Java, or JavaScript.
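A sketch of what that minimal API looks like in Python, using TVM's documented graph executor (the import is deferred inside the function, so `tvm` is only required when it is actually called; `lib` is assumed to be a module previously compiled with `relay.build`):

```python
def run_compiled(lib, input_name, data):
    """Execute a TVM-compiled module with the lightweight graph executor.

    Requires the `tvm` package at call time; `lib` is the factory module
    returned by tvm.relay.build().
    """
    import tvm
    from tvm.contrib import graph_executor

    dev = tvm.cpu(0)                              # or tvm.cuda(0), etc.
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input(input_name, data)
    module.run()
    return module.get_output(0).numpy()
```

The same load-set-run-get sequence exists in the other language bindings, which is what keeps the runtime small enough for edge deployment.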

ML based model optimization

We continuously collect data from similar models and hardware so our automated optimization pipeline delivers improved performance over time.

  • Model optimizations create a data store for future use.

  • As we implement these learnings, your model optimizations become faster and better.

  • The process is automated and self-learning.
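The data-store idea can be sketched as a lookup that warm-starts new tuning jobs from the closest previously measured workload — a toy illustration of the concept, not the production pipeline (operator names, sizes, and configs below are invented):

```python
# Hypothetical store of past tuning results:
# (operator, input size, hardware) -> best known config and its latency.
store = {
    ("conv2d", 224, "cpu-a"): {"tile": 16, "ms": 1.9},
    ("conv2d", 256, "cpu-a"): {"tile": 32, "ms": 2.4},
    ("dense",  512, "cpu-a"): {"tile": 8,  "ms": 0.7},
}

def warm_start(op, size, hw):
    """Return the recorded config for the closest matching prior workload,
    or None if nothing similar has been tuned yet (full search needed)."""
    candidates = [(abs(size - s), cfg)
                  for (o, s, h), cfg in store.items()
                  if o == op and h == hw]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]

# A new 236x236 conv2d starts from the config tuned for the 224 case,
# shrinking the search instead of starting from scratch.
hint = warm_start("conv2d", 236, "cpu-a")
```

Each finished tuning job adds a row to the store, which is what makes the loop self-improving: more history means better starting points and shorter searches.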

Accelerate model performance while simplifying deployment

Built on open-source Apache TVM, OctoML elevates performance, enables continuous deployment, and works seamlessly with PyTorch, TensorFlow, ONNX serialized models, and more.

Boost performance without losing accuracy

Our proprietary process searches for the best program to automatically tune your model to the target hardware platform. Our customers have seen performance improvements of up to 30x.

Access comprehensive benchmarking

Compare against the original model and similar public models, benchmark across CPU and GPU instance types, and evaluate device sizing for deployment on Arm mobile or embedded processors.
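The shape of such a comparison is simple: time each variant under identical conditions and report the ratio. This toy harness measures two stand-in "models" (real benchmarking would run actual compiled models on each target device):

```python
# Toy benchmarking harness: median latency of a callable, plus the
# speedup of an "optimized" variant over the original.
import time

def benchmark(fn, *args, repeat=20):
    """Median wall-clock latency of fn(*args) over several runs.
    The median resists one-off scheduling hiccups better than the mean."""
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]

def original(xs):          # stand-in for the unoptimized model
    out = []
    for x in xs:
        out.append(x * x)
    return out

def optimized(xs):         # stand-in for the optimized model
    return [x * x for x in xs]

xs = list(range(10000))
base = benchmark(original, xs)
fast = benchmark(optimized, xs)
speedup = base / fast      # e.g. "2.1x" in a benchmark report
```

Both variants must produce identical outputs before a speedup number means anything — the same check applies when comparing an optimized model against its original.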

Relax knowing we’re future-proofed

Our platform is built on open-source Apache TVM, which is quickly becoming the de facto standard for machine learning compilers.

Experience broad interoperability

OctoML works seamlessly with TensorFlow, PyTorch, TensorFlow Lite, and ONNX serialized models, and offers easy onboarding of new and emerging hardware.
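For example, importing and compiling an ONNX model takes only a few lines with TVM's Relay frontend, which underlies this interoperability. The path, input name, and shape below are placeholders, and the imports are deferred so the sketch only needs `tvm` and `onnx` installed when the function is invoked:

```python
def compile_onnx_model(path, input_name, input_shape, target="llvm"):
    """Compile an ONNX model with Apache TVM's Relay frontend.

    Requires the `onnx` and `tvm` packages at call time. Returns a
    compiled module runnable with TVM's graph executor.
    """
    import onnx
    import tvm
    from tvm import relay

    model = onnx.load(path)                       # e.g. "model.onnx"
    mod, params = relay.frontend.from_onnx(model, {input_name: input_shape})
    with tvm.transform.PassContext(opt_level=3):  # enable full optimization
        lib = relay.build(mod, target=target, params=params)
    return lib
```

Equivalent frontends exist for TensorFlow, PyTorch, and TensorFlow Lite (`relay.frontend.from_tensorflow`, `from_pytorch`, `from_tflite`), so the downstream optimization pipeline is the same regardless of the source framework.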

Enjoy automated deployment

Deploy with a few lines of code, skipping manual optimization and saving hours of engineering time on hand-tuning and performance testing.

Accelerate Your AI Innovation

Contact Sales
Learn More