Increase model performance without sacrificing accuracy
OctoML automatically optimizes your model to your chosen hardware target with zero humans in the loop. No writing vendor kernels, calling external libraries, or re-architecting your model.
Open compilation and search technology.
Advanced, but entirely optional, quantization for configurable accuracy-performance tradeoffs.
Automated process that optimizes models inside your model development workflow.
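To make the quantization tradeoff concrete, here is a minimal, purely illustrative sketch of post-training int8 quantization in plain Python. The function names and the single symmetric scale are assumptions for illustration, not OctoML's API:

```python
# Illustrative sketch of post-training int8 quantization (not OctoML code):
# weights shrink 4x vs. float32, at the cost of bounded rounding error.

def quantize_int8(weights):
    """Map float weights to int8 using one symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.999, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error is at most half a quantization step, which is
# the accuracy side of the accuracy-performance tradeoff.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9
```

In practice the tradeoff is tuned per model and per layer, which is why the platform treats quantization as an optional, configurable step.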
Accelerate performance, decelerate cost
- Lower latency: Reduce user-perceived lag with a hyper-responsive online application.
- Higher throughput: Maximize hardware utilization with more predictions per second.
- Energy efficient: Save critical battery reserves for prediction at the edge on IoT devices.
- Lighter models: Run models efficiently on low-power, limited memory, edge devices with our lightweight runtime.
- Long-term savings: Faster predictions reduce inference costs, compounding savings over the lifetime of the ML deployment cycle.
Using ML to make ML better
We use machine learning to explore the space of possible optimizations for your specific hardware target and pick the most performant ones.
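The search loop above can be sketched conceptually: enumerate candidate configurations, score each with a cost model, and keep the cheapest. The toy cost function below is a hypothetical stand-in for the learned cost model and real on-device measurements that drive a production search:

```python
# Conceptual sketch of optimization search (not OctoML internals).
import itertools

def estimated_cost(tile_m, tile_n, cache_budget=64):
    """Toy stand-in for a learned cost model: penalize working sets
    that overflow a hypothetical cache, and tiles too small to
    amortize loop overhead."""
    footprint = tile_m * tile_n
    overflow_penalty = 10 * max(0, footprint - cache_budget)
    overhead_penalty = cache_budget // min(tile_m, tile_n)
    return overflow_penalty + overhead_penalty

# Enumerate candidate tile configurations and keep the best one.
candidates = list(itertools.product([1, 2, 4, 8, 16], repeat=2))
best = min(candidates, key=lambda c: estimated_cost(*c))
# The winner balances cache fit against loop overhead.
```

A real pipeline replaces the toy cost function with a model trained on measured runtimes, so the search improves as more hardware data accumulates.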
Graph level optimizations
Rewrites the model's dataflow graph (its nodes and edges) to simplify computation and reduce peak device memory usage.
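One classic graph-level rewrite is constant folding: any node whose inputs are all constants is evaluated once at compile time, so the device never materializes the intermediate tensors at runtime. A minimal sketch on a toy graph representation (the dict encoding is an assumption for illustration, not how a real compiler stores graphs):

```python
# Illustrative constant-folding pass on a toy dataflow graph.
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def fold_constants(graph):
    """graph maps node name -> ("const", value) | ("input",) | (op, a, b).
    Repeatedly replaces ops with all-constant inputs by a single const."""
    changed = True
    while changed:
        changed = False
        for name, node in graph.items():
            if node[0] in OPS:
                op, a, b = node
                if graph[a][0] == "const" and graph[b][0] == "const":
                    graph[name] = ("const", OPS[op](graph[a][1], graph[b][1]))
                    changed = True
    return graph

g = {
    "w":  ("const", 3),
    "s":  ("const", 2),
    "ws": ("mul", "w", "s"),   # all-constant subgraph: folded at compile time
    "x":  ("input",),          # runtime input: cannot be folded
    "y":  ("add", "ws", "x"),  # stays in the graph, now reads one const
}
fold_constants(g)
```

A follow-up dead-node elimination pass would then drop the now-unused "w" and "s" nodes, which is part of how these rewrites shrink the graph and its memory footprint.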
Operator level optimizations
Hardware target-specific low-level optimizations for individual operators/nodes in the graph.
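A representative operator-level optimization is loop tiling: restructuring a matrix multiply so each small tile of the inputs stays resident in fast memory while it is reused. The plain-Python version below illustrates the loop structure only (a real kernel would be generated for the specific target, not written like this):

```python
# Illustrative tiled matrix multiply (loop structure only, not a real kernel).

def matmul_tiled(A, B, tile=2):
    """Multiply A (n x k) by B (k x m) using tile x tile blocks."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops
    # touch only a small working set, improving cache locality.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C
```

The best tile size depends on the target's cache and register layout, which is exactly the kind of hardware-specific choice the automated search makes per operator.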
ML based model optimization
We continuously collect data from similar models and hardware so our automated optimization pipeline delivers improved performance over time.
- Model optimizations create a data store for future use
- As we implement these learnings, your model optimizations become faster and better
- The process is automated and self-learning
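The record-and-reuse idea above can be sketched as a store of tuning results keyed by workload and hardware target, so a later optimization of a similar model starts from a stored best configuration instead of searching from scratch. All names here are hypothetical, not OctoML's API:

```python
# Hypothetical sketch of a tuning-record store (not OctoML's API).

class TuningRecordStore:
    def __init__(self):
        # (workload, target) -> (best config, measured latency in ms)
        self._records = {}

    def record(self, workload, target, config, latency_ms):
        """Keep only the best-known config per workload/target pair."""
        key = (workload, target)
        best = self._records.get(key)
        if best is None or latency_ms < best[1]:
            self._records[key] = (config, latency_ms)

    def lookup(self, workload, target):
        """Return the stored best (config, latency) or None."""
        return self._records.get((workload, target))

store = TuningRecordStore()
store.record("conv2d_3x3_224", "arm-cpu", {"tile": 8}, 4.2)
store.record("conv2d_3x3_224", "arm-cpu", {"tile": 4}, 5.1)  # slower: ignored
```

Because every optimization run feeds the store, each subsequent search starts closer to a good configuration, which is what makes the pipeline self-improving over time.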