Performance

Get a 30x boost in performance

We maximize throughput and minimize latency by automatically tuning your model to the target hardware platform.

Increase model performance without sacrificing accuracy

OctoML automatically optimizes your model to your chosen hardware target with zero humans in the loop. No writing vendor kernels, calling external libraries, or re-architecting your model.

Compilation

Open compilation and search technology.

Quantization

Advanced, but entirely optional, quantization for configuring the accuracy-performance tradeoff.
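
As a rough illustration of the tradeoff being configured (a generic sketch, not OctoML's implementation), the snippet below symmetrically quantizes float32 weights to int8 with a single scale and checks the reconstruction error:

    import numpy as np

    def quantize_int8(x: np.ndarray):
        """Symmetric int8 quantization: one scale maps float32 values onto [-127, 127]."""
        scale = np.abs(x).max() / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    weights = np.random.randn(64, 64).astype("float32")
    q, scale = quantize_int8(weights)
    # Smaller, faster int8 arithmetic in exchange for a bounded reconstruction error.
    print("max abs error:", np.abs(weights - dequantize(q, scale)).max())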

Optimization

Automated process that optimizes models inside your model development workflow.

Accelerate performance, decelerate cost

  • Lower latency: Reduce user-perceived lag with a hyper-responsive online application.
  • Higher throughput: Maximize hardware utilization with more predictions per second.
  • Energy efficient: Save critical battery reserves for prediction at the edge on IoT devices.
  • Lighter models: Run models efficiently on low-power, limited memory, edge devices with our lightweight runtime.
  • Long-term savings: Faster predictions reduce inference costs, increasing savings over the lifetime of the ML deployment cycle.

Using ML to make ML better

We use machine learning to explore all the possible optimizations available for your specific target hardware and pick the most performant ones.
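
As a purely conceptual sketch (not our actual pipeline; every name and number below is illustrative), a lightweight cost model trained on a handful of real measurements can rank candidate schedules so that only the most promising ones are benchmarked on hardware:

    import numpy as np

    rng = np.random.default_rng(0)

    def measure_on_hardware(cfg):
        """Stand-in for a real on-device latency measurement (ms)."""
        tile, vec = cfg
        return 10.0 / (tile * vec) + rng.uniform(0.0, 0.05)

    candidates = [(t, v) for t in (2, 4, 8, 16, 32) for v in (1, 2, 4, 8)]

    # 1. Measure a small random sample to train the cost model.
    sample = rng.choice(len(candidates), size=6, replace=False)
    X = np.array([candidates[i] for i in sample], dtype=float)
    y = np.array([measure_on_hardware(candidates[i]) for i in sample])

    # 2. Fit a tiny linear cost model: latency ~ w . features + b
    features = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(features, y, rcond=None)

    # 3. Rank all candidates by predicted latency, measure only the top few.
    all_feats = np.hstack([np.array(candidates, dtype=float),
                           np.ones((len(candidates), 1))])
    predicted = all_feats @ w
    top_k = np.argsort(predicted)[:3]
    best = min((candidates[i] for i in top_k), key=measure_on_hardware)
    print("best schedule config found:", best)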

Graph-level optimizations

Rewrites the dataflow graph (nodes and edges) to simplify it and reduce peak device memory usage.
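
In Apache TVM, the open compiler stack referenced below, graph-level rewrites are expressed as passes over the model IR. The toy graph and the particular passes chosen here are illustrative only, and the exact API depends on your TVM version:

    import tvm
    from tvm import relay

    # A tiny dataflow graph: convolution followed by two elementwise ops.
    data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
    weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
    conv = relay.nn.conv2d(data, weight, padding=(1, 1))
    act = relay.nn.relu(relay.add(conv, relay.const(1.0)))
    mod = tvm.IRModule.from_expr(relay.Function([data, weight], act))

    # Graph-level rewrites: simplify, fold constants, fuse adjacent operators.
    seq = tvm.transform.Sequential([
        relay.transform.SimplifyInference(),
        relay.transform.FoldConstant(),
        relay.transform.FuseOps(fuse_opt_level=2),
    ])
    with tvm.transform.PassContext(opt_level=3):
        optimized = seq(mod)
    print(optimized)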

Operator-level optimizations

Low-level, hardware-target-specific optimizations applied to individual operators (nodes) in the graph.
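
A minimal sketch of what such a low-level choice looks like, using Apache TVM's tensor expression API (the split factor and CPU target below are arbitrary; in practice the tuner searches these decisions automatically):

    import tvm
    from tvm import te

    # A simple elementwise operator described in TVM's tensor expression language.
    n = te.var("n")
    A = te.placeholder((n,), name="A", dtype="float32")
    B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

    # Operator-level optimization: pick a schedule suited to the hardware target.
    s = te.create_schedule(B.op)
    outer, inner = s[B].split(B.op.axis[0], factor=8)
    s[B].vectorize(inner)   # map the inner loop onto SIMD lanes
    s[B].parallel(outer)    # spread the outer loop across CPU cores

    # Lower and compile for an LLVM (CPU) target; print the optimized loop nest.
    print(tvm.lower(s, [A, B], simple_mode=True))
    func = tvm.build(s, [A, B], target="llvm")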

Efficient runtime

TVM-optimized models run in the lightweight TVM runtime system, which provides a minimal API for loading and executing the model from Python, C++, Rust, Go, Java, or JavaScript.
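
A minimal Python example of that API is shown below; the module path and input tensor name are placeholders for whatever your compiled model uses:

    import numpy as np
    import tvm
    from tvm.contrib import graph_executor

    # Load a TVM-compiled module (a shared library produced by the compiler).
    lib = tvm.runtime.load_module("compiled_model.so")   # path is illustrative
    dev = tvm.cpu(0)                                     # or tvm.cuda(0) on GPU targets

    # Wrap it in the lightweight graph executor and run one inference.
    model = graph_executor.GraphModule(lib["default"](dev))
    model.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
    model.run()
    output = model.get_output(0).numpy()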

ML-based model optimization

We continuously collect data from similar models and hardware so our automated optimization pipeline delivers improved performance over time.

  • Model optimizations create a data store for future use
  • As we implement these learnings, your model optimizations become faster and better
  • The process is automated and self-learning

Maximize performance. Simplify deployment.

Ready to get started?