Using Apache TVM, OctoML generates a hardware-specific, optimized model for CPUs, GPUs, and accelerators. The result is performance comparable to state-of-the-art hand-tuned libraries, with no loss in accuracy.
Do you need to invest in faster (but more expensive) hardware? We benchmark your model on diverse hardware targets to help you decide.
Run it everywhere
We simplify the hardest parts of ML deployment