This capability fast-tracks users through the initial trial step most of them take today: exploring how effectively OctoML speeds up popular models. It also delivers actionable data to help users decide which model architectures and hardware targets best suit their product and business needs. Users can download an optimized package of the model, including the necessary runtime, and run the model in the cloud of their choice.
The model zoo provides a significant proof point for customers, with an average 2.2x speedup, because many commonly benchmarked models, like ResNet and BERT, are already heavily hand-tuned by the leading hardware providers. In practice, OctoML's automated acceleration techniques are even more powerful on proprietary customer models – the pre-accelerated benchmarks available in the platform often serve as the floor for the performance improvements users can expect. For instance, using these same acceleration techniques, OctoML helped Microsoft's Watch For team, which runs a massive bulk video analysis workload processing millions of hours of video and billions of images a month, speed up its heavily deployed models 1.3–3x with no loss of accuracy. Microsoft is now putting those models into production to drive significant inference cost savings.