Authors: Luis Ceze, Thierry Moreau
We started OctoML two years ago because our research showed a deep problem hindering AI innovation. ML developers are struggling to get trained models deployed to production across the many hardware endpoints available today, resulting in skyrocketing costs and significant delays in time to production. Why? There’s been an explosion in specialized hardware and disparate cloud services, each with its own software stack and set of specifications. Unfortunately, for enterprises to take advantage of this explosion, they must face the commensurate complexity of building up expertise on each platform. On top of that, they have to manage deployments manually, with efforts like hand-tuning for optimized performance, and all of that work covers just a single model.
That’s why we set out to build an ML Deployment Platform that empowers enterprises to create a unified deployment lifecycle across all their ML hardware vendors. Along with it, we began establishing relationships with the top hardware vendors to offer support for every possible endpoint.
Today, we’re excited to announce a new milestone in the evolution of our platform and ecosystem. We’re officially announcing our first partnership with Qualcomm, an industry leader in mobile hardware technology.
In working on this partnership, we were delighted to see that we shared a common vision of what we could achieve for our customers. By collaborating around Qualcomm’s Snapdragon platforms and SoCs, we believe that we can deliver transformative AI/ML performance on power-constrained devices and unlock new innovations at the edge.
We are proud to share that this partnership with Qualcomm is happening on two fronts: collaboration in the Apache TVM open source community and around our commercial OctoML Platform. The first aspect is a strategic agreement to work closely with the Qualcomm® Tensor Virtual Machine (TVM) compiler team to extend Apache TVM for Qualcomm® Hexagon™ processors. This means that both companies are committed to pushing the envelope on performance at the edge, where a 2x improvement in performance can make it possible to run a model that previously couldn’t run on power-constrained devices, or can cut the cost of the hardware required to run models in half. Because the collaboration happens in the open in the Apache TVM community, this level of innovation is accessible to a broad array of data scientists and ML engineers who want to build new applications and services that run directly on mobile devices, in the hands of end users.
The second facet of the partnership is that Qualcomm and OctoML will make these enhancements available to customers through the OctoML Machine Learning Deployment Platform. The platform directly addresses the many challenges enterprises have with their existing ML deployment pipelines, which only get more complex for edge deployments. The OctoML platform automates performance optimization, comparative benchmarking, and packaging to reduce both the time and the cost of deployment. It does so while also assuring customers that they have the optimal performance and accuracy to deploy winning models at the edge.
With the specifics of the partnership news laid out, we also wanted to zoom out and share our broader perspective on the ecosystem we are building here at OctoML and what is motivating it. First and foremost, our ecosystem efforts are driven by our vision for the company, which is to make ML accessible to as many creators as possible to drive an incredible diversity of innovation, and to do so in a way that makes the benefits of AI/ML accessible to everyone, everywhere.
That vision is what motivated the creation of the Apache TVM project: to ensure that there is an accessible, programmable software layer that makes ML portable across the widest possible array of hardware. It is very humbling for us to now have leaders like Qualcomm formally committing to making this open source project a key aspect of their next-generation SoCs, and it is a huge win for the community and the market overall.
Whereas the Apache TVM ecosystem brings the hardware community together for collaboration, the OctoML platform provides our commercial customers with the diversity of choice they need when building a comprehensive AI/ML strategy for their company. They want to make sure that one platform can be used as a common deployment framework from on-prem to cloud to edge to mobile. Having key hardware partners officially part of the picture is an important validation point.
What you can expect from us over the next couple of months is a steady stream of ecosystem partnership announcements with the dominant hardware providers in the industry. It is important for us to show our community that collaboration around Apache TVM is growing significantly and that these partnerships are also adding to the choices enterprises will have as they seek to scale their ML deployment operations.
If you're ready to get started speeding up your machine learning deployments on Qualcomm hardware, sign up now for early access to OctoML.