We are thrilled to announce that OctoML has closed a $28M Series B funding round led by Addition and Lee Fixel with participation from existing investors Madrona Venture Group and Amplify Partners.
90% of machine learning models don’t make it to production
While there has been significant progress in techniques for data management and machine learning model creation, there is still a significant gap between building a model and making that model production-ready.
Between rapidly evolving ML models, ML frameworks, and a Cambrian explosion of hardware backends, it becomes incredibly difficult to ensure that your model deployment works as expected. It is not easy to make sure your model runs fast enough, or to benchmark it across different deployment hardware. And even if your determined machine learning team has run this gauntlet, it still faces a whole different set of challenges to package and deploy at the edge.
The 10% that do deploy take months of manual work
Why months? First, your machine learning team has to perform many manual optimizations to improve performance while maintaining accuracy. Without addressing performance problems efficiently, you rack up high cloud bills from CPU/GPU usage, or you can't deploy on resource-constrained edge devices at all. Then there is the labor-intensive model packaging for each device and platform. Plus, there are no modern CI/CD integrations to keep up with model changes.
What good is an ML model if it isn’t fast? doesn’t scale? isn’t accurate enough? takes weeks to deploy? and costs too much?
Making machine learning fast, useful, and accessible
Our founding team started the open-source machine learning compiler framework Apache TVM and it has quickly become the go-to solution for developers and ML engineers to maximize ML model performance on any hardware backend. With OctoML we are establishing the first Machine Learning Acceleration Platform that will automatically maximize model performance while enabling seamless deployment on any hardware, cloud provider, or edge device.
Why Machine Learning Acceleration?
We aim to enable you to extract the best performance out of your machine learning models and to automate the entire process of deploying a model to production: from model optimization and benchmarking to packaging for deployment. By automating this process, we accelerate your time-to-market while also significantly reducing your compute costs and enabling edge use cases. The performance optimization magic comes from applying machine learning to machine learning model optimization and compilation.
We are automating the entire process of optimizing, benchmarking, and deploying your model for you.
Select hardware targets in OctoML UI
OctoML Benchmarks UI
In the last year, here are a few strides our team of 45 has made:
- Held our third TVM conference, attended by nearly a thousand people from 500 organizations including Facebook, Microsoft, Qualcomm, Arm, and Google
- We have a growing waitlist for our flagship SaaS product — Octomizer. (Sign up for early access.)
- Ran pilots with many leading companies using AI/ML such as Microsoft and Sony, and demonstrated significant improvements to their model performance
- Optimized and compiled HuggingFace’s BERT-base model, an NLP model used widely across the machine learning ecosystem, on an M1 Mac Mini, improving GPU performance by 49% and CPU performance by 22%, and demonstrated leading performance on sparse models
- Partnered with key hardware vendors like Qualcomm and AMD and enabled their hardware on Apache TVM
This new funding will help us tackle a variety of strategic priorities:
- We have a world-class team (in machine learning systems, cloud services, AI/ML products, and more), and we will be doubling it across engineering, sales, customer success, and marketing
- Continuing to build and scale a successful remote working culture
- Launching Octomizer, our self-serve SaaS product
- Investing in sales and marketing to build a globally recognizable brand
We are incredibly grateful for the steadfast support of the Apache TVM community, the OctoML team, our early users, and investors. You have all made us who we are and we will strive to realize the promise of machine learning by making models fast, accessible, and useful to all.