We’re excited to share our first public release of the OctoML CLI (v0.4.4), which provides an ML deployment workflow that should feel familiar to anyone who uses Docker, Docker Compose, and Kubernetes. The tool was shaped by extensive conversations with our users and friends about how to make AI/ML accessible to a broader range of developers and operators.
They told us about two major blockers. The first is one we can all appreciate: despite being eager to add AI/ML to their toolbox, they didn’t have the stomach for yet another learning curve. The second is access: access to AI/ML models packaged and shared in an agile, portable, production-ready form factor, and ownership of their own APIs, which isn’t on offer when you rely on black-box API services like OpenAI. The users we know want to grab hold of the tech, hack on it in their own environment, weave it into their end-to-end workflows, and see how it deploys in their world.
We built the OctoML CLI to knock down these blockers. The OctoML CLI gets AI/ML models to behave just like the rest of your application stack, so you can start tinkering with ML-powered features to level up your application. It converts trained model artifacts into an intelligent function with a stable API that you can develop against in both your development and operations environments. We are also excited to share that we’ve collaborated with our friends at NVIDIA to make this possible with their awesome open-source inference server: Triton Inference Server.
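The "stable API" idea is easiest to see from the client side. Because the packaged model is served by Triton, a client can talk to it over Triton’s standard HTTP/REST inference protocol (the KServe v2 protocol, `POST /v2/models/<name>/infer`). The sketch below is a minimal, hypothetical Python client using only the standard library. It assumes the container exposes Triton’s default HTTP port 8000 on localhost, and the model name `my_model`, the input tensor name `INPUT__0`, its shape, and its datatype are all placeholders rather than values the OctoML CLI is known to produce.

```python
import json
import math
import urllib.request

# Assumption: Triton's HTTP endpoint is reachable on its default port 8000.
TRITON_URL = "http://localhost:8000"


def build_infer_request(input_name, shape, datatype, data):
    """Build a KServe v2 (Triton) JSON inference request body.

    input_name, shape and datatype must match the deployed model's
    configuration; the values used in this sketch are hypothetical.
    """
    shape = list(shape)
    data = list(data)
    # The flattened, row-major data must contain exactly prod(shape) elements.
    if math.prod(shape) != len(data):
        raise ValueError(f"shape {shape} needs {math.prod(shape)} values, got {len(data)}")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,  # e.g. "FP32", "INT64", "BYTES"
                "data": data,
            }
        ]
    }


def infer(model_name, body, base_url=TRITON_URL, timeout=10.0):
    """POST the body to /v2/models/<model_name>/infer and return the parsed JSON reply."""
    url = f"{base_url}/v2/models/{model_name}/infer"
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical tensor name and shape; check your model's configuration for real ones.
    body = build_infer_request("INPUT__0", (1, 4), "FP32", [0.1, 0.2, 0.3, 0.4])
    print(json.dumps(body))
    # Needs a running server, so it is left commented out here:
    # result = infer("my_model", body)
    # print(result["outputs"][0]["data"])
```

Because the request format is defined by the serving protocol rather than by the model, client code like this can, in principle, keep working when the model underneath is re-optimized or swapped for another, as long as its input and output tensor names and shapes stay the same.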
Additionally, we’ve built an open-source sample app called Transparent AI, which uses a style transfer model to demonstrate an end-to-end workflow from a trained model all the way to a production service deployed at scale. In other words, you can develop with it locally, deploy it to the cloud, deliver optimal performance, and observe it operating in production. If you want a quick sneak peek at the output of this workflow, you can see it on the TransparentAI demo site.
OctoML's Transparent AI Demo
And if you like what you see, you can do it yourself with our Quick Start tutorial, which will get you up and running in 5 minutes.
OctoML CLI for Devs
Show us how you did, and we’ll send you a free, special-edition T-shirt! Share one of your stylized images using the hashtag #TameTheTensorFromHell, and we’ll send you a link to pick out your size and style.
OctoML's Transparent AI example #TameTheTensorFromHell
OctoML CLI for Ops
We also wanted to cover the full end-to-end production deployment scenario, because many operators struggle to scale their AI/ML operations the way they have scaled traditional applications using DevOps methodologies.
For you fellow operators out there: ML deployment operations cannot scale like other software workloads due to a complexity we affectionately call the Tensor from Hell. The Tensor from Hell is a metaphor for the rigid set of dependencies between the ML training framework (e.g. PyTorch), the ML model and model type, and the hardware it needs to run on at various stages of its lifecycle. Taming the Tensor from Hell requires a platform that automatically produces code customized for the specific hardware, selects the right libraries and compiler options, and guides configuration settings to deliver peak performance and meet any other SLA requirements on the hardware used at every stage of the DevOps lifecycle.
That platform needs insight across a comprehensive fleet of 80+ deployment targets, in the cloud (AWS, Azure, and GCP) and at the edge, with accelerated computing including GPUs, CPUs, and NPUs from NVIDIA, Intel, AMD, ARM, and AWS Graviton. This fleet is used for automated compatibility testing, performance analysis, and optimization on actual hardware. The platform also needs an expansive software catalog covering all major ML frameworks, acceleration engines such as TVM, software stacks from the chip providers, and every other software dependency required for deployment anywhere.
Performance and compatibility insights must be backed by real-world scenarios (not simulated) to accurately inform deployment decisions and ensure SLAs around performance, cost and user experience are met.
OctoML's SaaS Platform
The OctoML CLI can kick off the process I described above in the OctoML SaaS platform. To get access to the OctoML SaaS, you will need to sign up.
Where We Would Love Your Help/What’s Next
As I’ve mentioned, this is our first public release, and we are opening the tool beyond our initial group of friendly users for your feedback and experimentation. Today’s capabilities are an initial feature set, which we will expand, starting with:
Support for multiple models and multi-model based workflows.
Smaller image sizes, plus other improvements and optimizations to image-based packaging.
Support for raw model ingestion and better integration with model creation tools.
Improved deployment experience and resources.