New AI/ML Tool for Fans of Docker Compose and Kubernetes: OctoML CLI

Jared Roesch

Jun 21, 2022

Public Release of OctoML CLI (v0.4.4)

We’re excited to share our first public release of OctoML CLI (v0.4.4), which provides an ML deployment workflow that should feel very familiar to anyone who uses Docker, Docker Compose, or Kubernetes. The tool was shaped by the extensive conversations we had with our users and friends about how to make AI/ML more accessible to a broader range of developers and operators.

What they shared with us were two major blockers. The first is one we can all appreciate: they just didn’t have the stomach for yet another learning curve, despite being incredibly eager to add AI/ML to their toolbox. The second is access, both to AI/ML models packaged and shared in an agile, portable, production-ready form factor, and to ownership of their own APIs, which isn’t on offer when you rely on black-box API services like OpenAI. The users we know want to grab hold of the tech, hack on it in their environment, weave it into their end-to-end workflows, and see how it deploys in their world.

We built OctoML CLI to knock down these blockers. OctoML CLI gets AI/ML models to behave just like the rest of your application stack, so you can start tinkering with ML-powered features to level up your application. OctoML CLI converts trained model artifacts into an intelligent function with a stable API that you can develop against in either your development or operations environment. We are also excited to share that we’ve collaborated with our friends at NVIDIA to make this possible with their awesome open-source inference server, Triton.
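To make that concrete, here is a minimal sketch of the local flow, assuming the octoml init, octoml package, and octoml deploy subcommands from this release and an octoml.yaml that points at your trained model (names and paths are illustrative; see octoml --help and the Quick Start for the authoritative steps):

    # Create an octoml.yaml describing your model (interactive prompts)
    octoml init

    # Package the model into a Triton-based Docker image
    octoml package

    # Run the packaged image locally so you can develop against its inference API
    octoml deploy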

Additionally, we’ve built an open-source sample app called Transparent AI (leveraging a style transfer model) to provide an end-to-end workflow demonstrating how to go from a trained model all the way to a production service deployed at scale. In other words, you can develop with it locally, deploy it to the cloud, deliver optimal performance, and observe it operating in production. If you want a quick sneak peek at the output of this workflow, you can see it at the TransparentAI demo site.

OctoML's Transparent AI Sample App

OctoML's Transparent AI Demo

And if you like what you see, you can do it yourself with our Quick Start tutorial which will get you up and running in 5 minutes.

OctoML CLI for Devs

Developers can watch this demo to get a feel for how to use the tool; we’ve also documented the steps here. In five minutes, you can docker-compose up to deploy locally and then run the same service in the cloud on Kubernetes using Helm, as sketched below.
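If you’d rather skim than watch, the core commands look roughly like this, assuming the docker-compose.yaml and Helm chart that ship with the Transparent AI sample (the release name, chart path, and registry below are placeholders):

    # Bring the packaged model and the sample app up locally
    docker-compose up

    # Install the same service on a Kubernetes cluster with Helm
    helm install transparent-ai ./chart --set image.repository=<your-registry>/transparent-ai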

Show us how you did and we’ll send you a free, special edition T-shirt! Share one of your stylized images using the hashtag #TameTheTensorFromHell and we’ll send you a link to pick out your size and style.

OctoML's Transparent AI example #TameTheTensorFromHell

OctoML CLI for Ops

We also wanted to ensure we covered the full end-to-end production deployment scenario, since many operators struggle to scale their AI/ML operations the way they have scaled traditional applications using DevOps methodologies.

Here is the TransparentAI demo for Ops. It covers how to make a model more reliable in production, accelerate it, make it hardware independent so you can choose the right hardware for the service, and then scale it operationally using your existing Kubernetes environment.
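That last step is standard Kubernetes once the model is packaged as a container. For example, assuming a Helm release named transparent-ai and a replicaCount value in the chart (both illustrative), scaling out looks like:

    # Scale the inference service to three replicas via the chart values
    helm upgrade transparent-ai ./chart --set replicaCount=3

    # Or scale the underlying Deployment directly (resource name is illustrative)
    kubectl scale deployment/transparent-ai --replicas=3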

For you fellow operators out there, ML deployment operations cannot scale like other software workloads because of the complexity we affectionately call the Tensor from Hell. The Tensor from Hell is a metaphor for the rigid set of dependencies between the ML training framework (e.g., PyTorch), the ML model/model type, and the hardware it needs to run on at various stages of its lifecycle. Taming the Tensor from Hell requires a platform that automatically produces customized code for the specific hardware parameters, selects the right libraries and compiler options, and guides configuration settings to deliver peak performance and meet any other SLA requirements for the hardware employed at every stage of the DevOps lifecycle.

That platform has to have insights across a comprehensive fleet of 80+ deployment targets, in the cloud (AWS, Azure, and GCP) and at the edge, with accelerated computing including GPUs, CPUs, and NPUs from NVIDIA, Intel, AMD, ARM, and AWS Graviton, used for automated compatibility testing, performance analysis, and optimization on actual hardware. The platform also has to have an expansive software catalog covering all major ML frameworks, acceleration engines like TVM, software stacks from the chip providers, and all other software dependencies required for deployment anywhere.

Performance and compatibility insights must be backed by real-world scenarios (not simulated) to accurately inform deployment decisions and ensure SLAs around performance, cost and user experience are met.

OctoML's SaaS Platform

The OctoML CLI can kick off the process I described above in the OctoML SaaS. To get access to the OctoML SaaS, you will need to sign up here.
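Once you have an account, the CLI picks up your credentials from the environment. As a rough sketch, assuming an access-token environment variable named OCTOML_ACCESS_TOKEN (check the docs for the exact variable and flags your release expects):

    # Make your OctoML access token available to the CLI (variable name per the docs)
    export OCTOML_ACCESS_TOKEN=<your-token>

    # With a token present, packaging can hand the model off to the OctoML SaaS for acceleration
    octoml package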

Where We Would Love Your Help/What’s Next

As I’ve shared, this is our first public release, and we are opening up our tool beyond our initial set of friendlies for your feedback and experimentation. Today’s capabilities are a starting point that we will be expanding, beginning with:

  • Support for multiple models and multi-model workflows.
  • Smaller image sizes and other image-packaging improvements and optimizations.
  • Support for raw model ingestion and better integration with model-creation tools.
  • Improved deployment experience and resources.

And we look forward to your input on what else to add to our roadmap for the CLI and for our tutorials.
