
OctoML launches OctoAI, an AI compute service to run, tune, and scale your generative AI models

Jared Roesch

Jun 14, 2023

6 minutes

Generative AI has transformed the technology landscape. Developers are moving beyond early experimentation and building generative AI-powered applications that directly serve customer and business needs. We see this across many use cases, from creative content generation, to self-service chatbots, to summaries of complex real-world information.

As a technology innovator in this space, OctoML is already working with many early movers on this journey. What we hear is consistent: (i) developers need tools to easily, cost-effectively, and reliably run and tune or customize models in production; and (ii) they want to focus on building applications, not on new models or model operations. Traditional AI ("AI 1.0") tools are no longer sufficient. Developers and organizations need models to reliably "just work for them, like other infrastructure" as they build new experiences and workflows for customers. OctoAI is designed and built to address exactly this: to make models work for organizations, not the other way around.

Introducing OctoAI, a self-optimizing compute service for your generative AI models

OctoAI is a compute service to run, tune (or customize), and scale your generative AI models. The service builds on the expertise and technologies around AI/ML systems optimization at OctoML and abstracts away details of model execution and hardware optimization from developers. OctoAI also delivers a library of the world’s fastest and most affordable generative AI models - enabled by the platform’s model acceleration capability. Open-source software (OSS) foundation model templates available at launch include Stable Diffusion 2.1, Dolly v2, Llama 65B, Whisper, FlanUL, and Vicuna.

OctoAI lets your team focus on your unique features and innovations, and the experiences that matter to your customers. OctoAI delivers to developers:

  • Ease-of-use. Choose from a library of ready-to-use templates for popular open-source models to simplify deployment. Select and customize (fine-tune) models to meet specific requirements. Easily integrate with app and model development workflows.
  • Efficiency. Run, tune and scale off-the-shelf, OSS and custom models. Automated hardware selection and model acceleration let you decide on price-performance tradeoffs.
  • Freedom. Upgrade to new models as they emerge. Bring your own custom models. No lock-in into the model or service.
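In practice, the template workflow above comes down to calling a hosted model endpoint over HTTPS. As a minimal sketch, a text-to-image request might look like the following; note that the endpoint URL, JSON field names, and base64 response format here are illustrative assumptions, not the documented OctoAI API.

```python
import base64
import json
import urllib.request

def build_payload(prompt: str, steps: int = 30) -> bytes:
    """Encode text-to-image parameters as a JSON request body."""
    return json.dumps({"prompt": prompt, "steps": steps}).encode("utf-8")

def generate_image(endpoint_url: str, token: str, prompt: str) -> bytes:
    """POST a prompt to a hosted image-generation endpoint and
    return the decoded image bytes from the JSON response."""
    request = urllib.request.Request(
        endpoint_url,
        data=build_payload(prompt),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the service returns the image as a base64-encoded string.
    return base64.b64decode(body["image_b64"])
```

Swapping the endpoint URL is all it takes to move between model templates, which is what keeps application code decoupled from the model and hardware choices underneath.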

Put simply, OctoAI offers the simplicity and reliability of inflexible "blackbox" alternatives, while giving you the freedom and flexibility to adopt the latest innovations in generative AI happening in the OSS community, like LLaMA 65B, Stable Diffusion 2.1, and our recently announced commercializable OSS multi-modal model InkyMM. There is no lock-in to a model or service.

Application developers want foundation models to work for them

One of the early customers that we worked with on scaling generative AI applications is Extropolis AI, a company on a mission to "make generative AI more accessible to everyone". The team built Diffusitron, one of the highest-rated mobile apps on iOS for generative art, focused on delivering delightful user experiences and advanced capabilities that unlock new levels of creativity for its community.

Since launching in November last year, they have experienced explosive growth in adoption, scaling from hundreds of generated images per day to almost a hundred thousand. As a result, they have doubled down on product innovation, with the goal of creating the most advanced workflows in the market based on fine-tuned open-source Stable Diffusion technology, while still balancing SLAs, reliability, and cost.

We hear this story echoed across application developers building on OSS foundation models - including multimedia applications on FILM and Whisper; internal productivity solutions built on LLMs integrated with enterprise search; and customer service automation built on customized LLM powered experiences using LangChain. Developers need a reliable and flexible way to run, tune and scale these foundation models, as they innovate on their applications. OctoAI delivers this, and feedback from early customers has been resoundingly positive.

Early results at Extropolis and Civitai - delivering tuning at scale

The Extropolis AI team has since been working with us to evaluate OctoAI. Early testing has shown up to a 4-5x reduction in latency, along with improved reliability and uptime under traffic spikes, all without sacrificing image quality.

“Our goal is to unlock new levels of creativity for millions of users by making generative art technology more accessible to everyone. It must run fast to deliver a compelling user experience yet also be affordable. OctoAI is the perfect platform to achieve our goal! The ability to fine tune models at scale, customize the user experience and keep it affordable for all to access makes the service very attractive to us. We are happy to announce that we will be moving our Stable Diffusion service into OctoAI.” said Kalin Ovtcharov, CEO of Extropolis AI.


Another customer that the OctoAI team has been working closely with over the past months is Civitai.com, a content-sharing service with a strong and engaged community of users. On Civitai.com, you can find over 500 checkpoints for Stable Diffusion models.

Civitai is introducing a new service for its community to immediately generate images for any checkpoint they find on Civitai. The team wants to launch and scale quickly, while staying cost-effective on the backend as they expect rapid user adoption of their new service. Pilot deployments and tests with OctoAI have been exceedingly positive, and the Civitai team is planning their service launch with OctoAI in the near future.

“I want us to be completely focused on our community and the experience we deliver, and not spend time on Machine Learning Ops. In engagements so far, OctoAI has enabled us to do exactly this. We were able to integrate OctoAI’s optimized Stable Diffusion endpoint very easily, and thus build our new service quickly. OctoAI’s expertise with running Stable Diffusion in a low-latency and scalable way has been outstanding, and we’re excited about partnering with them as we launch our new service.” said Justin Maier, Founder and CEO of Civitai.

Additional GPU options for AI model execution, through OctoAI’s model acceleration

Core to how OctoAI delivers speed and cost savings is model acceleration, which reduces latency for a model on a given hardware target. This has been of particular interest to teams launching or scaling applications but unable to get capacity on the powerful NVIDIA A100 GPUs. Model acceleration unlocks readily available NVIDIA A10G GPUs for many of these use cases, delivering the required model performance in a cost-effective and scalable manner. Today, we are already working with organizations to move models and traffic from A100 to A10G GPU backends.

On the roadmap

As of launch, OctoAI includes ready-to-use deployment templates for several popular foundation models, including Stable Diffusion 2.1, Dolly v2, Llama 65B, Whisper, FlanUL, ControlNet, and Vicuna. We are also working to accelerate additional foundation models in the coming days, including the recently released Falcon LLM. If there are specific OSS foundation models you would like to see here, reach out to us on Discord. Other areas of focus for the team include fine-tuning of accelerated models from within OctoAI, and automatic acceleration of your custom models. We have also had several customers express interest in added flexibility in deployment choices, including running the service within their own environment or VPC. These capabilities are actively under development, and we will share more in the coming months!

“AI is no longer a novelty, it’s real business. But efficient compute is critical to making it viable,” said Luis Ceze, CEO, OctoML. “Every company is scrambling to build AI-powered solutions, yet the process of taking a model from development to production is incredibly complex and often requires costly, specialized talent and infrastructure. OctoAI makes models work for business, not the other way around. We abstract away all the complexity so developers can focus on building great applications, instead of worrying about managing infrastructure.”

OctoAI is now available to everyone in public beta. Sign up and try OctoAI today. Get started easily with our sample code and tutorials on GitHub.