Running the industry’s most cost-effective LLaMA 65B on OctoAI

At OctoML, our research team has been working hard to reduce the cost of operating large open source foundation models like LLaMA 65B. Today we're sharing some exciting progress: our accelerated LLaMA 65B on the OctoAI* compute service costs nearly 1/5 as much as running standard LLaMA 65B on Hugging Face Accelerate, while being 37% faster despite using less hardware.

Jason Knight
Jun 7, 2023

Introducing 🐙InkyMM: The First Open Source, Commercializable Multi-Modal Model

Today, OctoML is announcing the first open-source, fully commercializable Image + Text LLM, built on the great work of researchers at King Abdullah University and the MPT-7B Instruct model published by MosaicML.

Ben Hamm
May 25, 2023

How to Run Stable Diffusion 3X Faster for 5X Less: Available for Early Access on OctoML Compute Service on AWS

At OctoML, we are on a mission to deliver affordable AI compute services for those who want control over the business they are building. That’s why we built a new compute service, available now in early access. It delivers the AI infrastructure and advanced machine learning optimization techniques that you can otherwise find only in large-scale AI services like OpenAI, but gives you the power to control your own API, choose your own models, and work within your AI budget.

Andrew Luo
May 2, 2023

