Featured

Flexible Systems for the Next ML Revolution(s)

The environment for machine learning innovation has never been better. Modern GPUs are marvels of supercomputer engineering. And software stacks have raised the level of abstraction for ML implementations...

Adrian Sampson

Jan 6, 2022

Collage: Automated integration of various deep learning backends results in state-of-the-art model performance

At TVMCon this week, we will be presenting Collage, our latest research from Carnegie Mellon University and the University of Michigan on generating the fastest possible executable for a given machine learning model.

Byungsoo Jeon
Sunghyun Park

Dec 15, 2021

All Posts

TVMCon 2021 Wrapup

The Apache TVM Community and OctoML closed out 2021 with the fourth annual Apache TVM and Open Source ML Acceleration Conference. It was the TVM community’s largest event ever, with 700 attendees from 34 nations coming together for a virtual conference...

Chris Hoge

Jan 13, 2022

The Future of Hardware is Software

General-purpose GPU computing helped launch the deep learning era. As ML models have grown larger and more computationally intense, however, they have changed the way GPUs are designed, and they have inspired a wave of new hardware that looks radically different from GPUs.

Adrian Sampson

Jan 11, 2022

Flexible Systems for the Next ML Revolution(s)

The environment for machine learning innovation has never been better. Modern GPUs are marvels of supercomputer engineering. And software stacks have raised the level of abstraction for ML implementations...

Adrian Sampson

Jan 6, 2022

Write Python with blazing fast CUDA-level performance

By using TVMScript, TVM's embedded domain-specific language (DSL), OctoML engineers are able to demonstrate a 20x speedup over a straightforward PyTorch implementation on CPU, and a 1.3x speedup over a handwritten CUDA implementation on GPU for a real-world kernel.

Jared Roesch

Dec 16, 2021

Free pre-accelerated Model Zoo streamlines choice of model and cloud/edge targets

This week at TVMCon, OctoML is launching a model zoo with pre-accelerated, ready-to-download vision and language models. Running extremely fast, sub-millisecond models in production is now easier than ever, whether in the cloud or at the edge.

Jason Knight and 2 others

Dec 16, 2021

