Tristan Konolige

SWE

1 Article

Jared Roesch

Dec 16, 2021

Write Python with blazing fast CUDA-level performance

Using TVMScript, TVM's embedded domain-specific language (DSL), OctoML engineers demonstrated a 20x speedup over a straightforward PyTorch implementation on CPU and a 1.3x speedup over a handwritten CUDA implementation on GPU for a real-world kernel.
