See latency and cost reductions across popular models

OctoML's CPU Inference Migration Analysis report title page

Dive into the hardware portability results

We’re excited to share the transformative results OctoML customers achieved when they used the platform to explore migrating from Intel Cascade Lake to AWS Graviton3 CPUs.

By moving AI/ML workloads from GCP with Intel to AWS with Graviton3, customers can:

  • Save 73% on compute costs

  • Reduce latency by up to 2.5x

  • Achieve those benefits in hours, not months, with OctoML's automation

Complete the form here to access the full report with high-fidelity charts as well as all the testing methods, data sources, context, and technical details of the analysis.

Performance meets portability with OctoML

OctoML's platform automates the handling of complex dependencies between model and hardware, then optimizes your model's performance. The result is a model with faster inference and higher throughput that you can port to any hardware target.

OctoML platform overview diagram illustrating the ML engine for ML optimization
Copyright © OctoML. All Rights Reserved.