
Pricing & Plans

Get started today on OctoAI and receive $10 of free credit in your account.


Text Gen Solution

The $10 credit is the equivalent of over 500,000 words with the largest Llama 2 70B model, and over a million words with the new Mixtral 8x7B model.

OctoAI provides a unified API endpoint for building on your choice of open source LLMs and variants.

Model Remix Credits

We're giving away up to 150x bonus credits for our brand new Text Gen Solution, on top of our industry-leading cost per token. Requires a minimum spend or a commitment to spend.

See detailed pricing

Free Trial
$10 free credit upon sign up. Get started building your project.

Pro
$0.15 per 1M tokens for 7B and 8B models
$1.20 per 1M tokens for 8x22B models

Enterprise
Contact us for pricing. Includes:

  • Bring your own checkpoint

  • GTE Large

  • Bring your fine-tune

  • Fine-tuning

  • Bring your choice of checkpoints

  • Committed use discounts

  • Performance optimization options

  • Contractual SLAs

  • Dedicated Customer Success Manager

  • Option for private deployment
Sign up
Contact us

Frequently asked questions

Don’t see the answer to your question here? Feel free to reach out so we can help.

What are your rate limits for the Text Gen Solution?

The rate limits are as follows:

  • Free Tier = 10 RPM

  • Pro Tier = 240 RPM

  • Enterprise Tier = Contact us

Higher rate limits are available; please reach out if you need an increase.
What are input and output tokens?

Tokens are the units used to measure input and output text for LLMs; 1,000 tokens is about 750 words. Input tokens count the tokens in the input prompt (including any context information), while output tokens count the tokens the model generates.
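The arithmetic behind these numbers can be sketched in a few lines. This is only an estimator built from the rule of thumb above (1,000 tokens ≈ 750 words) and the published per-1M-token rates; the function names are ours for illustration:

```python
def tokens_to_words(tokens: int) -> float:
    """Rough conversion: 1,000 tokens is about 750 words."""
    return tokens * 0.75

def text_gen_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost at a flat per-1M-token rate (e.g. Pro: 0.15 for 7B/8B models)."""
    return tokens / 1_000_000 * price_per_million_usd

# Example: a 2M-token workload on a 7B model at the Pro rate.
print(tokens_to_words(2_000_000))          # 1,500,000 words
print(text_gen_cost_usd(2_000_000, 0.15))  # about $0.30
```

Note that input and output tokens are both billed, so a real estimate would sum the two.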

How is RAG implemented?

There are multiple ways customers can build a RAG application on OctoAI. OctoAI lets customers run their choice of LLMs (like Llama 2 70B, Mixtral 8x7B, and Mixtral 8x22B) and embedding models (like gte-large). With these primitives, customers can use their preferred vector database as the reference data store for their RAG application. OctoAI also supports integrations with popular LLM application development frameworks like LangChain, so customers can use LangChain's pre-built functions to simplify RAG development. Lastly, OctoAI integrates with turnkey RAG frameworks like Pinecone Canopy, letting customers easily implement RAG over their own data.
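The retrieve-then-generate flow described above can be sketched end to end. This is a minimal, self-contained illustration only: a toy bag-of-words "embedding" stands in for a real embedding model such as gte-large, an in-memory list stands in for the vector database, and the final prompt would be sent to the chosen LLM. All names here are ours, not OctoAI's:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real app would call an
    # embedding model such as gte-large here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Stand-in for the vector database used as the RAG reference store."""
    def __init__(self):
        self.docs = []

    def add(self, text: str):
        self.docs.append((embed(text), text))

    def top_k(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def build_prompt(query: str, store: VectorStore) -> str:
    # Retrieve relevant context, then prepend it to the user question.
    context = "\n".join(store.top_k(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
    # A real app would now send this prompt to the chosen LLM.

store = VectorStore()
store.add("OctoAI supports Llama 2 70B and Mixtral models.")
store.add("Bananas are rich in potassium.")
print(build_prompt("Which Mixtral models are supported?", store))
```

Frameworks like LangChain or Pinecone Canopy package exactly these steps (embedding, storage, retrieval, prompt assembly) behind higher-level interfaces.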

Is it possible to pre-define a prompt?

All of our Text Gen Solution code samples include a system prompt, for example: "role": "system", "content": "You are a helpful assistant." Note that Mistral models do not support system prompts out of the box.
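The system-prompt pattern above can be sketched as a small helper. The messages shape follows the sample in the answer; the fallback for models without system-prompt support (folding the instruction into the first user message) is a common workaround we assume here, not an official recipe:

```python
def build_messages(system_prompt: str, user_prompt: str,
                   supports_system: bool = True) -> list[dict]:
    """Build a chat messages list with a pre-defined system prompt."""
    if supports_system:
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
    # Assumed workaround for models (e.g. Mistral) that do not accept
    # a system role: prepend the instruction to the user message.
    return [
        {"role": "user", "content": f"{system_prompt}\n\n{user_prompt}"},
    ]

msgs = build_messages("You are a helpful assistant.",
                      "Summarize RAG in one sentence.")
print(msgs[0]["role"])  # system
```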

Start building with ease in minutes using OctoAI

We enable users to harness the value of AI innovations to build the next generation of intelligent applications. Sign up and enjoy the freedom to choose your model, infrastructure, and deployment templates.

Sign Up Today
Talk to sales