
Pricing & Plans

Get started today on OctoAI and receive $10 of free credit in your account.


Text Gen Solution

The $10 credit is the equivalent of over 500,000 words with the largest Llama 2 70B model, and over a million words with the new Mixtral 8x7B model.

OctoAI provides a unified API endpoint for building on your choice of open source LLMs and variants.

Model Remix Credits

We're giving away up to 150x bonus credits for our brand new Text Gen Solution, on top of our industry-leading cost per token. Requires a minimum spend or a commitment to spend.

See detailed pricing

Free Trial
$10 free credit upon sign up. Get started building your project.

Pro
$0.15 per 1M tokens for 7B and 8B models
$1.20 per 1M tokens for 8x22B models

Enterprise
Contact us for pricing. Includes:

  • Bring your own checkpoint

  • GTE Large

  • Bring your fine-tune

  • Fine-tuning

  • Bring your choice of checkpoints

  • Committed use discounts

  • Performance optimization options

  • Contractual SLAs

  • Dedicated Customer Success Manager

  • Option for private deployment
Sign up
Contact us

Frequently asked questions

Don’t see the answer to your question here? Feel free to reach out so we can help.

What are your rate limits for the Text Gen Solution?

The rate limits are as follows:

  • Free Tier = 10 RPM

  • Pro Tier = 240 RPM

  • Enterprise Tier = Contact us

Higher rate limits are available; please reach out if you need an increase.
What are input and output tokens?

Tokens are the units used to measure input and output text for LLMs; 1,000 tokens is about 750 words. Input tokens count the tokens in the input prompt (including any context information), while output tokens count the tokens the model generates.
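The arithmetic behind these numbers can be sketched in a few lines. This is only an estimator built from the rule of thumb above (1,000 tokens ≈ 750 words) and the published per-1M-token rates; the function names are ours for illustration:

```python
def tokens_to_words(tokens: int) -> float:
    """Rough conversion: 1,000 tokens is about 750 words."""
    return tokens * 0.75

def text_gen_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost at a flat per-1M-token rate (e.g. Pro: 0.15 for 7B/8B models)."""
    return tokens / 1_000_000 * price_per_million_usd

# Example: a 2M-token workload on a 7B model at the Pro rate.
print(tokens_to_words(2_000_000))          # 1,500,000 words
print(text_gen_cost_usd(2_000_000, 0.15))  # about $0.30
```

Note that input and output tokens are both billed, so a real estimate would sum the two.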

How is RAG implemented?

There are multiple ways customers can build a RAG application on OctoAI. OctoAI lets customers run their choice of LLMs (like Llama 2 70B, Mixtral 8x7B, and Mixtral 8x22B) and embedding models (like gte-large). With these primitives, customers can use their preferred vector database as the reference data store for their RAG application. OctoAI also supports integrations with popular LLM application development frameworks like LangChain, so customers can use LangChain's pre-built functions to simplify RAG development. Lastly, OctoAI integrates with turnkey RAG frameworks like Pinecone Canopy, letting customers easily implement RAG over their own data.
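The retrieve-then-generate flow described above can be sketched end to end. This is a minimal, self-contained illustration only: a toy bag-of-words "embedding" stands in for a real embedding model such as gte-large, an in-memory list stands in for the vector database, and the final prompt would be sent to the chosen LLM. All names here are ours, not OctoAI's:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real app would call an
    # embedding model such as gte-large here.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Stand-in for the vector database used as the RAG reference store."""
    def __init__(self):
        self.docs = []

    def add(self, text: str):
        self.docs.append((embed(text), text))

    def top_k(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def build_prompt(query: str, store: VectorStore) -> str:
    # Retrieve relevant context, then prepend it to the user question.
    context = "\n".join(store.top_k(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
    # A real app would now send this prompt to the chosen LLM.

store = VectorStore()
store.add("OctoAI supports Llama 2 70B and Mixtral models.")
store.add("Bananas are rich in potassium.")
print(build_prompt("Which Mixtral models are supported?", store))
```

Frameworks like LangChain or Pinecone Canopy package exactly these steps (embedding, storage, retrieval, prompt assembly) behind higher-level interfaces.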

Is it possible to pre-define a prompt?

All of our Text Gen Solution code samples include a system prompt, for example: "role": "system", "content": "You are a helpful assistant." Note that Mistral models do not support system prompts out of the box.
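The system-prompt pattern above can be sketched as a small helper. The messages shape follows the sample in the answer; the fallback for models without system-prompt support (folding the instruction into the first user message) is a common workaround we assume here, not an official recipe:

```python
def build_messages(system_prompt: str, user_prompt: str,
                   supports_system: bool = True) -> list[dict]:
    """Build a chat messages list with a pre-defined system prompt."""
    if supports_system:
        return [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
    # Assumed workaround for models (e.g. Mistral) that do not accept
    # a system role: prepend the instruction to the user message.
    return [
        {"role": "user", "content": f"{system_prompt}\n\n{user_prompt}"},
    ]

msgs = build_messages("You are a helpful assistant.",
                      "Summarize RAG in one sentence.")
print(msgs[0]["role"])  # system
```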

Start building with ease in minutes using OctoAI

We enable users to harness the value of AI innovations to build the next generation of intelligent applications. Sign up and enjoy the freedom to choose your model, infrastructure, and deployment templates.

Sign Up Today
Talk to sales