Text Gen Python SDK
Use the OctoAI Chat Completions API to easily generate text.
For a quick glance at the parameters supported by the Chat Completions API, see the API reference.
The Client class allows you to easily run inferences against any model that accepts JSON-formatted inputs as a dictionary, and provides you with all JSON-formatted outputs as a dictionary. The Client class also supports the Chat Completions API and provides easy access to a set of highly optimized text models on OctoAI.
This guide will walk you through how to select your model of interest, how to call highly optimized text models on OctoAI using the Chat Completions API, and how to use the responses in both streaming and regular modes.
Requirements
- Please create an OctoAI API token if you don’t have one already.
- Please also verify you’ve completed Python SDK Installation & Setup.
- If you use the OCTOAI_TOKEN environment variable for your token, you can instantiate the OctoAI client with client = Client() after importing the octoai package.
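Since the client reads the token from the environment, a quick way to sanity-check your setup before calling Client() is to verify that the variable is actually present. The helper below is hypothetical (not part of the SDK), shown only to illustrate the check:

```python
import os

def token_configured(env) -> bool:
    """Return True if a non-empty OctoAI API token is available in the given environment mapping."""
    # Hypothetical helper: the SDK itself reads OCTOAI_TOKEN for you.
    return bool(env.get("OCTOAI_TOKEN"))

print(token_configured({"OCTOAI_TOKEN": "secret"}))  # True
print(token_configured({}))                          # False
```

In a real script you would pass os.environ; with the variable exported, Client() picks the token up automatically.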
Supported Models
The Client().chat.completions.create() method described in the next section requires a model= argument. The following snippet shows you how to get a list of supported models.
>>> import octoai
>>> octoai.chat.get_model_list()
The list of available models is also detailed in the API reference. You can specify the model= argument either as a string or as an octoai.chat.TextModel enum instance, such as TextModel.LLAMA_2_70B_CHAT.
Text Generation
The following snippet shows you how to use the Chat Completions API to generate text using Llama2.
import json

from octoai.chat import TextModel
from octoai.client import Client

client = Client()

completion = client.chat.completions.create(
    model=TextModel.LLAMA_2_70B_CHAT,
    messages=[
        {
            "role": "system",
            "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
        },
        {"role": "user", "content": "Write a blog about Seattle"},
    ],
    max_tokens=150,
)
print(json.dumps(completion.dict(), indent=2))
The response is of type octoai.chat.ChatCompletion. If you print the response from this call as in the example above, it looks similar to the following:
{
  "id": "cmpl-8ea213aece0747aca6d0608b02b57196",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Founded in 1921, Seattle is the mother city of Pacific Northwest. Seattle is the densely populated second-largest city in the state of Washington along with Portland. A small city at heart, Seattle has transformed itself from a small manufacturing town to the contemporary Pacific Northwest hub to its east. The city's charm and frequent unpredictability draw tourists and residents alike. Here are my favorite things about Seattle.\n* Seattle has a low crime rate and high quality of life.\n* Seattle has rich history which included the building of the first Pacific Northwest harbor and the development of the Puget Sound irrigation system. Seattle is also home to legendary firm Boeing.\n",
        "function_call": null
      },
      "delta": null,
      "finish_reason": "length"
    }
  ],
  "created": 5399,
  "model": "llama2-70b",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 150,
    "prompt_tokens": 571,
    "total_tokens": 721
  }
}
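In practice you usually want just the generated text rather than the full JSON; it lives under choices[0].message.content, and finish_reason tells you why generation stopped. As a runnable sketch of reading those fields, here is the same extraction over the dictionary form of an abbreviated response (the content string is shortened for display):

```python
# Abbreviated version of the ChatCompletion dictionary shown above.
completion_dict = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Here are my favorite things about Seattle."},
            "finish_reason": "length",
        }
    ],
    "usage": {"completion_tokens": 150, "prompt_tokens": 571, "total_tokens": 721},
}

# The generated text lives under choices[0].message.content.
text = completion_dict["choices"][0]["message"]["content"]
print(text)

# finish_reason "length" means generation stopped because max_tokens was reached.
print(completion_dict["choices"][0]["finish_reason"])
```

On the ChatCompletion object itself, the equivalent attribute access is completion.choices[0].message.content.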
Note that billing is based upon “prompt tokens” and “completion tokens” above. View prices on our pricing page.
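Because billing is per token, the usage block is what you would feed into a cost estimate. A minimal sketch, using hypothetical per-million-token prices (these are NOT real OctoAI rates; check the pricing page):

```python
# Hypothetical prices in dollars per million tokens -- NOT real OctoAI rates.
PROMPT_PRICE_PER_M = 0.60
COMPLETION_PRICE_PER_M = 1.90

# The usage block from the example response above.
usage = {"prompt_tokens": 571, "completion_tokens": 150, "total_tokens": 721}

# Prompt and completion tokens are typically priced separately.
cost = (
    usage["prompt_tokens"] * PROMPT_PRICE_PER_M
    + usage["completion_tokens"] * COMPLETION_PRICE_PER_M
) / 1_000_000
print(f"${cost:.6f}")
```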
Streaming Responses
The following snippet shows you how to obtain the model's response incrementally as it is generated using streaming (using stream=True).
from octoai.chat import TextModel
from octoai.client import Client

client = Client()

for completion in client.chat.completions.create(
    model=TextModel.LLAMA_2_70B_CHAT,
    messages=[
        {
            "role": "system",
            "content": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
        },
        {"role": "user", "content": "Write a blog about Seattle"},
    ],
    max_tokens=150,
    stream=True,
):
    # A chunk's delta content can be empty (e.g. the final chunk), so fall back to "".
    print(completion.choices[0].delta.content or "", end="", flush=True)
When using streaming mode, the response is of type Iterable[ChatCompletion]. To read each incremental response from the model, you can use a for loop over the returned object. The example above prints each incremental response as it arrives, and they accumulate to form the entire response in the output as the model prediction progresses.
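If you need the full text after streaming finishes (for logging or post-processing), accumulate the deltas as they arrive. A sketch of that accumulation pattern, with plain strings standing in for successive completion.choices[0].delta.content values:

```python
# Stand-ins for the delta content of successive streamed chunks;
# the final chunk's delta content may be None.
chunks = ["Seattle ", "is a city ", "in Washington.", None]

full_response = ""
for delta_content in chunks:
    if delta_content is not None:
        # Echo the chunk immediately, and keep a running copy.
        print(delta_content, end="", flush=True)
        full_response += delta_content

print()  # newline after the streamed output
```

With the real SDK, delta_content would come from each ChatCompletion chunk inside the for loop shown above.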
Additional Parameters
To learn about the additional parameters supported by the Client().chat.completions.create() method, see the API reference.