Added

July 26, 2023

OctoAI added more graceful concurrency handling, an updated Python SDK, and diarization support in the Whisper model template.

  • Added more graceful concurrency handling: when users send more than N concurrent requests to an endpoint with N actively running replicas, we now queue the extra requests instead of failing them. This queuing behavior is already active for selected customers and will be rolled out gradually over this week and next. While the rollout is in progress on your endpoint, you will temporarily see a new replica spin up.

  • Updated our Python SDK from 0.1.2 to 0.2.0; it now supports both streaming and async inference requests.

  • Added diarization to our Whisper template endpoint and corrected the list of supported languages. Diarization lets you identify the speaker of each segment in a speech recording. You can view the full API specs in the Whisper demo template. Here’s an example of how to use the template with diarization:

    import base64

    import requests

    # Whisper demo template endpoint
    WHISPER_URL = "https://whisper-demo-kk0powt97tmb.octoai.cloud/predict"


    def download_file(url, filename):
        """Download an audio file to disk; return True on success."""
        response = requests.get(url)
        if response.status_code == 200:
            with open(filename, "wb") as f:
                f.write(response.content)
            print(f"File downloaded successfully as {filename}.")
            return True
        print(f"Failed to download the file. Status code: {response.status_code}")
        return False


    def make_post_request(filename):
        # The endpoint takes the audio as a base64-encoded string
        with open(filename, "rb") as f:
            encoded_audio = base64.b64encode(f.read()).decode("utf-8")

        data = {
            "audio": encoded_audio,
            "task": "transcribe",
            "diarize": True,  # ask the endpoint to identify the speaker of each segment
        }

        # json= serializes the payload and sets Content-Type: application/json
        response = requests.post(WHISPER_URL, json=data)

        if response.status_code == 200:
            json_response = response.json()
            # Print each transcribed segment returned by the endpoint
            for seg in json_response["response"]["segments"]:
                print(seg)
        else:
            print(f"Request failed with status code: {response.status_code}")


    if __name__ == "__main__":
        url = "<YOUR_FILE_HERE>.wav"
        filename = "sample.wav"

        # Only call the endpoint if the audio file was downloaded
        if download_file(url, filename):
            make_post_request(filename)
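
The queuing behavior described in the first item above can be pictured with a small asyncio model. This is a hypothetical illustration, not OctoAI's implementation: it assumes each replica serves one request at a time, models the N replicas as an `asyncio.Semaphore` with N slots (`NUM_REPLICAS` is an illustrative name), and uses `asyncio.sleep` as a stand-in for inference. Requests beyond N wait for a free slot instead of failing.

```python
import asyncio

# Hypothetical model: each replica handles one request at a time,
# so an endpoint with N replicas has N concurrency slots.
NUM_REPLICAS = 2


async def handle_request(request_id: int, slots: asyncio.Semaphore, log: list) -> str:
    # Requests beyond the N available slots wait (are queued) here instead of failing
    async with slots:
        log.append(f"start {request_id}")
        await asyncio.sleep(0.01)  # stand-in for model inference time
        log.append(f"end {request_id}")
        return f"result {request_id}"


async def main(num_requests: int) -> tuple:
    slots = asyncio.Semaphore(NUM_REPLICAS)
    log: list = []
    # Send all requests at once, more than there are replicas
    results = await asyncio.gather(
        *(handle_request(i, slots, log) for i in range(num_requests))
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(main(5))
    print(results)
```

With 5 requests and 2 replicas, all 5 requests complete, but at most 2 are processed at any moment; the rest wait their turn rather than receiving an error.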

Improved

July 20, 2023

Added an OctoAI template for Llama2-7B Chat.

  • Added an OctoAI template for Llama2-7B Chat, an instruction-tuned model for chatbots. You can now try this newly released LLM directly in the web UI, where response length is limited, or call it programmatically with more configuration options. A similar template for Llama2-70B is coming soon!

Fixed

July 18, 2023

Changed the HTTP status code returned by the create secret and create registry credentials REST API calls to 201. Previously, these calls returned 200.

  • Changed the HTTP status code returned by the create secret and create registry credentials REST API calls from 200 to 201 (Created). The behavior of the SDK and web frontend is not affected.
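
If you call these REST endpoints directly and check for an exact 200, that check will now fail. A minimal sketch, assuming you handle raw status codes yourself (the helper name `is_success` is hypothetical), is to accept any 2xx status as success:

```python
from http import HTTPStatus


def is_success(status_code: int) -> bool:
    """Return True for any 2xx status, so a 200 -> 201 change doesn't break callers."""
    return 200 <= status_code < 300


# The create calls now return 201 (Created) instead of 200 (OK); both count as success.
print(is_success(HTTPStatus.OK), is_success(HTTPStatus.CREATED))  # True True
```

If you use `requests`, note that `Response.ok` is True for any status below 400, which also includes 3xx redirects; the explicit 2xx range check above is stricter.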
