
OctoAI and Pinecone partnership for GenAI using RAG

By Bear Douglas and Pedro Torruella

Apr 1, 2024

2 minutes


We’re happy to announce that OctoAI and Pinecone are partnering to lower the barrier of entry for developers building robust, performant GenAI applications with Retrieval Augmented Generation (RAG). The solution: Pinecone’s flexible RAG framework, Canopy, now includes pre-integrated access to OctoAI’s production-ready embedding and LLM API endpoints.

Pinecone’s Canopy is an open-source framework built on top of Pinecone’s vector database for building production-ready chat assistants at any scale. Canopy makes it easy for developers to implement a RAG workflow by lifting all of the low-level implementation decisions (e.g. chunk size, index configuration, LLM, embedding model, etc.) into easy-to-modify YAML configuration files.
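To make the workflow concrete, here is a minimal sketch of the RAG loop that Canopy automates: embed documents, retrieve the most similar one for a query, and augment the prompt with it. The toy keyword-count embedding below stands in for a real embedding model like gte-large; all names in this snippet are illustrative, not Canopy APIs.

```python
from math import sqrt

def embed(text, vocab):
    # Toy embedding: count of each vocabulary word in the text.
    # A real system would call an embedding model instead.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vocab = ["pinecone", "canopy", "octoai", "vector", "llm"]
docs = [
    "Canopy is built on the Pinecone vector database",
    "OctoAI serves LLM endpoints",
]
# "Index" the documents: store each alongside its embedding.
index = [(d, embed(d, vocab)) for d in docs]

query = "which vector database does canopy use"
qv = embed(query, vocab)
# Retrieve the closest document and splice it into the prompt.
best = max(index, key=lambda pair: cosine(qv, pair[1]))[0]
prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
```

In Canopy, each of these decisions (how to chunk, which embedding model, which LLM) is a line in the YAML config rather than code you write.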

OctoAI has partnered with Pinecone to bring its production-ready endpoints to power Canopy’s text embedding and augmented generation needs. As a developer using Canopy, you can choose from the best models OctoAI has to offer: the GTE Large embedding model, leading open source foundation LLMs such as Mixtral-8x7B from Mistral AI and Llama2 from Meta, and highly capable fine-tunes like Nous Hermes 2 Pro Mistral from Nous Research.

To get a Canopy server up and running with OctoAI’s models, you don’t need to write any custom code; just update the Canopy YAML configuration as follows:

chat_engine:
  params:
    max_prompt_tokens: 2048
  llm: &llm
    type: OctoAILLM
    params:
      model_name: mistral-7b-instruct
  context_engine:
    knowledge_base:
      record_encoder:
        type: OctoAIRecordEncoder
        params:
          model_name: thenlper/gte-large
          batch_size: 2048
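From there, launching the server is a matter of installing Canopy and pointing it at the config. The sketch below shows the rough shape of the setup; the config file name is a placeholder, and the exact environment variable names and CLI flags may differ, so check the Canopy documentation.

```shell
pip install canopy-sdk

export PINECONE_API_KEY="..."   # Pinecone credentials
export OCTOAI_API_KEY="..."     # OctoAI credentials (variable name assumed)

canopy new                                 # create the Pinecone index
canopy upsert ./docs.parquet               # embed and load your documents
canopy start --config ./octoai-config.yaml # serve the chat API
```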

As a fully open source solution, Canopy+OctoAI is one of the fastest and most affordable ways to get started on your RAG journey. Canopy uses the Pinecone vector database for storage and retrieval, which is free to use for up to 100k vectors (that’s about 30k pages of text). OctoAI offers industry-leading pricing: $0.05 per 1M tokens for its gte-large embedding model, and $0.30 per 1M input tokens plus $0.50 per 1M output tokens for its Mixtral-8x7B text completion model. Upon signing up to OctoAI you are given $10 in free credits, which lets you easily populate 100k vectors with room to generate several million tokens in augmented generation.
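As a rough back-of-the-envelope check on that claim, here is the arithmetic using the quoted prices. The tokens-per-chunk and per-request token counts are assumptions chosen for illustration, not figures from the post.

```python
# Prices from the post, in dollars per token.
EMBED_PRICE = 0.05 / 1_000_000   # gte-large embedding
IN_PRICE = 0.30 / 1_000_000      # Mixtral-8x7B input tokens
OUT_PRICE = 0.50 / 1_000_000     # Mixtral-8x7B output tokens
CREDITS = 10.0                   # free signup credits

# Assumption: 100k vectors at ~400 tokens per chunk.
embed_cost = 100_000 * 400 * EMBED_PRICE            # $2.00

# Assumption: each RAG request sends ~1500 context/prompt tokens
# and generates ~300 output tokens.
per_request = 1500 * IN_PRICE + 300 * OUT_PRICE     # $0.0006

requests = (CREDITS - embed_cost) / per_request     # ~13,300 requests
total_gen_tokens = requests * (1500 + 300)          # ~24M tokens
```

Under these assumptions, $10 covers embedding the full 100k-vector index with roughly $8 left over, enough for tens of millions of tokens of augmented generation.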