Multi-Tool Orchestration with a RAG Approach Using OpenAI's Responses API
This cookbook guides you through building dynamic, multi-tool workflows with OpenAI's Responses API. It demonstrates a Retrieval-Augmented Generation (RAG) approach that intelligently routes user queries to the appropriate built-in or external tool. Whether a query calls for general knowledge or for specific internal context from a vector database (such as Pinecone), this guide shows how to combine function calls, the built-in web search tool, and document retrieval to generate accurate, context-aware responses.

Installation
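The setup below is a minimal sketch; the exact package list and versions are assumptions based on the libraries used in this guide.

```shell
# Assumed dependencies for this guide: the OpenAI SDK, the Pinecone SDK,
# and Hugging Face Datasets for loading the example dataset.
pip install --upgrade openai pinecone datasets
```

Remember to export your `OPENAI_API_KEY` and Pinecone API key as environment variables before running the examples.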
Create a Pinecone Index Based on the Dataset
Use the dataset itself to determine the embedding dimensionality. For example, compute one embedding from the merged column and then create the index with that dimension.

Upsert the Dataset into the Pinecone Index
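The step above can be sketched as follows. The index name, embedding model, region, and the `merged` column name are assumptions for illustration; `dataset` is the dataset loaded earlier in the guide.

```python
# Sketch: infer the embedding dimensionality from one sample row,
# then create a Pinecone serverless index with that dimension.
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

client = OpenAI()
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Embed a single merged Question+Answer row to discover the dimension.
sample_text = dataset[0]["merged"]
sample_embedding = client.embeddings.create(
    model="text-embedding-3-small", input=sample_text
).data[0].embedding
dimension = len(sample_embedding)

index_name = "medical-qa"  # hypothetical index name
if index_name not in [ix["name"] for ix in pc.list_indexes()]:
    pc.create_index(
        name=index_name,
        dimension=dimension,          # matches the model's output size
        metric="cosine",              # cosine similarity for semantic search
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(index_name)
```

Deriving the dimension from a real embedding (rather than hard-coding it) keeps the index consistent if you later switch embedding models.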
Process the dataset in batches: generate an embedding for each merged text, prepare metadata (including separate Question and Answer fields), and upsert each batch into the index. You can also update the metadata of specific entries later if needed.

Query the Pinecone Index
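A sketch of the batched upsert, assuming the `client`, `index`, and `dataset` objects from the previous steps; the batch size and the `merged`/`Question`/`Answer` field names are assumptions.

```python
# Sketch: embed merged texts in batches and upsert vectors with
# Question/Answer metadata attached to each vector.
batch_size = 100
rows = list(dataset)

for start in range(0, len(rows), batch_size):
    batch = rows[start : start + batch_size]
    texts = [row["merged"] for row in batch]

    # One embeddings call per batch keeps request counts low.
    embeddings = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    ).data

    vectors = [
        {
            "id": str(start + i),
            "values": emb.embedding,
            "metadata": {"Question": row["Question"], "Answer": row["Answer"]},
        }
        for i, (row, emb) in enumerate(zip(batch, embeddings))
    ]
    index.upsert(vectors=vectors)

# Metadata for a single entry can be corrected later, e.g.:
# index.update(id="42", set_metadata={"Answer": "Revised answer text"})
```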
Create a natural language query, compute its embedding, and perform a similarity search against the Pinecone index. The returned matches include metadata that provides context for generating answers.

Generate a Response Using the Retrieved Context
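This step might look like the following, reusing the `client` and `index` objects from above; the example query is hypothetical.

```python
# Sketch: embed the query, then run a top-k similarity search.
query = "What treatments are available for type 2 diabetes?"
query_embedding = client.embeddings.create(
    model="text-embedding-3-small", input=query
).data[0].embedding

results = index.query(
    vector=query_embedding,
    top_k=3,
    include_metadata=True,  # return the stored Question/Answer fields
)
for match in results["matches"]:
    print(match["score"], match["metadata"]["Question"])
```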
Select the best-matching result from your query results and use the OpenAI Responses API to generate a final answer that combines the retrieved context with the original question.

Orchestrate Multi-Tool Calls
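A sketch of this step, continuing from the `results` and `query` variables above; the model name and prompt wording are assumptions.

```python
# Sketch: combine the best match's metadata with the original question
# and ask the Responses API for a grounded answer.
best = results["matches"][0]["metadata"]
context = f"Question: {best['Question']}\nAnswer: {best['Answer']}"

response = client.responses.create(
    model="gpt-4o",
    input=[
        {
            "role": "system",
            "content": "Answer using the provided context when it is relevant.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        },
    ],
)
print(response.output_text)
```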
Now we'll define the tools available to the model through the Responses API, including a function that invokes an external vector store (Pinecone, in this example).

Web Search Preview Tool: enables the model to perform live web searches and preview the results. This is ideal for retrieving real-time or up-to-date information from the internet.

Pinecone Search Tool: lets the model query a vector database using semantic search. This is especially useful for retrieving relevant documents, such as medical literature or other domain-specific content, that have been stored in vectorized form.

For health-related queries, the orchestration code calls query_pinecone_index with the current query and extracts the best match (or other appropriate context) as the result. For non-health-related inquiries, or when an explicit internet search is requested, the code calls the web_search_call function; for other queries, the model may choose not to call any tool and instead answer directly from the question under consideration.
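The two tools above can be declared roughly as follows. `query_pinecone_index` is the function built earlier in this guide; its parameter schema and description here are assumptions for illustration.

```python
# Sketch of the tool definitions passed to the Responses API.
tools = [
    # Built-in tool: live web search with result previews.
    {"type": "web_search_preview"},
    # Function tool: semantic search over the Pinecone index.
    {
        "type": "function",
        "name": "query_pinecone_index",
        "description": (
            "Semantic search over the Pinecone vector index of medical "
            "Question/Answer documents. Use for health-related queries."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The user question."}
            },
            "required": ["query"],
            "additionalProperties": False,
        },
    },
]
```

The model picks between these tools (or neither) based on the tool descriptions, so writing a precise description is what actually drives the routing behavior described above.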
Finally, the tool call and its output are appended to the conversation, and the final answer is generated by the Responses API.
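This final hand-off might be sketched as below, assuming the `client`, `tools`, `query_pinecone_index`, and an initial `response` from a `client.responses.create(...)` call with those tools; the message shapes follow the Responses API function-calling flow.

```python
import json

# Sketch: execute any requested function call, append both the call and
# its output to the conversation, then ask for the final answer.
input_messages = [{"role": "user", "content": query}]

for item in response.output:
    if item.type == "function_call" and item.name == "query_pinecone_index":
        args = json.loads(item.arguments)
        result = query_pinecone_index(args["query"])  # best-match context

        input_messages.append(item)  # the tool call itself
        input_messages.append(
            {
                "type": "function_call_output",
                "call_id": item.call_id,  # ties the output to the call
                "output": str(result),
            }
        )

final = client.responses.create(
    model="gpt-4o", input=input_messages, tools=tools
)
print(final.output_text)
```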