RAG Explained - Building a RAG using Python, LangChain, Ollama and Groq
The Perigon Team
Aug 21, 2024
Introduction
With new large language models (LLMs) appearing and improving almost daily, AI increasingly shapes how we work, communicate, and find information online. However, like all technologies, it has limitations that keep us from harnessing its full potential.
LLM limitations
Large language models are fascinating: they are trained on massive amounts of data and can answer questions on a very wide range of topics. Their two main problems are:
They do not have access to real-time information.
They are prone to hallucinations.
To bring an LLM up to date with new knowledge, we have to either fine-tune it or retrain it from scratch on the latest data, and both approaches require a lot of time and resources.
Hallucination, on the other hand, occurs when the LLM lacks the knowledge needed to answer your prompt, or when its training data is biased. The model then starts making things up, producing false or misleading information. You can learn more about AI hallucinations from DigitalOcean by reading this post.
Introducing RAG
RAG stands for Retrieval-Augmented Generation. It is an artificial intelligence technique that addresses the two LLM problems described above. RAG can be added to any LLM (ChatGPT, LLaMa, Mistral…). Its architecture is simple: it retrieves the missing context from a vector database and feeds that context to the LLM through a specific prompt template.
We've recently created a new YouTube video explaining how RAG works and how to build a simple RAG using LangChain. You can watch it from here.
How RAG Works
In simple terms, RAG is an architecture you add to your LLM pipeline to provide the model with the proper up-to-date context (fetched from your database) to answer the user’s prompt correctly.
Depending on how your current pipeline is structured, where you fetch the up-to-date context data, and how you process it, you can build a RAG with many architectures and variations.
Let’s walk through a simple RAG architecture diagram, from the moment the user sends a prompt until they receive a response.
Here is the diagram explained in simple steps:
The user sends a request/prompt to our servers.
The prompt passes through the RAG implementation for semantic searching of the context database, where the up-to-date data resides. A vector database is used for the semantic search.
The prompt and retrieved context are provided to the LLM using a specific template to ensure optimal response and utilization of the provided context, allowing the LLM to respond accurately to the prompt.
The LLM response is returned to the user (or might be used for another processing task).
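The four steps above can be sketched as a single function. This is only an illustration of the flow, not the article's implementation: `rag_answer`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the retriever and LLM are replaced by stand-in functions so the sketch runs without a database or a model.

```python
from typing import Callable, List


def rag_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run one request through the flow: retrieve context, build prompt, generate."""
    # Step 2: look up context related to the question (a vector search in practice).
    context_chunks = retrieve(question)
    context = "\n\n".join(context_chunks)

    # Step 3: combine the retrieved context and the question in a fixed template.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # Step 4: the LLM response goes back to the caller.
    return generate(prompt)


# Hypothetical stand-ins for the vector database and the LLM.
def fake_retrieve(question: str) -> List[str]:
    return ["The final is played on Saturday 1 June."]


def fake_generate(prompt: str) -> str:
    # Echo the prompt so the assembled LLM input is visible.
    return prompt


result = rag_answer("When is the final?", fake_retrieve, fake_generate)
print(result)
```

Passing the retriever and the generator in as functions keeps the flow independent of any particular vector database or model provider.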
Building a RAG with LangChain
LangChain is a framework for developing and deploying applications powered by large language models (LLMs). It streamlines development with open-source tools, simplifies productionization through monitoring and evaluation, and facilitates deployment by turning any chain into an API using LangServe.
The LangChain framework will help us assemble the pieces to build our RAG architecture.
Prerequisites
Before starting, ensure you have the required libraries installed. Install them using the following commands:
If this is your first time using Playwright on your machine, run the following command to install a Chromium-compatible browser:
Step 1: Prepare the Environment
First, we must load the environment variables from a .env file. This file should contain your API keys for services like OpenAI, Perigon, and Groq.
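Once the .env file has been loaded (commonly with `load_dotenv()` from the python-dotenv package), it can help to fail early if a key is missing. The sketch below uses only the standard library. The helper `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical, so substitute the key names your own .env file actually defines.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested variables, or raise an error listing every missing one."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values


# Demo only: a placeholder is set so the sketch runs on its own.
# In real code the value would come from the loaded .env file.
os.environ.setdefault("EXAMPLE_API_KEY", "placeholder-value")
keys = require_env(["EXAMPLE_API_KEY"])
print(sorted(keys))
```

Reporting all missing names at once avoids fixing the .env file one variable at a time.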
Step 2: Load Data
Load data from the web
Let's grab the data from the web using Playwright and transform it into a format suitable for processing. This step is optional since you should already have the data you want to use as a context for feeding it into the LLM. This is just a simple example of fetching the latest news and data.
We want to fetch the latest information about the upcoming (June 1, 2024) UEFA Champions League final match for Borussia Dortmund vs. Real Madrid.
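To illustrate what "a format suitable for processing" means, here is a dependency-free sketch of the transform step only: it reduces an HTML page to its visible text. It does not fetch anything and is not the Playwright and LangChain code used in this article. `TextExtractor`, `html_to_text`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style elements.
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Placeholder markup standing in for a fetched page.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Final preview</h1><p>Kick-off details</p>"
    "<script>track()</script></body></html>"
)
print(html_to_text(sample))
```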
Load data using an API
Also, you can fetch articles via an API provider like Perigon.
The Perigon API allows you to fetch up-to-date, real-time news and article data powered by AI and contextual intelligence. In this case, we can easily query using the q field for Dortmund vs Real Madrid 2024 to get the latest articles. Then, we pre-process and convert them into LangChain-compatible documents ready for embedding.
Please provide your valid Perigon API key to access the API. You can find your API key in the dashboard.
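The pre-processing step can be sketched as follows. Everything about the payload here is an assumption made for illustration: the sketch assumes the response is JSON with an `articles` list whose items carry `title`, `url`, and `content` fields, so check the actual response of your API before relying on these names. The `Document` dataclass is a minimal stand-in that mirrors the `page_content` and `metadata` attributes of a LangChain document.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Minimal stand-in with the same two attributes as a LangChain Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def articles_to_documents(payload: Dict[str, Any]) -> List[Document]:
    """Convert an (assumed) articles payload into documents ready for embedding."""
    docs: List[Document] = []
    for article in payload.get("articles", []):
        content = (article.get("content") or "").strip()
        if not content:
            # Articles without body text add nothing to the context database.
            continue
        docs.append(
            Document(
                page_content=content,
                metadata={
                    "title": article.get("title", ""),
                    "url": article.get("url", ""),
                },
            )
        )
    return docs


# Placeholder payload with invented values, only to show the shape.
example_payload = {
    "articles": [
        {"title": "Example title", "url": "https://example.com/a", "content": "Example body text."},
        {"title": "Empty article", "url": "https://example.com/b", "content": ""},
    ]
}
docs = articles_to_documents(example_payload)
print(len(docs), docs[0].metadata["title"])
```

Keeping the title and URL in the metadata makes it possible to show the user where a retrieved passage came from.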
Step 3: Embed Documents into the Vector Database
Embedding is a technique used in natural language processing (NLP) and machine learning to convert textual data into numerical representations. These representations capture the semantic meaning and context of the text.
We use a vector database like Qdrant to store embedded documents as vectors, which makes it easier for the RAG to fetch the appropriate context for the received prompt.
We take all the data documents we pre-processed before and embed them into our Qdrant database.
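To make the idea of embedding and vector search concrete, here is a toy sketch that runs without Qdrant or an embedding model. It is a deliberate simplification: the `embed` function below uses normalised word counts, which only capture word overlap, whereas a real embedding model produces dense vectors that capture meaning. `InMemoryVectorStore` is a hypothetical stand-in for the vector database, not Qdrant's API.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy embedding: word counts scaled to unit length."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors have unit length, so the dot product is the cosine similarity.
    return sum(value * b.get(word, 0.0) for word, value in a.items())


class InMemoryVectorStore:
    """Stores (vector, text) pairs and returns the texts closest to a query."""

    def __init__(self) -> None:
        self._rows: List[Tuple[Vector, str]] = []

    def add(self, texts: List[str]) -> None:
        for text in texts:
            self._rows.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> List[str]:
        query_vector = embed(query)
        ranked = sorted(
            self._rows, key=lambda row: cosine(query_vector, row[0]), reverse=True
        )
        return [text for _, text in ranked[:k]]


store = InMemoryVectorStore()
store.add(
    [
        "Dortmund and Real Madrid meet in the Champions League final.",
        "Qdrant stores vectors for similarity search.",
        "Ollama serves open-source models locally.",
    ]
)
top = store.search("Real Madrid final", k=1)
print(top)
```

A real vector database follows the same add-then-search pattern, but uses indexes that avoid comparing the query against every stored vector.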
Step 4: Initialize the LLM Model of choice
Choose an LLM, such as LLaMa (e.g., using Ollama), Groq LLaMa, or OpenAI.
If you prefer to run an LLM locally, Ollama is recommended for quickly downloading and serving many open-source LLMs (e.g., LLaMa 3). In this example, we point Ollama at the llama3-8b-instruct model, which is fine-tuned for instructions.
Step 5: Testing Prompt without RAG
Attempt to generate a response without using RAG.
from langchain.prompts import ChatPromptTemplate

system = "You are a helpful assistant."
human = "{text}"
prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)])

# groq_llama3_llm is the LLM object initialized in the previous step
chain = prompt | groq_llama3_llm
query = "Where to watch, match date and possible line-ups for Dortmund vs Real finale"

# Stream the response and print it chunk by chunk
for s in chain.stream(query):
    print(s.content, end="", flush=True)
This is the response that you would get.
The highly anticipated UEFA Champions League Final between Borussia Dortmund and Real Madrid!
Here's the essential information you need to know:
Match Details:
Date: Saturday, May 28, 2023
Kick-off time: 21:00 CEST (Central European Summer Time)
Venue: Atatürk Olympic Stadium, Istanbul, Turkey
Where to Watch:
The UEFA Champions League Final will be broadcast in many countries worldwide. You can check the official UEFA website for a list of broadcasters and streaming options in your region.
In the United States, you can watch the match on CBS Sports Network and stream it on Paramount+. In the UK, it will be broadcast on BT Sport 2 and streamed on the BT Sport app.
Possible Line-ups:
Please note that these line-ups are subject to change and may not reflect the actual starting XI for the match. Here are some possible line-ups for both teams:
Borussia Dortmund:
Goalkeeper: Gregor Kobel
Defenders: Nico Schulz, Mats Hummels, Manuel Akanji, Thomas Meunier
Midfielders: Jude Bellingham, Thomas Delaney, Axel Witsel, Julian Brandt
Real Madrid:
Defenders: Dani Carvajal, Éder Militão, Raphaël Varane, Ferland Mendy
Midfielders: Casemiro, Luka Modrić, Toni Kroos, Federico Valverde
Forwards: Karim Benzema, Vinícius Júnior, Rodrygo Goes
If you look at the official UEFA article, you’ll know that everything the LLM stated above is wrong and outdated (since it doesn’t have the latest knowledge).
Step 6: Preparing Prompt Template for RAG
To pass the retrieved context to the LLM alongside the user’s prompt, we must prepare a prompt template that includes the main instruction, the retrieved RAG context, and the user’s original question (prompt).
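A template with those three parts might look like the sketch below. The wording of the instruction, the names `RAG_TEMPLATE` and `build_rag_prompt`, and the chunk numbering are assumptions chosen for illustration, not the exact template used in this project. Plain string formatting is used so the example runs without any framework.

```python
from typing import List

# The three parts: main instruction, retrieved context, original question.
RAG_TEMPLATE = """You are a helpful assistant. Answer the question using only the context below.
If the context does not contain the answer, say that you do not know.

Context:
{context}

Question: {question}
"""


def build_rag_prompt(question: str, chunks: List[str]) -> str:
    """Fill the template with the retrieved chunks and the user's question."""
    # Number the chunks so separate passages stay distinguishable in the prompt.
    context = "\n\n".join(
        f"[{index}] {chunk}" for index, chunk in enumerate(chunks, start=1)
    )
    return RAG_TEMPLATE.format(
        context=context or "(no context retrieved)",
        question=question,
    )


prompt_text = build_rag_prompt(
    "When is the match?",
    ["Match date: Saturday 1 June."],
)
print(prompt_text)
```

Telling the model to admit when the context lacks the answer is one common way to reduce made-up responses.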
Step 7: Testing Prompt with RAG
Now, let’s see our RAG implementation in action by giving it the same prompt as before, asking where to watch, the match date, and the possible line-ups for the Dortmund vs Real Madrid final.
And yes, as you may have already guessed, the LLM returned the correct, up-to-date answer about the match:
According to the article, here are the answers to your question:
Where to watch the Champions League final: Fans can find their local UEFA Champions League broadcast partner(s) here.
Match date: Saturday 1 June.
Possible line-ups:
Dortmund: Kobel; Ryerson, Hummels, Schlotterbeck, Maatsen; Can, Sabitzer; Adeyemi, Brandt, Sancho; Füllkrug
Real Madrid: Courtois; Carvajal, Éder Militão, Rüdiger, Mendy; Valverde, Kroos; Camavinga; Bellingham; Rodrygo, Vinícius Júnior
If you double-check the official UEFA article, you’ll notice that everything is correct, from the match date to the possible line-ups 😎
RAG is amazing, isn’t it?
Project Source Code
You can access the source code of the implemented RAG above in this repo.
Conclusion
Following these steps, you should have a working RAG system that augments your LLM's question-answering capabilities with relevant retrieved context.
Please feel free to explore and expand upon this foundation to develop more sophisticated information retrieval and generation systems!