Retrieval-Augmented Generation System
1. Finds relevant info from files (PDFs, CSVs, docs)
2. Adds that info to the prompt
3. The AI gives an accurate answer
A RAG chatbot (Retrieval-Augmented Generation chatbot) is an AI chatbot that first retrieves relevant information from external sources such as documents, PDFs, databases, or other files, and then uses that retrieved information to generate an accurate, context-aware response instead of relying only on its built-in knowledge. This reduces wrong answers and lets the chatbot answer questions based on your own or up-to-date data.
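The retrieve-augment-generate flow described above can be sketched in plain Python. The retriever and the model call below are toy stand-ins for illustration, not the real project code:

```python
def retrieve(question, documents, top_k=3):
    """Toy retriever: rank documents by word overlap with the question."""
    def score(doc):
        return len(set(question.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=score, reverse=True)[:top_k]

def augment(question, chunks):
    """Paste the retrieved chunks into the prompt alongside the question."""
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def generate(prompt):
    """Stand-in for the LLM call; a real system sends `prompt` to a model."""
    return "[model answer grounded in the context above]"

docs = [
    "The warranty lasts 12 months from purchase.",
    "Our office is open Monday to Friday.",
    "Returns require the original receipt.",
]
question = "How long does the warranty last?"
prompt = augment(question, retrieve(question, docs, top_k=1))
print(generate(prompt))
```

A real system would replace `retrieve` with a vector-database lookup and `generate` with an actual LLM call, but the three-step shape stays the same.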
Standard AIs don't know your private emails or company PDFs. RAG allows you to chat with your own private files safely.
AI often "hallucinates" (confidently invents facts). By forcing it to read from a document first, RAG drastically reduces fabricated answers.
Retraining AI models takes months. With RAG, you just upload a new PDF, and the bot knows the new info instantly.
It is much cheaper to run a RAG system than to fine-tune a massive custom model.
We use specific instructions in the code to control how the AI behaves. Here are the 4 main prompts used in main.py.
Meaning: This sets the personality. We tell it to be "precise" and "pragmatic" so it doesn't act silly.
Meaning: This gives strict rules. "Use only retrieved chunks" ensures it doesn't make things up.
Meaning: This asks the AI to rewrite a bad question (e.g., "it broken") into a good search query (e.g., "System error troubleshooting").
Meaning: This pastes the user's question AND the data found in the database together, telling the AI to combine them.
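The actual prompt text lives in main.py; the strings below are illustrative stand-ins showing what each of the four prompts described above typically looks like (the wording is assumed, not copied from the project):

```python
# 1. Persona: sets the personality of the assistant.
PERSONA_PROMPT = "You are a precise, pragmatic assistant. Answer plainly."

# 2. Grounding rules: forbid answers outside the retrieved context.
RULES_PROMPT = ("Use only the retrieved chunks below. "
                "If the answer is not in them, say you don't know.")

# 3. Query refinement: turn a vague question into a good search query.
REFINE_PROMPT = "Rewrite the user's question as a short, specific search query: {question}"

# 4. Final answer: combine the user's question with the retrieved data.
ANSWER_PROMPT = "Context:\n{chunks}\n\nQuestion: {question}\n\nAnswer using the context above."

# Filling in the templates:
print(REFINE_PROMPT.format(question="it broken"))
print(ANSWER_PROMPT.format(chunks="Error 42 means low disk space.",
                           question="what is error 42?"))
```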
LangChain is a tool that helps you build AI applications by connecting an AI model with your data, such as PDFs, text files, or databases. Because AI models have a limited context window and cannot read very large files at once, LangChain splits big documents into small pieces (chunks) so the AI can read and understand them step by step and answer questions correctly.
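In the project this splitting is handled by LangChain's text splitters; the plain-Python sketch below just illustrates the idea of fixed-size chunks with a small overlap so context isn't cut mid-thought (the sizes here are arbitrary):

```python
def split_text(text, chunk_size=100, overlap=20):
    """Split text into overlapping chunks so no piece exceeds the model's window."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap   # step back by `overlap` to repeat some context
    return chunks

doc = "RAG " * 100                      # a 400-character stand-in document
pieces = split_text(doc, chunk_size=100, overlap=20)
print(len(pieces), "chunks; first chunk is", len(pieces[0]), "characters")
```

The overlap means the end of each chunk is repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.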
ChromaDB is a vector database used to store and search text by meaning, not by exact words, which helps AI systems quickly find the most relevant pieces of information when answering questions.
In simple terms: ChromaDB saves your document chunks as embeddings (lists of numbers) and, when you ask a question, it finds the most similar chunks so the AI can use them to give an accurate answer.
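ChromaDB's real API is more involved; the toy sketch below (hand-made two-number "embeddings" and cosine similarity in plain Python) just shows the search-by-meaning idea of comparing vectors instead of exact words:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embeddings: each chunk is stored alongside a small vector of numbers.
store = {
    "The warranty lasts 12 months.": [0.9, 0.1],
    "Offices close at 5pm.":         [0.1, 0.9],
}

def query(query_vec, top_k=1):
    """Return the chunks whose vectors are most similar to the query vector."""
    ranked = sorted(store, key=lambda c: cosine(store[c], query_vec), reverse=True)
    return ranked[:top_k]

print(query([0.8, 0.2]))   # a question that is "about warranties" in vector space
```

Real embeddings have hundreds of dimensions and come from a model, but the lookup is the same: embed the question, then return the nearest stored chunks.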
FastAPI is a tool for building websites and APIs. It creates the "Doors" (Endpoints) that allow users to send files or messages to our Python code.
It defines the /upload door (for PDFs) and the /query door (for questions). It also handles errors.
Groq is the Engine. It runs the smart AI model (Llama-3). It is famous for being incredibly fast.
We use it to Generate the final answer by sending the user's question + the best chunks from ChromaDB.
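A sketch of that generation step: build the final prompt locally, then send it to Groq. The model name and the stand-in chunk are assumptions (check Groq's model list), and the API call only runs if GROQ_API_KEY is set:

```python
import os

def build_prompt(question, chunks):
    """Combine the user's question with the best chunks from the vector store."""
    context = "\n---\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = ["The warranty lasts 12 months."]          # stand-in retrieved chunks
prompt = build_prompt("How long is the warranty?", chunks)

if os.environ.get("GROQ_API_KEY"):                  # only call the API if a key is set
    from groq import Groq
    client = Groq()                                  # reads GROQ_API_KEY from the env
    reply = client.chat.completions.create(
        model="llama-3.1-8b-instant",                # example model name; verify in Groq's docs
        messages=[{"role": "user", "content": prompt}],
    )
    print(reply.choices[0].message.content)
else:
    print(prompt)
```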
Even though RAG is powerful, it is not perfect.
If your uploaded PDF is blurry, confusing, or wrong, the AI's answer will also be wrong. It cannot fix bad data.
Because it has to Search Database -> Refine -> Generate, it is slower than just asking standard ChatGPT.
We can only feed a few chunks (Top 3) to the AI. If the answer requires reading the entire book at once, RAG might miss it.
Any Questions?
RAG Architecture v1.1.0
Follow these simple steps to get your own AI Chatbot running with custom document upload support.
First, create a folder for your project (e.g., MyChatbot) and verify you have the necessary files. Place the main.py file into this folder. Next, create and activate a virtual environment; this isolates your libraries from other projects to prevent conflicts. You should see (venv) at the start of your terminal prompt once it is active. Then install the required libraries: create a file named requirements.txt with the following content, or just copy-paste the install command below if you want to skip creating the file.
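Based on the stack described earlier (FastAPI + LangChain + ChromaDB + Groq), a plausible requirements.txt looks like the following. The exact package list is an assumption; match it to the imports in your main.py:

```
fastapi
uvicorn
langchain
chromadb
groq
python-dotenv
python-multipart
pypdf
```

Then install everything with pip install -r requirements.txt (or run pip install followed by the same package names).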
This chatbot relies on Groq for the AI intelligence.
Go to console.groq.com/keys, create a free account, and generate a new API Key. Copy it immediately.
This file keeps your secrets safe. Create a new file simply named .env (no extension) in your project folder and paste the following content:
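The .env file holds a single line. The variable name GROQ_API_KEY is an assumption based on common convention; check which name main.py actually reads:

```
GROQ_API_KEY=gsk_your_actual_key_here...
```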
Replace gsk_your_actual_key_here... with the key you copied from Groq.
Everything is set! Now run the following command in your terminal:
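The run command itself isn't shown above; assuming main.py defines a FastAPI instance named app (the usual layout), it is typically:

```
uvicorn main:app --reload
```

The --reload flag restarts the server automatically when you edit the code; drop it in production.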
Wait for the logs to say Application startup complete.
Once running, open your browser and visit the interactive API docs (FastAPI serves them at /docs by default). From there you can test both doors:
POST /upload -> Try it out, then attach your PDF.
POST /query -> Try it out, with a body like { "query": "What is in the file I uploaded?" }