RAG Chatbot

Retrieval-Augmented Generation System

What is a RAG Chatbot?

Retrieve

Finds relevant info from files (PDFs, CSVs, docs)

Augment

Adds that info to the prompt

Generate

AI gives an accurate answer

A RAG chatbot (Retrieval-Augmented Generation chatbot) is an AI chatbot that first retrieves relevant information from external sources such as documents, PDFs, databases, or other files, and then uses that retrieved information to generate an accurate, context-aware response instead of relying only on its built-in knowledge. This reduces wrong answers and lets the chatbot answer questions based on your own or up-to-date data.
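
In code, the whole loop is only a few lines. Here is a minimal sketch (not the project's actual main.py), assuming a LangChain-style vector store and chat model like the ones introduced later in this guide:

rag_sketch.py (illustrative)
# A rough sketch of the Retrieve -> Augment -> Generate loop
def rag_answer(question, vectorstore, llm):
    # 1. Retrieve: find the chunks most similar to the question
    chunks = vectorstore.similarity_search(question, k=3)
    # 2. Augment: paste those chunks into the prompt
    context = "\n\n".join(c.page_content for c in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 3. Generate: the model answers from the retrieved text
    return llm.invoke(prompt).content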

Why do we need RAG?

Private Data

Standard AIs don't know your private emails or company PDFs. RAG allows you to chat with your own private files safely.

Accuracy

AI often "hallucinates" (confidently makes things up). By forcing it to read from a document first, RAG drastically reduces fabricated answers.

Up-to-Date

Retraining AI models takes months. With RAG, you just upload a new PDF, and the bot knows the new info instantly.

Cost Effective

It is much cheaper to run a RAG system than to fine-tune a massive custom model.

Controlling the AI (The Prompts)

We use specific instructions in the code to control how the AI behaves. Here are the 4 main prompts used in main.py.

1. System Prompt (The Persona)

SYSTEM_PROMPT = (
    "You are a precise, pragmatic assistant. You refine user queries to maximize retrieval quality, maintain factual grounding..."
)

Meaning: This sets the personality. We tell it to be "precise" and "pragmatic" so it doesn't act silly.

2. Developer Prompt (The Rules)

DEVELOPER_PROMPT = (
    "First, refine the user’s raw query... Then answer using only the retrieved chunks. Prefer technical clarity..."
)

Meaning: This gives strict rules. "Use only retrieved chunks" ensures it doesn't make things up.

3. Refine Template (The Optimizer)

REFINE_TEMPLATE = (
    "Refine the following query to maximize retrieval quality from a technical corpus...\nRaw Query:\n{query}"
)

Meaning: This asks the AI to rewrite a bad question (e.g., "it broken") into a good search query (e.g., "System error troubleshooting").

4. Answer Template (The Final Assembly)

ANSWER_TEMPLATE = (
    "User Query: {query}\nRefined Query: {refined}\nContext Chunks:\n{context}\nTask: Provide a direct answer..."
)

Meaning: This pastes the user's question AND the data found in the database together, telling the AI to combine them.
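
Both templates are ordinary Python format strings, so this assembly step is just string formatting. Here is a minimal sketch of how they might be filled, assuming a LangChain-style chat model named llm (the example values are made up):

# Illustrative only: filling the two templates
refine_prompt = REFINE_TEMPLATE.format(query="it broken")
refined = llm.invoke(refine_prompt).content  # e.g. "System error troubleshooting"

answer_prompt = ANSWER_TEMPLATE.format(
    query="it broken",
    refined=refined,
    context="CHUNK 1: ...\n\nCHUNK 2: ...",
)
answer = llm.invoke(answer_prompt).content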

LangChain: The Data Splitter

What is LangChain?

LangChain is a framework for building AI applications that connect a language model with your data, such as PDFs, text files, or databases. Because AI models have a limited context window and cannot read very large files at once, LangChain splits big documents into small pieces (chunks) so the AI can read them step by step and answer questions correctly.

langchain_demo.py
import os
import sys
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Ensure you have your environment variables set or load them here
# from dotenv import load_dotenv
# load_dotenv()

def main():
    print("--- LangChain Educational Demo: Loader & Splitter ---")

    # 1. Get input file path from user
    file_path_str = input("Enter the absolute path to your file (PDF or TXT): ").strip()

    # Remove quotes if user added them
    if file_path_str.startswith('"') and file_path_str.endswith('"'):
        file_path_str = file_path_str[1:-1]

    file_path = Path(file_path_str)

    if not file_path.exists():
        print(f"Error: File not found at {file_path}")
        return

    # 2. Load the document
    print(f"\n[1] Loading file: {file_path.name}...")
    docs = []
    try:
        if file_path.suffix.lower() == ".pdf":
            loader = PyPDFLoader(str(file_path))
            docs.extend(loader.load())
        elif file_path.suffix.lower() in [".txt", ".md", ".log"]:
            loader = TextLoader(str(file_path), encoding="utf-8")
            docs.extend(loader.load())
        else:
            print("Error: Unsupported file type. Please use .pdf, .txt, .md, or .log")
            return
    except Exception as e:
        print(f"Error loading file: {e}")
        return

    print(f" -> Successfully loaded {len(docs)} page(s)/document(s).")

    # 3. Split the document (Chunking)
    print("\n[2] Splitting text into chunks...")
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=150,
        separators=["\n\n", "\n", " ", ""]
    )

    chunks = splitter.split_documents(docs)

    print(f" -> Created {len(chunks)} chunks.")

    # 4. Print Chunks
    print("\n[3] Printing Chunks (Preview first 3):")
    print("="*60)
    for i, chunk in enumerate(chunks[:3]):
        print(f"CHUNK {i+1}:")
        print(f"Metadata: {chunk.metadata}")
        print(f"Content Length: {len(chunk.page_content)}")
        print("-" * 20)
        print(chunk.page_content[:500] + "..." if len(chunk.page_content) > 500 else chunk.page_content)
        print("="*60)

    if len(chunks) > 3:
        print(f"... and {len(chunks) - 3} more chunks.")

if __name__ == "__main__":
    main()

Chroma DB: The Semantic Search

What is ChromaDB?

ChromaDB is a vector database used to store and search text by meaning, not by exact words, which helps AI systems quickly find the most relevant pieces of information when answering questions.

In simple terms:
ChromaDB saves your document chunks as embeddings (numbers) and, when you ask a question, it finds the most similar chunks so the AI can use them to give an accurate answer.

chroma_demo.py
import os
import sys
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_core.documents import Document

# Load env for proper setup if needed, though local embeddings might not need API keys
load_dotenv()

def main():
    print("--- ChromaDB Educational Demo: Embedding & Search ---")
    print("(Note: This runs in-memory and does NOT save to disk)")

    # 1. Setup Embedding Model
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    print(f"\n[1] Initializing Embedding Model ({model_name})...")
    embeddings = SentenceTransformerEmbeddings(model_name=model_name)

    # Initialize ephemeral Chroma (no persist_directory)
    vectorstore = Chroma(
        collection_name="demo_collection",
        embedding_function=embeddings,
        # persist_directory is omitted, so this collection stays in memory (not saved to disk)
    )

    # 2. Get Data Input
    print("\n[2] Enter texts to add to the database (type 'DONE' on a new line to finish):")
    texts_to_add = []
    while True:
        line = input(f"Text {len(texts_to_add)+1}: ")
        if line.strip().upper() == 'DONE':
            break
        if line.strip():
            texts_to_add.append(line)

    if not texts_to_add:
        print("No text provided. Exiting.")
        return

    # 3. Embed and Add to Chroma
    print(f"\n[3] Embedding and adding {len(texts_to_add)} texts to Vector Store...")
    # Wrap in Document objects as typical RAG would
    docs = [Document(page_content=t, metadata={"id": i}) for i, t in enumerate(texts_to_add)]

    vectorstore.add_documents(docs)
    print(" -> Done.")

    # 4. Search
    while True:
        query = input("\n[4] Enter a query to search (or 'quit' to exit): ").strip()
        if query.lower() in ['quit', 'exit']:
            break

        print(f" Searching for: '{query}'")

        # Perform similarity search
        results = vectorstore.similarity_search_with_score(query, k=2)

        print(f" -> Found {len(results)} matches:\n")
        for doc, score in results:
            print(f" * Score: {score:.4f}")
            print(f" * Content: {doc.page_content}")
            print(" " + "-"*30)

if __name__ == "__main__":
    main()

FastAPI: The Data Doorway

What is it?

FastAPI is a Python framework for building web APIs. It creates the "Doors" (Endpoints) that allow users to send files or messages to our Python code.

How does the code use it?

It defines the /upload door (for PDFs) and the /query door (for questions). It also handles errors.

main.py (FastAPI)
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()

# Request body for the /query endpoint
class QueryRequest(BaseModel):
    question: str

# Endpoint 1: File Upload
@app.post("/upload")
async def upload_file(file: UploadFile):
    # (In the full main.py, the file is chunked and stored in ChromaDB here)
    return {"filename": file.filename, "status": "Ingested"}

# Endpoint 2: Query
@app.post("/query")
async def query_endpoint(payload: QueryRequest):
    # rag_system is the retrieval + generation pipeline defined elsewhere in main.py
    answer = rag_system.ask(payload.question)
    return {"answer": answer}

Groq API: The Speed Engine

What is it?

Groq is the Engine. It runs the large language model (Llama 3.3) and is famous for being incredibly fast.

How does the code use it?

We use it to Generate the final answer by sending the user's question + the best chunks from ChromaDB.

groq_demo.py
import os
import sys
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate

load_dotenv()

def main():
    print("--- Groq API Educational Demo: LLM Interaction ---")

    api_key = os.getenv("GROQ_API_KEY")
    if not api_key:
        print("Error: GROQ_API_KEY not found in .env file.")
        return

    model_name = os.getenv("GROQ_MODEL", "llama-3.3-70b-versatile")

    # 1. Initialize ChatGroq
    print(f"\n[1] Initializing Groq Client (Model: {model_name})...")
    llm = ChatGroq(
        api_key=api_key,
        model=model_name,
        temperature=0.7
    )

    # 2. Interactive Loop
    print("\n[2] Start Chatting (type 'quit' to exit)")

    # We can use a simple prompt template
    system_prompt = "You are a helpful assistant explaining concepts clearly."

    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() in ['quit', 'exit']:
            break

        if not user_input:
            continue

        print("Groq: Generating response...", end="\r")

        try:
            # Invoke the model with the system + user messages
            messages = [
                ("system", system_prompt),
                ("human", user_input),
            ]
            response = llm.invoke(messages)

            # Clear the loading line
            print(" " * 30, end="\r")
            print(f"Groq: {response.content}")

        except Exception as e:
            print(f"\nError interacting with Groq: {e}")

if __name__ == "__main__":
    main()

Limitations of RAG

Even though RAG is powerful, it is not perfect.

Garbage In, Garbage Out

If your uploaded PDF is blurry, confusing, or wrong, the AI's answer will also be wrong. It cannot fix bad data.

Latency

Because it has to Refine -> Search the Database -> Generate, it is slower than asking a standard chatbot directly.

Context Limits

We can only feed a few chunks (Top 3) to the AI. If the answer requires reading the entire book at once, RAG might miss it.

System Architecture

User -> Frontend Form (Input: Query + Data as PDF/TXT/MD) -> FastAPI Gateway (Routes: /upload & /query)

Phase 1: Ingest (Data)
LangChain (Chunking & Splitting) -> Chroma DB (Save Vectors)

Phase 2: Query (Text)
Groq API (Enhance/Refine Query) -> Chroma DB (Compare & Extract Top Chunks) -> Groq API (Generate Answer: Enhanced Query + Chunks) -> Final Answer
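
To make the flow concrete, here is a minimal end-to-end sketch built from the pieces shown above. It is not the real main.py: the function names build_index and answer_query are illustrative, and it assumes GROQ_API_KEY is set in your environment.

rag_pipeline_sketch.py (illustrative)
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
llm = ChatGroq(model="llama-3.3-70b-versatile")

# Phase 1: Ingest - split the PDF and save the vectors
def build_index(pdf_path):
    docs = PyPDFLoader(pdf_path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.split_documents(docs)
    return Chroma.from_documents(chunks, embeddings, collection_name="rag_collection")

# Phase 2: Query - refine, retrieve the top chunks, then generate
def answer_query(vectorstore, query):
    refined = llm.invoke(f"Rewrite this as a precise search query: {query}").content
    top_chunks = vectorstore.similarity_search(refined, k=3)
    context = "\n\n".join(c.page_content for c in top_chunks)
    prompt = (
        f"User Query: {query}\nRefined Query: {refined}\n"
        f"Context Chunks:\n{context}\nTask: Provide a direct answer."
    )
    return llm.invoke(prompt).content

if __name__ == "__main__":
    store = build_index("example.pdf")
    print(answer_query(store, "What is this document about?"))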

Thank You!

Any Questions?

RAG Architecture v1.1.0

🚀 RAG Chatbot Setup Guide

Follow these simple steps to get your own AI Chatbot running with custom document upload support.

1. Project Setup

First, create a folder for your project (e.g., MyChatbot) and verify you have the necessary files.

  1. Create a new folder on your computer.
  2. Download/Move the main.py file into this folder.

2. Create Virtual Environment

This isolates your libraries from other projects to prevent conflicts.

Windows:

python -m venv venv
venv\Scripts\activate

Mac/Linux:

python3 -m venv venv
source venv/bin/activate
Tip: You should see (venv) at the start of your terminal prompt line now.

3. Install Dependencies

You need to install the required libraries. Create a file named requirements.txt with the following content, then run the install command below.

Dependencies (requirements.txt):
langchain>=0.2.0
langchain-community>=0.2.0
langchain-chroma>=0.1.0
chromadb>=0.5.0
pypdf>=4.0.0
loguru>=0.7.0
langchain-huggingface>=0.0.3
openai>=1.0.0
tiktoken>=0.7.0
sentence-transformers>=2.2.2
numpy>=1.24.0
scipy>=1.11.0
fastapi>=0.110.0
uvicorn[standard]>=0.29.0
python-multipart>=0.0.9
python-dotenv>=1.0.0
pydantic>=2.7.0
groq>=0.9.0
langchain-groq>=0.1.0
Install Command:
pip install -r requirements.txt

4. Get API Keys & Configure

This chatbot relies on Groq for the AI intelligence.

1. Get Groq API Key

Go to console.groq.com/keys, create a free account, and generate a new API Key. Copy it immediately.

2. Create .env File

This file keeps your secrets safe. Create a new file named exactly .env (nothing before the dot) in your project folder and paste the following content:

GROQ_API_KEY=gsk_your_actual_key_here_xxxxxxxxxxxx
GROQ_MODEL=llama-3.3-70b-versatile
CHROMA_DIR=./chroma_db
CHROMA_COLLECTION=rag_collection
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

Replace gsk_your_actual_key_here... with the key you copied from Groq.

5. Run the Application

Everything is set! Now run the following command in your terminal:

uvicorn main:app --reload

Wait for the logs to say Application startup complete.

6. How to Use

Once running, open your browser and visit:

http://127.0.0.1:8000/docs
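
You can click "Try it out" on each endpoint in that page, or call the two doors from a small Python script. Here is a minimal sketch using the requests library (install it separately; it is not in requirements.txt). The exact field names depend on how main.py defines its request and response models:

test_api.py (illustrative)
import requests

BASE = "http://127.0.0.1:8000"

# 1. Upload a document so it gets ingested into the vector store
with open("example.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/upload", files={"file": f})
print(resp.json())

# 2. Ask a question about the uploaded data
resp = requests.post(f"{BASE}/query", json={"question": "What is this document about?"})
print(resp.json())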