RAG Chatbot

Retrieval-Augmented Generation System

What is a RAG Chatbot?

Retrieve

Finds relevant info from files (PDFs, CSVs, docs)

Augment

Adds that info to the prompt

Generate

AI gives an accurate answer

A RAG chatbot (Retrieval-Augmented Generation chatbot) is an AI chatbot that first retrieves relevant information from external sources such as documents, PDFs, databases, or other files, and then uses that retrieved information to generate an accurate, context-aware response instead of relying only on its built-in knowledge. This reduces wrong answers and lets the chatbot answer questions based on your own or up-to-date data.
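
In code, the whole loop is only a few lines. Here is a minimal sketch (not the project's actual main.py), assuming a LangChain-style vector store and chat model like the ones introduced later in this guide:

rag_sketch.py (illustrative)
# A rough sketch of the Retrieve -> Augment -> Generate loop
def rag_answer(question, vectorstore, llm):
    # 1. Retrieve: find the chunks most similar to the question
    chunks = vectorstore.similarity_search(question, k=3)
    # 2. Augment: paste those chunks into the prompt
    context = "\n\n".join(c.page_content for c in chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 3. Generate: the model answers from the retrieved text
    return llm.invoke(prompt).content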

Why do we need RAG?

Private Data

Standard AIs don't know your private emails or company PDFs. RAG allows you to chat with your own private files safely.

Accuracy

AI often "hallucinates" (confidently makes things up). By forcing it to read from a document first, RAG drastically reduces fabricated answers.

Up-to-Date

Retraining AI models takes months. With RAG, you just upload a new PDF, and the bot knows the new info instantly.

Cost Effective

It is much cheaper to run a RAG system than to fine-tune a massive custom model.

Controlling the AI (The Prompts)

We use specific instructions in the code to control how the AI behaves. Here are the 4 main prompts used in main.py.

1. System Prompt (The Persona)

SYSTEM_PROMPT = (
    "You are a precise, pragmatic assistant. You refine user queries to maximize retrieval quality, maintain factual grounding..."
)

Meaning: This sets the personality. We tell it to be "precise" and "pragmatic" so it doesn't act silly.

2. Developer Prompt (The Rules)

DEVELOPER_PROMPT = (
    "First, refine the user’s raw query... Then answer using only the retrieved chunks. Prefer technical clarity..."
)

Meaning: This gives strict rules. "Use only retrieved chunks" ensures it doesn't make things up.

3. Refine Template (The Optimizer)

REFINE_TEMPLATE = (
    "Refine the following query to maximize retrieval quality from a technical corpus...\nRaw Query:\n{query}"
)

Meaning: This asks the AI to rewrite a bad question (e.g., "it broken") into a good search query (e.g., "System error troubleshooting").

4. Answer Template (The Final Assembly)

ANSWER_TEMPLATE = (
    "User Query: {query}\nRefined Query: {refined}\nContext Chunks:\n{context}\nTask: Provide a direct answer..."
)

Meaning: This pastes the user's question AND the data found in the database together, telling the AI to combine them.
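
Both templates are ordinary Python format strings, so this assembly step is just string formatting. Here is a minimal sketch of how they might be filled, assuming a LangChain-style chat model named llm (the example values are made up):

# Illustrative only: filling the two templates
refine_prompt = REFINE_TEMPLATE.format(query="it broken")
refined = llm.invoke(refine_prompt).content  # e.g. "System error troubleshooting"

answer_prompt = ANSWER_TEMPLATE.format(
    query="it broken",
    refined=refined,
    context="CHUNK 1: ...\n\nCHUNK 2: ...",
)
answer = llm.invoke(answer_prompt).content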

LangChain: The Data Splitter

What is LangChain?

LangChain is a framework for building AI applications that connect a language model with your data, such as PDFs, text files, or databases. Because AI models have a limited context window and cannot read very large files at once, LangChain splits big documents into small pieces (chunks) so the AI can read them step by step and answer questions correctly.

langchain_demo.py
import os
import sys
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Ensure you have your environment variables set or load them here
# from dotenv import load_dotenv
# load_dotenv()

def main():
    print("--- LangChain Educational Demo: Loader & Splitter ---")

    # 1. Get input file path from user
    file_path_str = input("Enter the absolute path to your file (PDF or TXT): ").strip()

    # Remove quotes if user added them
    if file_path_str.startswith('"') and file_path_str.endswith('"'):
        file_path_str = file_path_str[1:-1]

    file_path = Path(file_path_str)

    if not file_path.exists():
        print(f"Error: File not found at {file_path}")
        return

    # 2. Load the document
    print(f"\n[1] Loading file: {file_path.name}...")
    docs = []
    try:
        if file_path.suffix.lower() == ".pdf":
            loader = PyPDFLoader(str(file_path))
            docs.extend(loader.load())
        elif file_path.suffix.lower() in [".txt", ".md", ".log"]:
            loader = TextLoader(str(file_path), encoding="utf-8")
            docs.extend(loader.load())
        else:
            print("Error: Unsupported file type. Please use .pdf, .txt, .md, or .log")
            return
    except Exception as e:
        print(f"Error loading file: {e}")
        return

    print(f" -> Successfully loaded {len(docs)} page(s)/document(s).")

    # 3. Split the document (Chunking)
    print("\n[2] Splitting text into chunks...")
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=150,
        separators=["\n\n", "\n", " ", ""]
    )

    chunks = splitter.split_documents(docs)

    print(f" -> Created {len(chunks)} chunks.")

    # 4. Print Chunks
    print("\n[3] Printing Chunks (Preview first 3):")
    print("="*60)
    for i, chunk in enumerate(chunks[:3]):
        print(f"CHUNK {i+1}:")
        print(f"Metadata: {chunk.metadata}")
        print(f"Content Length: {len(chunk.page_content)}")
        print("-" * 20)
        print(chunk.page_content[:500] + "..." if len(chunk.page_content) > 500 else chunk.page_content)
        print("="*60)

    if len(chunks) > 3:
        print(f"... and {len(chunks) - 3} more chunks.")

if __name__ == "__main__":
    main()

Chroma DB: The Semantic Search

What is ChromaDB?

ChromaDB is a vector database used to store and search text by meaning, not by exact words, which helps AI systems quickly find the most relevant pieces of information when answering questions.

In simple terms:
ChromaDB saves your document chunks as embeddings (numbers) and, when you ask a question, it finds the most similar chunks so the AI can use them to give an accurate answer.

chroma_demo.py
import os
import sys
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_core.documents import Document

# Load env for proper setup if needed, though local embeddings might not need API keys
load_dotenv()

def main():
    print("--- ChromaDB Educational Demo: Embedding & Search ---")
    print("(Note: This runs in-memory and does NOT save to disk)")

    # 1. Setup Embedding Model
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    print(f"\n[1] Initializing Embedding Model ({model_name})...")
    embeddings = SentenceTransformerEmbeddings(model_name=model_name)

    # Initialize ephemeral Chroma (no persist_directory)
    vectorstore = Chroma(
        collection_name="demo_collection",
        embedding_function=embeddings,
        # persist_directory is omitted, so this collection stays in memory (not saved to disk)
    )

    # 2. Get Data Input
    print("\n[2] Enter texts to add to the database (type 'DONE' on a new line to finish):")
    texts_to_add = []
    while True:
        line = input(f"Text {len(texts_to_add)+1}: ")
        if line.strip().upper() == 'DONE':
            break
        if line.strip():
            texts_to_add.append(line)

    if not texts_to_add:
        print("No text provided. Exiting.")
        return

    # 3. Embed and Add to Chroma
    print(f"\n[3] Embedding and adding {len(texts_to_add)} texts to Vector Store...")
    # Wrap in Document objects as typical RAG would
    docs = [Document(page_content=t, metadata={"id": i}) for i, t in enumerate(texts_to_add)]

    vectorstore.add_documents(docs)
    print(" -> Done.")

    # 4. Search
    while True:
        query = input("\n[4] Enter a query to search (or 'quit' to exit): ").strip()
        if query.lower() in ['quit', 'exit']:
            break

        print(f" Searching for: '{query}'")

        # Perform similarity search
        results = vectorstore.similarity_search_with_score(query, k=2)

        print(f" -> Found {len(results)} matches:\n")
        for doc, score in results:
            print(f" * Score: {score:.4f}")
            print(f" * Content: {doc.page_content}")
            print(" " + "-"*30)

if __name__ == "__main__":
    main()

FastAPI: The Data Doorway

What is it?

FastAPI is a Python framework for building web APIs. It creates the "Doors" (Endpoints) that allow users to send files or messages to our Python code.

How does the code use it?

It defines the /upload door (for PDFs) and the /query door (for questions). It also handles errors.

main.py (FastAPI)
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()

# Request body for the /query endpoint
class QueryRequest(BaseModel):
    question: str

# Endpoint 1: File Upload
@app.post("/upload")
async def upload_file(file: UploadFile):
    # (In the full main.py, the file is chunked and stored in ChromaDB here)
    return {"filename": file.filename, "status": "Ingested"}

# Endpoint 2: Query
@app.post("/query")
async def query_endpoint(payload: QueryRequest):
    # rag_system is the retrieval + generation pipeline defined elsewhere in main.py
    answer = rag_system.ask(payload.question)
    return {"answer": answer}

Groq API: The Speed Engine

What is it?

Groq is the Engine. It runs the large language model (Llama 3.3) and is famous for being incredibly fast.

How does the code use it?

We use it to Generate the final answer by sending the user's question + the best chunks from ChromaDB.

groq_demo.py
import os
import sys
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate

load_dotenv()

def main():
    print("--- Groq API Educational Demo: LLM Interaction ---")

    api_key = os.getenv("GROQ_API_KEY")
    if not api_key:
        print("Error: GROQ_API_KEY not found in .env file.")
        return

    model_name = os.getenv("GROQ_MODEL", "llama-3.3-70b-versatile")

    # 1. Initialize ChatGroq
    print(f"\n[1] Initializing Groq Client (Model: {model_name})...")
    llm = ChatGroq(
        api_key=api_key,
        model=model_name,
        temperature=0.7
    )

    # 2. Interactive Loop
    print("\n[2] Start Chatting (type 'quit' to exit)")

    # We can use a simple prompt template
    system_prompt = "You are a helpful assistant explaining concepts clearly."

    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() in ['quit', 'exit']:
            break

        if not user_input:
            continue

        print("Groq: Generating response...", end="\r")

        try:
            # Invoke the model with the system + user messages
            messages = [
                ("system", system_prompt),
                ("human", user_input),
            ]
            response = llm.invoke(messages)

            # Clear the loading line
            print(" " * 30, end="\r")
            print(f"Groq: {response.content}")

        except Exception as e:
            print(f"\nError interacting with Groq: {e}")

if __name__ == "__main__":
    main()

Limitations of RAG

Even though RAG is powerful, it is not perfect.

Garbage In, Garbage Out

If your uploaded PDF is blurry, confusing, or wrong, the AI's answer will also be wrong. It cannot fix bad data.

Latency

Because it has to Refine -> Search the Database -> Generate, it is slower than asking a standard chatbot directly.

Context Limits

We can only feed a few chunks (Top 3) to the AI. If the answer requires reading the entire book at once, RAG might miss it.

System Architecture

User -> Frontend Form (Input: Query + Data as PDF/TXT/MD) -> FastAPI Gateway (Routes: /upload & /query)

Phase 1: Ingest (Data)
LangChain (Chunking & Splitting) -> Chroma DB (Save Vectors)

Phase 2: Query (Text)
Groq API (Enhance/Refine Query) -> Chroma DB (Compare & Extract Top Chunks) -> Groq API (Generate Answer: Enhanced Query + Chunks) -> Final Answer
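
To make the flow concrete, here is a minimal end-to-end sketch built from the pieces shown above. It is not the real main.py: the function names build_index and answer_query are illustrative, and it assumes GROQ_API_KEY is set in your environment.

rag_pipeline_sketch.py (illustrative)
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
llm = ChatGroq(model="llama-3.3-70b-versatile")

# Phase 1: Ingest - split the PDF and save the vectors
def build_index(pdf_path):
    docs = PyPDFLoader(pdf_path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.split_documents(docs)
    return Chroma.from_documents(chunks, embeddings, collection_name="rag_collection")

# Phase 2: Query - refine, retrieve the top chunks, then generate
def answer_query(vectorstore, query):
    refined = llm.invoke(f"Rewrite this as a precise search query: {query}").content
    top_chunks = vectorstore.similarity_search(refined, k=3)
    context = "\n\n".join(c.page_content for c in top_chunks)
    prompt = (
        f"User Query: {query}\nRefined Query: {refined}\n"
        f"Context Chunks:\n{context}\nTask: Provide a direct answer."
    )
    return llm.invoke(prompt).content

if __name__ == "__main__":
    store = build_index("example.pdf")
    print(answer_query(store, "What is this document about?"))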

Thank You!

Any Questions?

RAG Architecture v1.1.0

🚀 RAG Chatbot Setup Guide

Follow these simple steps to get your own AI Chatbot running with custom document upload support.

1. Project Setup

First, create a folder for your project (e.g., MyChatbot) and verify you have the necessary files.

  1. Create a new folder on your computer.
  2. Download/Move the main.py file into this folder.

2. Create Virtual Environment

This isolates your libraries from other projects to prevent conflicts.

Windows:

python -m venv venv
venv\Scripts\activate

Mac/Linux:

python3 -m venv venv
source venv/bin/activate
Tip: You should see (venv) at the start of your terminal prompt line now.

3. Install Dependencies

You need to install the required libraries. Create a file named requirements.txt with the following content, then run the install command below.

Dependencies (requirements.txt):
langchain>=0.2.0
langchain-community>=0.2.0
langchain-chroma>=0.1.0
chromadb>=0.5.0
pypdf>=4.0.0
loguru>=0.7.0
langchain-huggingface>=0.0.3
openai>=1.0.0
tiktoken>=0.7.0
sentence-transformers>=2.2.2
numpy>=1.24.0
scipy>=1.11.0
fastapi>=0.110.0
uvicorn[standard]>=0.29.0
python-multipart>=0.0.9
python-dotenv>=1.0.0
pydantic>=2.7.0
groq>=0.9.0
langchain-groq>=0.1.0
Install Command:
pip install -r requirements.txt

4. Get API Keys & Configure

This chatbot relies on Groq for the AI intelligence.

1. Get Groq API Key

Go to console.groq.com/keys, create a free account, and generate a new API Key. Copy it immediately.

2. Create .env File

This file keeps your secrets safe. Create a new file named exactly .env (nothing before the dot) in your project folder and paste the following content:

GROQ_API_KEY=gsk_your_actual_key_here_xxxxxxxxxxxx
GROQ_MODEL=llama-3.3-70b-versatile
CHROMA_DIR=./chroma_db
CHROMA_COLLECTION=rag_collection
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

Replace gsk_your_actual_key_here... with the key you copied from Groq.

5. Run the Application

Everything is set! Now run the following command in your terminal:

uvicorn main:app --reload

Wait for the logs to say Application startup complete.

6. How to Use

Once running, open your browser and visit:

http://127.0.0.1:8000/docs
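
You can click "Try it out" on each endpoint in that page, or call the two doors from a small Python script. Here is a minimal sketch using the requests library (install it separately; it is not in requirements.txt). The exact field names depend on how main.py defines its request and response models:

test_api.py (illustrative)
import requests

BASE = "http://127.0.0.1:8000"

# 1. Upload a document so it gets ingested into the vector store
with open("example.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/upload", files={"file": f})
print(resp.json())

# 2. Ask a question about the uploaded data
resp = requests.post(f"{BASE}/query", json={"question": "What is this document about?"})
print(resp.json())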