Teaching Rocks to Read: Building a RAG Pipeline that Actually Works

Everyone wants an AI chatbot these days. "Make it like ChatGPT, but for my plumbing business," they say.

The problem? ChatGPT lies. Confidently.

If you ask a raw GPT-4 model about your specific return policy, it will hallucinate a beautiful, customer-friendly policy that will absolutely bankrupt you.

Enter RAG (Retrieval Augmented Generation).

What is RAG? (Explain Like I'm 5)

Imagine taking a test.

Standard LLM: You have to take the test from memory. You might make stuff up.
RAG: It's an open-book test. You look up the answer in the textbook first, then write it down.

The Stack

For Quick Chat, I didn't want to use a separate Vector DB like Pinecone (expense) or Weaviate (complexity). I stuck to what I know: PostgreSQL.

Did you know Postgres has a plugin called pgvector? It turns your boring old relational database into a semantic search powerhouse.

-- It literally just calculates the distance between numbers.
-- But we call it "Cosine Similarity" to charge more per hour.
SELECT * FROM documents
ORDER BY embedding <=> '[0.123, 0.456, ...]'
LIMIT 5;

The "Context Window" Problem

The hardest part wasn't the AI; it was the chunking.

You can't just feed an entire 50-page PDF into the prompts.

It's expensive.
The AI gets confused (like me reading legal docs).

I wrote a recursive text splitter that tries to keep paragraphs together. It worked great until I tested it with a client's "unstructured" data, which turned out to be a CSV file saved as a PDF screenshot. Why do users do this?

Hallucination Guardrails

To stop the bot from going rogue, I added a system prompt that essentially gaslights the AI:

"You are a helpful assistant. If you do not see the answer in the context provided below, admit defeat. Do not invent facts. Do not pretend to know things. Your existence depends on accuracy."

It works 99% of the time. The other 1%? It tries to sell users a subscription to services we don't offer.

The Takeaway

Building Quick Chat taught me that data quality > model quality. GPT-3.5 with great context beats GPT-4 with bad context every time.

Also, never trust a CSV file sent by a marketing department. Ever.

Teaching Rocks to Read: Building a RAG Pipeline that Actually Works

What is RAG? (Explain Like I'm 5)

The Stack

The "Context Window" Problem

Hallucination Guardrails

The Takeaway

More from the Blog

The $2 Billion Pizza Delivery: When Jensen Huang Met OpenAI

OpenAI: From "Benefit Humanity" to "Benefit Sam Altman Bank Account"

The Godfather of AI Quit Google to Tell Us We're Doomed (Thanks, Geoff)

Ready to Build Something Extraordinary?

Teaching Rocks to Read: Building a RAG Pipeline that Actually Works

What is RAG? (Explain Like I'm 5)

The Stack

The "Context Window" Problem

Hallucination Guardrails

The Takeaway

More from the Blog

The $2 Billion Pizza Delivery: When Jensen Huang Met OpenAI

OpenAI: From "Benefit Humanity" to "Benefit Sam Altman Bank Account"

The Godfather of AI Quit Google to Tell Us We're Doomed (Thanks, Geoff)

Tech Stack

Ready to Build Something Extraordinary?