RAG, agents, and the AI stack: making sense of it for non-ML people
The AI vocabulary has expanded rapidly and it's easy to feel lost. Here's a practical guide to the concepts that actually matter for IT professionals.
If you spend any time following AI news, you'll have noticed the vocabulary getting denser. RAG, embeddings, vector databases, agents, function calling, fine-tuning. The terminology is proliferating faster than the explanations.
This post is my attempt to make sense of the concepts that I think actually matter for IT professionals, without requiring a machine learning background.
RAG: the concept that changes the limitations
RAG stands for Retrieval-Augmented Generation. It's the approach that solves one of the biggest practical problems with large language models: they only know what was in their training data.
In simplified form:
-
You have a base LLM (like GPT-4 or Claude 3) that's very good at reasoning and language but doesn't know about your specific documents, your company's policies, or anything that happened after its training cutoff.
-
You build a system that, when you ask a question, first retrieves relevant documents from your own knowledge base, then includes those documents as context when sending your question to the LLM.
-
The LLM answers the question using both its general capability and the specific retrieved content.
The result: the model can answer questions grounded in your actual data, with citations, without requiring expensive model fine-tuning.
RAG is why enterprise AI products like Microsoft 365 Copilot are interesting. Copilot uses the Microsoft 365 Graph (your actual emails, documents, meetings) as the retrieval layer. The model isn't GPT-4 with your documents fine-tuned in; it's GPT-4 with relevant documents pulled in at query time.
Agents: giving AI tools to use
An "agent" in the AI context means an LLM that can take actions, not just produce text. You give the model access to tools (a web search function, a code interpreter, an API call) and it can decide which tools to use and in what sequence to accomplish a task.
The demos are compelling: tell an AI agent to "research competitor pricing, summarise the findings, and draft a response to this customer email." It searches the web, reads pages, synthesises information, drafts text, all in sequence, with the human reviewing the output rather than doing each step.
In practice, current agents are impressive but unreliable for complex, multi-step tasks. They get things done when the task is well-defined and the tools are reliable. They fail unpredictably when something unexpected happens mid-chain. The reliability is improving. This time last year, agents were more of a research concept; now they're genuinely useful for constrained tasks.
Embeddings and vector databases: the infrastructure behind RAG
This is where it gets technical, but the concept is straightforward: embeddings are a way of representing text as numbers (vectors) that capture semantic meaning. Similar texts produce similar vectors.
Vector databases store these embeddings and allow fast "similarity search": given this piece of text, what other texts in our database are semantically similar?
This is the infrastructure that makes RAG work. When you ask a question, your question gets converted to a vector, the database finds the most similar document chunks, and those get included in your LLM prompt.
You don't need to build this yourself. Azure AI Search, Amazon Bedrock, and various SaaS products abstract all of this. But understanding what's happening is useful when evaluating these products.
What this means for IT professionals
The important practical implications:
For Microsoft shops: M365 Copilot is RAG-based. Its quality depends on the quality and organisation of your Microsoft 365 data. Clean data governance is suddenly a product quality problem, not just a hygiene problem.
For security: Agents that have access to real tools in your environment create new attack surfaces. Prompt injection, where malicious content in a retrieved document tries to redirect the AI's actions, is a real threat vector worth understanding.
For architects: The "AI stack" is now a thing to think about. LLM choice, retrieval infrastructure, agent frameworks, evaluation tooling: these are architectural decisions with real implications.
The concepts themselves aren't that complicated once you strip away the ML vocabulary. The hard part is deciding what to actually build with them.