
Precision instruments and tools representing the customisation techniques that transform generic AI into firm-specific intelligence
AI Decoded #4 of 8

Your $200K AI Knows Everything — Except Your Business

The difference between a generic AI and one that works like a specialist inside your firm is the techniques you apply. RAG, fine-tuning, guardrails — explained without the jargon.

Sundar Rajan
Feb 18, 2026
8 min read

Part 4 of 8 — AI Decoded for Founders | Layer 3: The Techniques


Your firm's AI is connected. It's ready. You send it your first real task.

"Draft a competitive landscape for our client in professional services technology. Same format we use for sector analyses."

It comes back with something impressive. Thorough. Structured. Well-written.

Also completely generic.

It doesn't know your firm has spent five years advising in this exact sector. It doesn't know your client's specific priorities. It doesn't know that your "sector analysis format" starts with a market sizing methodology your firm developed in-house.

This is not an AI problem. It's a gap problem.

The model you chose in Part 3 is a brilliant generalist. It knows a great deal about the world and almost nothing about your firm. The techniques in this layer are how you close that gap — turning a general-purpose AI into one that works as if it were trained by your firm, for your firm.

This is the most practically dense layer in the series. Eleven terms. But by the end, you'll know exactly which ones your team should already be using — and which ones to question if they're not.


Prompt Engineering

The first technique. No technical skill required.

Prompt engineering is the practice of crafting precise instructions that get better, more reliable outputs from an AI. The model doesn't change. Only what you tell it to do — and how. A well-written prompt can dramatically improve results without any technical work at all.

Your AI came back with a generic competitive landscape because it received a generic brief. Change the brief: "Analyse the competitive landscape for professional services technology in the enterprise segment. Focus on pricing models, implementation complexity, and switching costs. Structure as: 1) market overview with sizing, 2) three to five key players with differentiation analysis, 3) strategic implications for an incumbent advisory firm. Use a direct, evidence-led tone. Cite sources for every factual claim." The output changes entirely.

Think of writing a brief for a strong analyst. A lazy brief produces a lazy output. A precise brief — what to find, in what structure, at what level of depth — produces work you can actually use. Prompt engineering is the discipline of writing good briefs, applied to AI. It costs nothing. It starts on day one. It is the highest-return skill anyone on your team can develop right now.
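The brief-writing discipline can even be captured as a reusable template. A minimal sketch in Python: `build_brief` and its field names are invented for illustration, not any vendor's API, but they show how a precise prompt gets assembled the same way every time instead of being improvised.

```python
def build_brief(topic, focus_areas, sections, tone="direct, evidence-led"):
    """Compose a structured, analyst-style brief for an AI model."""
    focus = ", ".join(focus_areas)
    structure = "\n".join(f"{i}) {s}" for i, s in enumerate(sections, start=1))
    return (
        f"Analyse the competitive landscape for {topic}.\n"
        f"Focus on: {focus}.\n"
        f"Structure the output as:\n{structure}\n"
        f"Use a {tone} tone. Cite sources for every factual claim."
    )

prompt = build_brief(
    topic="professional services technology in the enterprise segment",
    focus_areas=["pricing models", "implementation complexity", "switching costs"],
    sections=[
        "market overview with sizing",
        "three to five key players with differentiation analysis",
        "strategic implications for an incumbent advisory firm",
    ],
)
```

Once a brief like this works, it becomes a firm asset: everyone sends the same high-quality instructions, not their own improvised version.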


Context Window

How much the AI can hold in its working memory at once.

The context window is the amount of text a model can read and consider in a single interaction. Measured in tokens. A large context window means you can hand the AI a longer document, more background, and more detailed instructions before it responds. Current frontier models can hold the equivalent of a full book in their context at once.

Your firm sends an AI the full 180-page market research report alongside a client brief and your analytical framework. Whether the model can read all of that in one pass — or must work in chunks and potentially lose the thread between page 12 and page 140 — depends entirely on its context window. For firms doing complex, document-heavy analysis, this isn't a footnote. It's a capability requirement that belongs in your vendor evaluation.

Think of it as the analyst's desk. They can only actively work with what's spread out in front of them. A small desk means working through the material in sections — potentially missing connections between distant parts. A large desk means the entire report, the brief, and your framework are all visible at once. Context window is desk size.
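When a document is bigger than the desk, it has to be worked through in overlapping sections. A minimal sketch of that chunking, assuming a rough rule-of-thumb ratio of about 1.33 tokens per word; the numbers are illustrative, not any model's actual limits.

```python
def chunk_document(words, budget_tokens, tokens_per_word=1.33, overlap=20):
    """Split a list of words into chunks that each fit a token budget.

    A small overlap between chunks preserves continuity, so a point made
    at the end of one chunk is still visible at the start of the next.
    """
    words_per_chunk = int(budget_tokens / tokens_per_word)
    step = words_per_chunk - overlap
    chunks, start = [], 0
    while start < len(words):
        chunks.append(words[start:start + words_per_chunk])
        start += step
    return chunks

report = ["word"] * 1000                      # stand-in for a long report
chunks = chunk_document(report, budget_tokens=400)
```

A 1,000-word document against a 400-token budget becomes four overlapping chunks; a model with a larger context window would read it in one pass and never risk losing the thread between them.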


Token

The billing and capacity unit — understand this before you scale.

Tokens are the unit AI models use to measure text — roughly 0.75 words, or about four characters. Every input you send and every output you receive is counted in tokens. It matters for two reasons: cost, because AI APIs charge per token; and capacity, because context windows are measured in tokens too.

Your firm's AI tool processes research documents and client briefs every day. A 50-page industry report is roughly 25,000 tokens. If you're running 20 such documents a week across a team, that's 500,000 tokens before accounting for outputs. Understanding token volume is how you build a cost model before you commit to a rollout — not after you've already scaled it and the bill arrives.

Think of tokens as the billing unit for your AI's time — like a word count that determines the invoice. You don't think about it word by word. But you need to understand it well enough to ask: "How many tokens does this workflow consume per week, and what does that cost us at full scale?" That question, asked early, saves significant money later.
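The cost question can be answered on the back of an envelope. A minimal sketch of that model, using the article's rough 0.75-words-per-token ratio; the per-million-token price below is an illustrative placeholder, not any vendor's actual rate.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb, not exact

def tokens_for_words(word_count):
    """Estimate tokens from a word count."""
    return int(word_count / WORDS_PER_TOKEN)

def weekly_token_bill(docs_per_week, words_per_doc, price_per_million_tokens):
    """Estimate weekly input tokens and cost before committing to a rollout."""
    tokens = docs_per_week * tokens_for_words(words_per_doc)
    cost = tokens / 1_000_000 * price_per_million_tokens
    return tokens, cost

# 20 fifty-page reports a week at roughly 18,750 words each
tokens, cost = weekly_token_bill(20, 18_750, price_per_million_tokens=3.00)
```

That works out to 500,000 input tokens a week, the figure above, before any outputs are counted. Running the same arithmetic with your real document volumes and your vendor's real prices is the cost model to build before you scale.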


RAG — Retrieval-Augmented Generation

The most important technique in business AI right now.

RAG connects an AI model to your own data — your documents, your knowledge base, your past work, your databases — so it retrieves relevant information before responding, rather than relying solely on what it learned during training. Without RAG, the AI works from general knowledge. With RAG, it works from yours.

Your firm has five years of sector research, client deliverables, proprietary frameworks, and annotated case studies. Without RAG, the AI answers your competitive landscape question using publicly available information. With RAG, before it writes a single word, it retrieves your firm's three most relevant prior sector analyses, your standard methodology documents, and the client's background brief — then produces an output calibrated to your firm's knowledge, not the internet's.

Think of the difference between an analyst working from memory and one who has full access to your firm's entire archive before they begin. Same person. Same intelligence. But one is working with your five years of accumulated knowledge, and one is working from scratch. That's the difference RAG makes.
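The retrieve-then-generate shape of RAG fits in a few lines. A deliberately crude sketch: real systems retrieve with embeddings and a vector database, but word overlap stands in for that here so the structure stays visible. The archive entries are invented examples.

```python
# Hypothetical firm archive: document name -> content summary
ARCHIVE = {
    "sector-analysis-2023": "pricing models in professional services technology",
    "methodology-guide": "our in-house market sizing methodology for sector analyses",
    "client-brief": "client priorities for the enterprise software engagement",
    "holiday-memo": "office closure dates for the summer period",
}

def retrieve(query, k=2):
    """Return the k archive documents most relevant to the query.

    Scored by naive word overlap -- a stand-in for embedding search.
    """
    q = set(query.lower().split())
    scored = sorted(ARCHIVE.items(),
                    key=lambda kv: len(q & set(kv[1].split())),
                    reverse=True)
    return [name for name, _ in scored[:k]]

def build_rag_prompt(query):
    """Retrieve first, then put the firm's own material in front of the model."""
    context = "\n".join(f"[{d}] {ARCHIVE[d]}" for d in retrieve(query))
    return f"Context from the firm's archive:\n{context}\n\nTask: {query}"

prompt = build_rag_prompt("sector analyses of professional services pricing models")
```

The key design point: retrieval happens before generation, so the model answers from your material rather than from memory.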


Vector Database

The type of database that makes RAG actually work.

A standard database finds exact matches — search "pricing model" and it returns documents containing those exact words. A vector database finds meaning — search "how competitors charge clients" and it surfaces the pricing model documents, even if neither of those words appears in them. It stores content as mathematical representations of meaning and retrieves by conceptual similarity rather than keyword match.

Your firm's knowledge base holds hundreds of past deliverables and research pieces. When the AI is pulling prior work relevant to a new client question, it needs to find the conceptually relevant material — regardless of whether earlier documents used the same terminology the current client used. A vector database makes that semantic retrieval possible. A standard database would miss half the relevant work because the words don't line up exactly.

Think of a filing system organised by concept rather than by label. Everything about "competitive dynamics in fragmented markets" lives together — regardless of what each document called it. Your analyst can pull the right precedent even when they don't remember the exact title of the original piece. That's what a vector database gives your AI.
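The exact-match gap can be demonstrated in a few lines. In this sketch, a hand-built concept map stands in for what a vector database learns automatically from embeddings: that different phrasings point at the same idea. The documents and vocabulary are invented for illustration.

```python
# Hypothetical knowledge base: document name -> content summary
DOCS = {
    "pricing-models-2022": "subscription and licence fee structures across vendors",
    "talent-memo": "analyst hiring plans for next year",
}

# Hand-built stand-in for learned semantic similarity
CONCEPTS = {
    "pricing": {"pricing", "charge", "fee", "fees", "subscription", "licence"},
}

def keyword_search(query):
    """Standard database behaviour: match only on the literal words."""
    q = query.lower().split()
    return [d for d, text in DOCS.items() if any(w in text for w in q)]

def concepts_in(words):
    return {name for name, vocab in CONCEPTS.items() if words & vocab}

def semantic_search(query):
    """Vector-database behaviour: match documents that share a concept."""
    q_concepts = concepts_in(set(query.lower().split()))
    return [d for d, text in DOCS.items()
            if q_concepts & concepts_in(set(text.split()))]

hits_keyword = keyword_search("how competitors charge clients")
hits_semantic = semantic_search("how competitors charge clients")
```

The keyword search returns nothing, because "charge" never appears in the pricing document. The semantic search finds it anyway, because "charge" and "fee" live in the same concept. That is the retrieval gap a vector database closes, at scale and without anyone hand-building the map.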


Embeddings

The mechanism underneath RAG — your team will use this term constantly.

Embeddings are the mathematical representations of text — or images, or audio — as sets of numbers that capture meaning. When content is converted into embeddings, similar ideas end up as mathematically similar numbers. That's what allows a vector database to find conceptually related content. You rarely interact with embeddings directly, but your technical team will use this term constantly when building any RAG system.

When your AI stores a past client deliverable, it actually stores thousands of numbers representing the meaning of that content. When a new query arrives — even phrased completely differently — its embedding is mathematically close, so the database surfaces the right match. That's embeddings doing the work underneath RAG. You don't need to build them. You need to understand that when your team says "we need to embed the knowledge base first," this is what they mean and why it matters.

Think of it as the conceptual index your analyst carries in their head — the instinctive knowledge that "organisational design," "operating model redesign," and "structure rationalisation" all point to the same type of work. Embeddings encode that conceptual index in mathematics so the AI navigates it the same way.
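The similarity arithmetic itself is simple. In this sketch each phrase is reduced to three hand-picked numbers; real embeddings have thousands of learned dimensions, but the cosine-similarity calculation that compares them is the same. The phrases and values are invented for illustration.

```python
import math

# Toy "embeddings": three hand-picked dimensions per phrase
EMBEDDINGS = {
    "organisational design":       [0.9, 0.8, 0.1],
    "operating model redesign":    [0.8, 0.9, 0.2],
    "quarterly catering invoices": [0.05, 0.1, 0.95],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction of meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

sim_related = cosine(EMBEDDINGS["organisational design"],
                     EMBEDDINGS["operating model redesign"])
sim_unrelated = cosine(EMBEDDINGS["organisational design"],
                       EMBEDDINGS["quarterly catering invoices"])
```

The two related phrases score close to 1.0; the unrelated one scores near zero. Retrieval in a RAG system is, at bottom, this comparison run against everything in the knowledge base.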


Fine-Tuning

The advanced technique — most firms don't need it first.

Fine-tuning means taking an existing foundation model and running additional training on your firm's specific data — so it learns to work the way your firm works. It absorbs your methodology, your writing style, your standards. Outputs begin to sound like they came from inside your firm rather than from a general AI given a brief.

Your firm has years of completed client deliverables — reports, analyses, strategic recommendations — all annotated with partner feedback and final edits. Fine-tuning means running the model through all of that, so it learns to produce work that reflects your firm's standards without needing to be briefed from scratch every time. The model develops instincts specific to your firm's way of working. That's the upside. The downside: fine-tuning is expensive, technically demanding, and requires significant high-quality data to do properly.

Start with RAG and prompt engineering. Most firms get 80% of the value from those two techniques alone. Consider fine-tuning only when you have a well-defined, high-volume use case, the historical data to support it, and a technical team with the capability to run and maintain it. Rushing into fine-tuning too early is one of the most common — and costly — mistakes in enterprise AI.
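If your team does reach the fine-tuning stage, most of the work is data preparation. A minimal sketch of that step, pairing briefs with partner-approved final text. The prompt/completion field names follow a common JSONL convention for fine-tuning data, but the actual schema depends on the provider, and the example text here is invented.

```python
import json

# Hypothetical training pairs: original brief -> partner-approved final text
examples = [
    {"prompt": "Draft a market entry summary for a mid-market SaaS client.",
     "completion": "Partner-approved final text of the summary..."},
    {"prompt": "Summarise switching-cost dynamics in enterprise software.",
     "completion": "Partner-approved final text of the analysis..."},
]

# One JSON object per line -- the shape most fine-tuning pipelines expect
jsonl = "\n".join(json.dumps(e) for e in examples)
```

Two examples prove the format; hundreds to thousands of high-quality pairs are typically needed before fine-tuning outperforms RAG plus good prompting, which is exactly why the data requirement is the gating question.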


Synthetic Data

Purpose-built training material when real data is off-limits.

Synthetic data is artificially generated data — realistic but not real — used to train or test AI models when actual data is scarce, sensitive, or doesn't cover the edge cases you need. It lets you build and improve models without exposing confidential information.

Your firm's real client deliverables are confidential. Using them directly in a training pipeline raises serious legal and ethical considerations. Synthetic data solves this: your team generates structurally realistic, professionally plausible examples — realistic sector analyses, realistic client briefs, realistic strategic recommendations — that never came from actual client work. The model trains on the shape, style, and substance of your firm's work without ever touching a real client file.

Think of training simulations. A new analyst can't practice on live client engagements. But they can work through realistic case simulations designed to develop the skills they need. Synthetic data is the AI equivalent — purpose-built practice material that builds capability without risk.
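In its simplest form, synthetic data is generated from templates. A minimal sketch: realistic-looking client briefs that never touch a real client file. The sectors, asks, and wording are invented placeholders, and real pipelines often use an AI model rather than templates to generate richer examples.

```python
import random

SECTORS = ["logistics software", "healthcare analytics", "legal tech"]
ASKS = ["a competitive landscape", "a pricing benchmark", "a market sizing"]

def synthetic_brief(rng):
    """Generate one plausible-but-fictional client brief."""
    return (f"Client in {rng.choice(SECTORS)} requests {rng.choice(ASKS)} "
            f"covering the top {rng.randint(3, 6)} players.")

rng = random.Random(42)   # seeded so the batch is reproducible
briefs = [synthetic_brief(rng) for _ in range(3)]
```

Every brief has the shape and register of real client work, yet none contains confidential information, which is precisely what makes it safe to use for training and testing.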


Hallucination

The risk you must take seriously — especially in client-facing work.

Hallucination is when an AI model confidently states something that is factually wrong. It cites a study that doesn't exist. It misquotes a statistic. It attributes a claim to a source that never made it. The model doesn't know it's wrong. It generates plausible-sounding output regardless of whether it's accurate — with no disclaimer attached.

Your AI produces a sector analysis. It states that a key competitor "posted 34% revenue growth in 2023" and cites a Gartner report. You include it in the client presentation. The competitor's actual growth was 12%. The Gartner report cited doesn't exist. You find out when the client's team pushes back in the meeting. In consulting, your reputation is your product. Hallucination isn't a curiosity — it's a professional liability.

Think of an analyst who summarises a research paper from memory, gets a key number wrong, and presents it with complete confidence. You can only catch what you check. Hallucination is exactly why every AI output in client-facing work needs a verification layer. It is not optional. It is not the model's fault. It is a known limitation of the technology — and managing it is your firm's responsibility.
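One practical piece of that verification layer: checking every cited source against a registry of documents the firm actually holds. A minimal sketch; `KNOWN_SOURCES` is a hypothetical stand-in for a real reference library, and real checks would also verify the claims themselves, not just the citations.

```python
# Hypothetical registry of sources the firm can actually produce
KNOWN_SOURCES = {"Internal Sector Review 2023", "Client Background Brief"}

def unverified_citations(cited_sources):
    """Return cited sources that cannot be found in the registry.

    Anything flagged here is a hallucination candidate that a human
    must resolve before the work goes anywhere near a client.
    """
    return [s for s in cited_sources if s not in KNOWN_SOURCES]

draft_citations = ["Internal Sector Review 2023", "Gartner Growth Report 2023"]
flags = unverified_citations(draft_citations)
```

The fabricated report gets flagged before the meeting, not during it. The check is trivial to run and catastrophic to skip.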


Grounding

The technique that makes AI outputs verifiable.

Grounding anchors AI responses to specific, cited sources — so every output is traceable and checkable. RAG is a form of grounding. A grounded AI doesn't just answer. It shows its work.

Your AI doesn't just say "the market is growing at 14% annually." It says: "Market growth is reported at 14.2% CAGR — Source: McKinsey Global Institute, 'The Future of Professional Services,' 2024, p.23." That citation takes a reader ten seconds to verify. Without grounding, every claim the AI makes is an assertion. With grounding, every claim is a reference. In client-facing work, that difference is everything.

Think of the standard your firm already holds its analysts to — every claim needs a source, every statistic needs a citation. Grounding enforces that same standard at the AI level, on every output, automatically. It doesn't eliminate the risk of hallucination. But it makes every error immediately visible — and immediately catchable before it reaches a client.
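Enforcing that standard can start with something very simple: flag any line of a draft that makes a claim without a source tag. A minimal sketch, assuming one claim per line and a "(Source: ...)" citation convention; production pipelines are more sophisticated, but the principle of no citation, no claim is the same.

```python
def ungrounded_lines(draft):
    """Return the lines of a draft that carry no source citation."""
    lines = [l.strip() for l in draft.splitlines() if l.strip()]
    return [l for l in lines if "(Source:" not in l]

draft = (
    "Market growth is 14.2% CAGR (Source: Industry Outlook 2024, p.23)\n"
    "Switching costs are rising sharply\n"
)
flags = ungrounded_lines(draft)
```

The uncited claim is surfaced automatically, so a reviewer spends their time verifying sources rather than hunting for missing ones.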


Guardrails

The hard limits your AI cannot cross — no matter how it's prompted.

Guardrails are the constraints and safety mechanisms built around an AI model that prevent it from producing outputs that are harmful, out-of-scope, or non-compliant. They don't make the model smarter. They define what it is and isn't allowed to do — regardless of how someone instructs it.

Your AI research tool must never present a finding as a final strategic recommendation without human review. It must never use one client's confidential data when producing work for another. It must never fabricate a source or present uncertain information as established fact. Guardrails enforce these limits in code — not as guidelines, not as hoped-for behaviours, but as hard constraints the system cannot violate even accidentally. Without guardrails, one confident AI output stating something it shouldn't is enough to create a serious problem.

Think of the professional and ethical boundaries your analysts operate within. They can research, synthesise, and draft — but they cannot sign off on advice without appropriate review, make representations outside their scope, or share client information inappropriately. Those aren't personal limitations. They're professional obligations. Guardrails enforce the same obligations at the AI level — systematically, on every output, without relying on the model's good judgment.
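"Enforced in code" can be made concrete. A minimal sketch of a guardrail check run on every output before release; the two rules and the client tags are illustrative examples of the kinds of hard constraints described above, not a complete policy.

```python
def guardrail_check(client_id, referenced_clients, human_reviewed):
    """Return the list of guardrail violations for one AI output.

    An empty list means the output may be released; anything else
    blocks it, regardless of how confident the output sounds.
    """
    violations = []
    if not human_reviewed:
        violations.append("strategic output released without human review")
    if any(c != client_id for c in referenced_clients):
        violations.append("output references another client's data")
    return violations

violations = guardrail_check(
    client_id="client-A",
    referenced_clients=["client-A", "client-B"],   # cross-client leak
    human_reviewed=False,
)
```

Both rules trip, and the output is blocked before it can do damage. The design point is that the check sits outside the model: no prompt, however cleverly worded, can talk its way past it.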


The Techniques Toolkit at a Glance

| Technique | What it does | When you need it |
| --- | --- | --- |
| Prompt Engineering | Better instructions → better outputs | Always; from day one |
| Context Window | How much the AI reads at once | When working with long documents |
| Token | The billing and capacity unit | When estimating cost at scale |
| RAG | Connects AI to your firm's knowledge | When AI needs to know your business |
| Vector Database | Semantic search that powers RAG | When building any RAG system |
| Embeddings | Meaning encoded as numbers | Underneath RAG — your team's concern |
| Fine-Tuning | Domain-specific retraining | After RAG; only with enough data |
| Synthetic Data | Realistic fake training data | When real data is sensitive or scarce |
| Hallucination | AI confidently getting things wrong | A risk to manage — always |
| Grounding | Every output tied to a cited source | Any client-facing deployment |
| Guardrails | Hard limits the AI cannot cross | Any professional or regulated context |

What this layer means for you as a strategic leader: You don't need to build any of these yourself. You need to know which ones your team is using — and ask about the ones that aren't in place. RAG and prompt engineering are where every serious AI deployment starts. Grounding and guardrails are not technical decisions. They are professional risk decisions. If your firm's AI deployment doesn't have them, that conversation belongs on your agenda — not your tech team's.


What's Next

You now have the model. You have the techniques to make it specific to your firm.

Part 5 is where it starts operating inside your business.

And that brings the most consequential question in this series: does your AI work alongside your team — suggesting, drafting, supporting — while people stay in control? Or does it work autonomously — taking a task from start to finish, without a human in the middle?

That one question, answered deliberately for each workflow in your firm, is your AI strategy. Part 5 gives you the framework to answer it.

