Your AI Isn't Going Off the Rails. It Never Had Any.

Sun, 31 May 2026 00:00:00 +0000

The most common complaint I hear from people using AI is some version of the same sentence: “It goes off the rails.” It drifts. It forgets what I told it. It invents things. It has no direction.

I understand the frustration, but the phrasing hides the actual problem. Going off the rails implies there were rails to begin with. There weren’t. A model in a blank chat window has no memory of who you are, no rules about how it should behave, and no defined process for doing the work. It is not drifting away from a plan. There was never a plan for it to drift from.

So when people tell me their AI lacks direction, what they are really describing is a missing system. The fix is almost never a better prompt. It is more structure. And the amount of structure you give the AI is the single biggest variable in whether it behaves like a reliable partner or a clever stranger who resets every morning.

The clearest way I have found to explain this is as three tiers. Each one solves a bigger slice of the “no direction” problem than the last.

Tier 1: Claude Chat — The Conversation

This is where almost everyone starts, and where most people stay. You open a chat window, you type, it responds. Each conversation is mostly a blank slate.

The defining trait of this tier is amnesia. A new chat forgets everything. Whatever context you want the model to have, you provide manually, in the prompt, every single time. The direction comes entirely from you. The model cannot touch your files, run anything, or reach your systems. It talks, and that is all it does.

This is genuinely useful. For a quick question, a brainstorm, a first draft, or thinking out loud, a chat window is fast and frictionless. But it is also exactly why it feels directionless on anything bigger. Nothing constrains it. There are no rules, no persistent goal beyond your last message. If your prompt is vague, the output is vague. It is a brilliant intern with total amnesia, and you are re-explaining the entire job every morning.

People at this tier blame the model. The model is rarely the issue. The issue is that nothing is holding it on a track, because there is no track.

Tier 2: Claude Code — The Operator

The second tier is a different kind of tool entirely. Claude Code is an agent that lives in your terminal. It does not just talk about work — it does the work. It reads and writes real files, runs commands, searches the web, and operates on your actual environment instead of an imagined one.

Two things change the moment you move here.

First, it gets a working memory of your project. A CLAUDE.md file holds persistent instructions for that codebase or project, so the model arrives already knowing the conventions, the goals, and the rules you have written down. You stop re-explaining the project on every session.

Second, and more importantly, it works in a loop: act, observe the result, correct. It writes a file and sees whether the change worked. It runs a command and reads the actual output. That feedback loop is what kills the drift. The model is not imagining what might happen — it is looking at what did happen and adjusting. Operating on real artifacts instead of guesses is most of the discipline.
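A minimal sketch of that act, observe, correct loop in Python. The names `run_until_passing` and `propose_fix`, and the retry limit, are hypothetical and chosen for illustration; this is not how Claude Code is implemented internally, only the shape of the loop.

```python
import subprocess
import sys
from typing import Callable, List


def run_until_passing(
    command: List[str],
    propose_fix: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Act, observe, correct: rerun a command until it exits cleanly."""
    for _ in range(max_attempts):
        # Act: run the real command against the real environment.
        result = subprocess.run(command, capture_output=True, text=True)
        # Observe: the exit code and captured output are ground truth.
        if result.returncode == 0:
            return True
        # Correct: pass the observed failure to whatever makes the next change.
        propose_fix(result.stdout + result.stderr)
    return False


# Usage: a command that succeeds on the first try needs no correction.
observed_failures: List[str] = []
passed = run_until_passing(
    [sys.executable, "-c", "print('tests passed')"],
    observed_failures.append,
)
```

The point of the sketch is that every correction is driven by output that actually happened, never by a guess about what the command might have printed.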

The honest limitation: this discipline is per-project and largely manual. You set up each project’s instructions yourself. The memory does not follow you from one project to the next, and there is no consistent persona or process spanning everything you do. It is a sharp, capable operator — but one you have to brief fresh for every new job.

Tier 3: AI Infrastructure

The third tier is the one that actually fixes “no direction” at the root, because it stops treating each session as a fresh start.

AI Infrastructure is a system that wraps Claude Code and gives it a permanent identity, a rule set, a knowledge base, and a defined process. The jump from Tier 2 to Tier 3 is the difference between hiring a contractor and building an operations department. This post itself is a small example: it was written inside an AI Infrastructure that already knew my blog’s voice, my formatting conventions, and where the file should be saved, without my having to say any of it.

Three things work together here, and they are what make drift structurally hard.

The first is persistent identity and memory. The infrastructure does not forget who I am, what I work on, or how I want things done. When I correct it, the correction sticks across every future session, not just the current chat. The knowledge lives in files I own, not inside a conversation that disappears when I close the tab.

The second is a defined process. Every non-trivial request gets classified and routed through a structured sequence: understand the request, plan the approach, do the work, verify it against explicit criteria. The model cannot freewheel, because a process governs the response before it starts. That is the literal opposite of going off the rails — the rails are built in.

The third is context routing. Instead of me pasting the right background into every prompt, the system pulls the relevant knowledge automatically based on what I am doing. The model arrives oriented, every time.

None of this makes the underlying model smarter. It makes the environment around the model disciplined. That is the whole trick.

The Same Symptom, Mapped to the Fix

When someone describes their AI as directionless, the specific complaint usually tells you exactly which tier they are stuck on and what would move them up.

If it forgets what you told it, you are in a chat window and you need persistent instructions — that is the move to Claude Code.

If it hallucinates instead of using your real data, you need to let it read your actual files — again, the move to an operator that touches your environment.

If you keep re-explaining your preferences across projects, you have outgrown per-project memory and need persistent identity — the move to infrastructure.

And if it has no consistent process from one task to the next, you need a defined algorithm governing how every request gets handled — the same move.

Notice the pattern. Every one of these is solved by adding structure, not by writing a cleverer sentence into the prompt box.

The Real Shift

Direction is not something you nag the AI for inside each prompt. It is something you build into the system once.

That reframing is the entire jump from chatting to infrastructure. A chat window is equally capable on day one and day three hundred, because nothing accumulates. An operator gets more useful per project, as long as you keep briefing it. An infrastructure compounds — every rule you add, every preference it learns, every process you refine makes the next session start further ahead than the last.

If your AI feels like it has no direction, it is not malfunctioning. It is doing exactly what an unstructured system does. The rails were never the model’s job to build. They are yours.

If a Script Can Do It, Don't Ask the LLM

Mon, 20 Apr 2026 00:00:00 +0000

When your AI system is spending inference tokens to parse date fields, strip PDF formatting artifacts, or convert file syntax — that is not an AI problem. That is an architecture problem. Code can handle all of those tasks deterministically, faster, cheaper, and more accurately than any language model. If your token bill is high and your outputs are inconsistent, the problem is usually not the model. It is what you are handing the model.

PAI (Personal AI Infrastructure, the harness I run on top of Claude Code) treats every token as a resource allocation decision. If a script can do it, a script does it. The model gets the part that actually requires reasoning.

The Structural Tax

Most AI workflows are built around convenience. You have a PDF — you upload it. You have a session log — you paste it in. You have a directory of files — you point the model at the folder. The model figures it out.

That approach works well enough for one-off tasks. It breaks down at scale, in recurring workflows, and anywhere precision matters.

Raw PDFs carry structural overhead that has nothing to do with the content you care about: page headers, footers, column artifacts, encoding noise, redundant whitespace, and metadata that made sense in the original layout but becomes garbage in a text stream. A language model consuming that document is spending a meaningful portion of its context window on formatting debris. That cost compounds across every document in every workflow.

The same pattern shows up everywhere. Raw session transcripts contain tool call metadata, timestamps, and infrastructure details that are irrelevant to extracting what was learned. Full database contents are too large to fit in context, so the model guesses at relevance instead of reasoning over curated results. Obsidian-flavored Markdown with custom image syntax breaks rendering in any system that was not built for it.

Every one of these is a structural problem. None of them require intelligence to solve. They require code.

What PAI Does Instead

PAI operates on a two-phase model: scripts handle structure, the LLM handles reasoning. The handoff only happens after the mechanical work is done.

A few examples of how this plays out in practice.

PDF to Markdown conversion. When I ingest Oracle HCM documentation — dense, multi-column PDFs that can run hundreds of pages — nothing goes to the LLM in PDF form. A Python script converts the document to clean Markdown first, stripping formatting artifacts and preserving the content hierarchy. The model sees structured text, not a scanned layout. The difference in token consumption is substantial, and the difference in output quality is larger still.
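One piece of that cleanup can be sketched in a few lines of Python. This assumes the PDF text has already been extracted page by page with some PDF library (not shown), and the function name and threshold are hypothetical; it is not the actual conversion script, only an illustration of stripping layout debris deterministically.

```python
import re
from collections import Counter
from typing import List

# Lines that are only a page number, such as "12" or "Page 12".
# Assumption: a content line that is only a number is also dropped.
PAGE_NUMBER = re.compile(r"^(page\s+)?\d+$", re.IGNORECASE)


def strip_repeated_lines(pages: List[str], min_fraction: float = 0.6) -> str:
    """Drop lines that repeat on most pages (likely headers or footers)."""
    page_lines = [[ln.strip() for ln in page.splitlines()] for page in pages]
    # Count each distinct line once per page.
    counts = Counter(ln for lines in page_lines for ln in set(lines) if ln)
    threshold = max(2, int(len(pages) * min_fraction))
    boilerplate = {ln for ln, n in counts.items() if n >= threshold}
    kept = [
        ln
        for lines in page_lines
        for ln in lines
        if ln and ln not in boilerplate and not PAGE_NUMBER.match(ln)
    ]
    return "\n".join(kept)


pages = [
    "ACME Manual\nIntro text\nPage 1",
    "ACME Manual\nMore text\nPage 2",
    "ACME Manual\nFinal text\nPage 3",
]
cleaned = strip_repeated_lines(pages)
```

Here `cleaned` keeps the three content lines and drops the running header and page numbers before any token is spent on them.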

Session harvesting. Claude Code sessions produce raw JSON transcripts containing every tool call, every response, and a lot of infrastructure noise. SessionHarvester.ts parses those transcripts before any analysis happens. It extracts structured learning signals — what worked, what failed, what decisions were made — and writes them to a clean format. The LLM receives a curated summary, not a raw dump.
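The real harvester is a TypeScript tool, and the transcript schema is not shown in this post. As a rough Python sketch under invented assumptions (one JSON record per line, a `type` field, and `WORKED:` / `FAILED:` / `DECISION:` prefixes that are purely hypothetical), the filtering step might look like this:

```python
import json
from typing import Dict, Iterable, List

# Hypothetical markers; the real signal extraction rules are not in this post.
SIGNAL_PREFIXES = {"WORKED:": "worked", "FAILED:": "failed", "DECISION:": "decisions"}


def harvest(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Keep learning signals, discard tool calls and other transcript noise."""
    signals: Dict[str, List[str]] = {name: [] for name in SIGNAL_PREFIXES.values()}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than fail the whole harvest
        if not isinstance(record, dict) or record.get("type") != "message":
            continue  # tool calls and infrastructure records are dropped here
        text = str(record.get("text", "")).strip()
        for prefix, bucket in SIGNAL_PREFIXES.items():
            if text.startswith(prefix):
                signals[bucket].append(text[len(prefix):].strip())
    return signals


raw = [
    '{"type": "tool_call", "name": "Bash", "input": "ls"}',
    '{"type": "message", "text": "WORKED: converting PDFs before ingestion"}',
    '{"type": "message", "text": "just chatting"}',
    "not json",
    '{"type": "message", "text": "DECISION: keep retrieval separate"}',
]
summary = harvest(raw)
```

Five raw records go in; two short signals come out. That reduction is what the model receives.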

Activity parsing. ActivityParser.ts does the same for daily activity logs: reads session files, extracts structured change records, and produces a clean representation of what changed in the PAI system. No model inference required until there is something worth reasoning about.

Knowledge base search. The cybersecurity knowledge base holds over fifty books’ worth of content in a local PostgreSQL vector store. When a query comes in, CyberSecKB.ts runs a similarity search and returns the most relevant passages. The LLM reasons over that curated result set, not the entire library. Retrieval is deterministic. Reasoning is probabilistic. They are separate steps by design.
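To make the retrieval step concrete, here is a small in-memory stand-in written in Python. The real system uses a PostgreSQL vector store and a TypeScript tool; the toy two-dimensional embeddings and the `top_k` helper below are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    passages: Sequence[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the k passages whose embeddings are closest to the query."""
    ranked = sorted(passages, key=lambda p: cosine(query, p[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy embeddings; real ones would come from an embedding model.
library = [
    ("firewall rules", [1.0, 0.0]),
    ("phishing awareness", [0.0, 1.0]),
    ("network segmentation", [0.9, 0.1]),
]
results = top_k([1.0, 0.0], library, k=2)
```

Given the same index and the same query, this step returns the same passages every time, which is why it belongs in code and not in the prompt.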

Image pipeline. Obsidian embeds images with its wikilink syntax, ![[filename.png]]. Hugo expects standard Markdown, ![Image Description](/images/filename.png). images.py handles the conversion and copies files to the correct static directory at publish time. No LLM is involved at any point in that pipeline. It is a string transformation and a file copy, exactly the kind of task code should own.
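The string-transformation half of that pipeline fits in a few lines. This is a sketch, not the contents of images.py: it assumes the Obsidian side uses wikilink embeds of the form `![[filename.png]]`, the function name and the `/images` root are illustrative, and the file copy is left out.

```python
import re

# Matches ![[file.png]] and ![[file.png|text]].
# Assumption: text after "|" is treated as alt text here, although Obsidian
# also uses that slot for image sizing.
EMBED = re.compile(r"!\[\[([^\]|]+?)(?:\|([^\]]*))?\]\]")


def convert_embeds(markdown: str, image_root: str = "/images") -> str:
    """Rewrite Obsidian image embeds as standard Markdown image links."""

    def replace(match: "re.Match[str]") -> str:
        filename = match.group(1).strip()
        # Fall back to the file name without its extension as alt text.
        alt = (match.group(2) or filename.rsplit(".", 1)[0]).strip()
        return f"![{alt}]({image_root}/{filename})"

    return EMBED.sub(replace, markdown)


converted = convert_embeds("Intro ![[diagram.png]] and ![[shot.png|Login screen]]")
```

After this runs, `converted` contains only standard Markdown image links, and text without embeds passes through unchanged.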

The Boundary Is the Design

The two-phase pattern is not a workaround. It is the architecture.

Every AI workflow has a boundary between deterministic computation and probabilistic reasoning. The question is where you draw it. Drawing it too early means the model is doing structural work it is not suited for: burning tokens on format conversion, wasting context on irrelevant metadata, producing inconsistent output because the input was inconsistent. Drawing it correctly means the model receives exactly what it needs to do the work only it can do.

This is a different problem than the one prompt engineering was designed to solve. Prompt engineering is about communicating intent clearly to a model. The code-first pattern is about never making the model organize your data before it can reason over it. That work should already be done.

The scripts in PAI are not wrappers around LLM calls. They are the preparation layer — the part of the system that exists specifically so the model does not have to deal with structure. Once that layer is in place, the model’s job gets narrower and cleaner. It reasons. It synthesizes. It makes judgment calls. It does not convert file formats.

The Result

When the pipeline is built correctly, token consumption goes down and output quality goes up — not because the model improved, but because the input did. The model is working with clean data in a consistent format, scoped to exactly what is relevant. The structural variables are removed before inference begins.

That is the return on building the prep layer. Less waste. More signal. A model that can focus on the part of the problem that actually requires a model.

Stop Prompt Engineering. Start Building Infrastructure.

Sun, 19 Apr 2026 00:00:00 +0000

Last week I opened a terminal, typed six words, and watched PAI spend the next three minutes processing a set of handwritten study notes exported from my reMarkable tablet. It converted the file format, extracted key concepts, generated structured review questions, cross-referenced my existing knowledge base, and saved everything to the correct directories in Obsidian — organized by module, tagged correctly, ready to use. I did not write a prompt. I did not explain what certification I was studying. I did not describe the output structure I wanted. I just named the module.

Eighteen months ago, that same task would have started with a paragraph explaining what the reMarkable export format was, what the certification covered, how I organized notes in Obsidian, what level of detail I wanted in the summary, and what format the quiz questions should follow — and another paragraph if I wanted the output saved to a specific location. Every single session. From scratch.

That gap is the entire argument for building an AI harness instead of staying in a chat window.

The Chat Window Tax

Prompt engineering emerged as a discipline because LLMs are stateless by default. Every conversation starts with a blank model. If you want the model to know who you are, what you work on, how you like your outputs formatted, and which approach you prefer for recurring problems — you have to tell it. Every time.

That is a tax. Not a feature. A tax.

The people who got good at prompt engineering got skilled at paying that tax efficiently — writing shorter context dumps, using system prompts in API playgrounds, building prompt libraries they paste from. It helped. But it never made the tax go away. It just made each payment slightly cheaper.

In 2026, paying that tax is a choice. The tools exist to stop paying it entirely.

What a Harness Actually Does

A harness is infrastructure wrapped around your AI runtime. In my case, that is PAI — Personal AI Infrastructure — running on top of Claude Code in the terminal. The architecture has three layers.

Memory is persistent context that survives across sessions. PAI knows my role (HRIS analyst), my platform (Oracle HCM Cloud), my Oracle triage methodology, my blog’s writing conventions, my active projects, and my preferences for output formats. None of that gets re-entered. It gets loaded automatically at session start.

Skills are pre-built, parameterized workflows. When I say “process my study notes,” a skill handles that — reading from the right directory, converting the format, saving to the right Obsidian path, cross-referencing the knowledge base. The skill is the prompt, written once, tested, improved over time. I do not craft it fresh every time.

The Algorithm is a structured execution framework. When the work is complex — multi-step, multi-file, non-trivial — PAI runs through a defined process: observe, think, plan, build, execute, verify, learn. The output is consistent because the process is consistent.

Taken together, these three things mean the model is never starting from zero. It arrives at each session already oriented.

The Token Economy Hidden Inside the Infrastructure

There is a practical angle to this that does not get talked about enough: token consumption.

Every message in a chat session burns tokens — your context, the model’s reasoning, the output, and whatever you paste in to re-establish state. The longer and more complex the session, the faster you burn toward usage limits. When you are re-explaining your role, your project, and your preferences at the start of each conversation, you are spending tokens on re-orientation, not on actual work.

A harness changes the math.

PAI loads persistent context at session start through hooks — but those are structured files read by the runtime, not large prompt blocks the model has to reason through. The model arrives oriented. The working token budget goes toward the task.

More importantly, PAI externalizes logic that would otherwise live inside the conversation. The skills are pre-written workflows. The Algorithm is a structured execution framework. The session hooks handle routing and context injection. A significant portion of what would normally require the model to think its way through — “what directory does this go in?”, “what format does this certification use?”, “what’s the right next step in this process?” — is already answered in scripts and configuration files that run before the model responds.

That is not just more efficient. It changes your usage ceiling. When the model is not spending context budget on re-orientation or derivable decisions, more of each session goes toward meaningful work. You hit limits later, do more per session, and run longer chains of complex tasks without interruption.

Prompt engineering optimizes the prompt. Infrastructure optimizes the budget.

CLI vs. Chat: It Is Architecture, Not Preference

This is the part that took me a while to articulate. The preference for CLI over chat window is not aesthetic — it is structural.

A chat window is a conversation interface. Conversations are ephemeral. They have no persistent state, no programmable hooks, no way to inject context at session start, no way to trigger workflows, no way to store outputs in structured memory. The UX is polished. The architecture is a dead end for anything requiring continuity.

A CLI is a programmable runtime. Session start hooks can load context files. Commands can trigger skills. Outputs can write back to memory. Different agents can be spawned with different contexts and run in parallel. The AI operates inside an environment you built, not inside a box you are renting.

That difference compounds. A chat window is equally capable on day one and day three hundred. A harness gets more capable every time you add a skill, improve the memory, or refine the algorithm.

Before and After: The Same Problem, Two Environments

Chat window, eight months ago:

“I have study notes from a certification I’m working through, exported as a Word document from my tablet. I organize my notes in Obsidian under a folder structure by certification and module number. I need you to convert the content to clean markdown, extract the key concepts as a structured summary, generate quiz questions with answers, and format everything to match my existing note structure. The certification is [name], this is module [N], and here’s an example of how my other notes look: [paste example]…”

Then the session ended. Next time I had notes to process — same context dump, from scratch.

With PAI, today:

“Process my study notes for module 4.”

PAI already knows the certification, the Obsidian directory structure, the naming conventions, the quiz format, and which knowledge base to cross-reference. Processing starts immediately. The notes land in the right place in the right format.

The eight-month gap between those two experiences is not better prompting. It is infrastructure.

2026: Where the Power Users Went

The practitioners who were deep into prompt engineering two years ago have largely moved on — not to better prompts, but to better systems. They are building skills, writing memory schemas, wiring session hooks, running structured execution algorithms on complex work. The prompt engineer persona is being quietly replaced by the AI infrastructure builder.

This is not about being technical. It is about thinking one level up. Instead of asking how to get a better response to this prompt, you ask what a system would need to know to handle this reliably, every time.

Your Knowledge Doesn’t Live in the Model

One of the less obvious benefits of building infrastructure rather than relying on chat conversations: your knowledge is not locked to any LLM.

When everything lives in a chat window, switching models means starting over. Your context, your conversation history, your accumulated session knowledge — gone. The model you were using knew who you were because you kept telling it. A different model knows nothing.

With PAI, the knowledge lives in files you own. The memory is markdown on your machine. The skills are scripts in a directory. The algorithm is a structured process your runtime executes. None of it is stored inside Claude, or any other model. The AI is the engine, not the warehouse.

That distinction matters more than it sounds. LLMs are evolving fast. A model that is the best choice today may not be the best choice in six months. If your entire working context is entangled with one provider’s chat history, migration is painful. If your context lives in a portable, file-based system, switching the underlying model is a configuration change — not a rebuild.

I run PAI on Claude today because it is the best fit for how I work right now. But the memory schema, the skill library, the algorithm — all of it would transfer to a different model without losing a session’s worth of context. That portability is a deliberate design choice, and it is one of the most underappreciated properties of building on open infrastructure rather than inside a walled chat product.

Credit Where It’s Due

PAI did not emerge from a vacuum. A significant part of the thinking behind it (the idea that AI should be augmenting structured, intentional human systems rather than replacing ad-hoc conversations) traces directly to the work of Daniel Miessler.

Daniel has been articulating the case for AI infrastructure thinking longer than most. His Fabric project, his writing on augmented intelligence, and his broader framing of what it means to build systems that extend human capability rather than just answer questions — all of it shaped how PAI was conceived and how it continues to evolve.

The shift from “better prompts” to “better systems” is not a new idea. It just needed enough tooling to become practical. Daniel saw that early.

Where to Start

PAI is open-source. Claude Code is free to start — it is Anthropic’s official CLI, available to any Claude user. The distance between using AI in a chat window and running it inside a harness is smaller than it looks, and the compounding return starts from the first session where PAI remembers something you did not have to re-enter.

If you are still re-explaining yourself every time you open a new tab, that is the problem worth solving.

Claude Code Has Skills. PAI Has a Skill System. Here's the Difference.

Sun, 15 Mar 2026 00:00:00 +0000

There’s a word that shows up in both Claude Code’s documentation and in PAI’s architecture: skills. And because they share the same word — and even the same file conventions — it’s easy to assume they’re roughly equivalent. One is just a slightly fancier version of the other.

They’re not. The relationship is closer to HTTP and a web framework. Claude Code’s skill mechanism is the protocol. PAI is the framework built on top of it.

Understanding that distinction changed how I think about what I’ve actually built on my machine.

Start Here: What Claude Code’s Skill Mechanism Actually Is

Before explaining what PAI adds, it’s worth being precise about what Claude Code provides natively — because it’s both more minimal and more elegant than most people realize.

Claude Code’s skill system works like this:

1. At startup, Claude reads every SKILL.md file it finds under ~/.claude/skills/.
2. The description field in each skill’s YAML frontmatter determines when that skill activates; it’s pure intent matching. Anthropic caps this at 1024 characters.
3. When a skill matches your request, the Skill tool injects the full SKILL.md content into Claude’s context window.
4. Claude follows the instructions in that file.

That’s the entire mechanism. It’s a context injection system with a routing layer.

The USE WHEN clause in a skill description is the key piece. Here’s a simplified example from my OracleHCM skill:

---
name: OracleHCM
description: Expert Oracle HCM Cloud troubleshooting. USE WHEN user mentions
 Oracle HCM, HCM Cloud, HDL, HCM Data Loader, Journey, Checklist, workflow
 approvals, autocomplete rules, fast formulas, security profiles...
---

When I describe an Oracle HCM problem in natural language, Claude Code matches my intent against that description and loads the skill. I never have to say “use the Oracle HCM skill.” The intent matching handles it.
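A rough Python sketch of the routing idea. In Claude Code the matching is done by the model reading the descriptions, not by keyword search, so the substring matching below is only a stand-in to show the shape of description-driven routing. The helper names are hypothetical.

```python
from typing import Dict, List, Optional

MARKER = "USE WHEN user mentions"


def triggers(description: str) -> List[str]:
    """Extract the comma-separated phrases that follow the USE WHEN marker."""
    if MARKER not in description:
        return []
    tail = description.split(MARKER, 1)[1].replace("\n", " ")
    return [t.strip().lower() for t in tail.split(",") if t.strip()]


def match_skill(request: str, skills: Dict[str, str]) -> Optional[str]:
    """Return the first skill whose trigger phrase appears in the request."""
    text = request.lower()
    for name, description in skills.items():
        if any(phrase in text for phrase in triggers(description)):
            return name
    return None


skills = {
    "OracleHCM": (
        "Expert Oracle HCM Cloud troubleshooting. USE WHEN user mentions "
        "Oracle HCM, HCM Cloud, HDL, HCM Data Loader, fast formulas"
    ),
}
chosen = match_skill("My fast formulas stopped compiling after the patch", skills)
```

The request never names the skill. The description alone decides what gets loaded.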

Elegant. Minimal. And — on its own — surprisingly limited.

The Gap Between “Context Injection” and “Operational Capability”

Imagine a skill that’s just a long markdown file. When it loads, Claude reads the instructions and tries to follow them. If the instructions are clear and the task is simple, that works fine. But for anything complex — something that involves multiple steps, personalized behavior, CLI tooling, external APIs, or parallel agents — a single markdown file loaded into context starts to break down.

The instructions get long. They can’t be personalized without making the skill personal (and therefore un-shareable). There’s no way to dispatch to a sub-procedure. There’s no tooling layer. There’s no way to say “if the user wants to create a blog post, follow this procedure; if they want to publish, follow that one.”

This is the gap PAI fills.

What PAI Builds on Top: Nine Layers

PAI’s SKILLSYSTEM.md defines a canonical structure that every skill must follow. It’s not a suggestion — it’s enforced by convention and by the CreateSkill skill that scaffolds new ones. Here’s what each layer adds.

Layer 1 — Canonical Structure

Claude Code just needs a SKILL.md. PAI requires a specific directory layout:

SkillName/
├── SKILL.md ← minimal routing only (40-50 lines)
├── Workflows/ ← execution procedures, one per task
│ ├── Create.md
│ └── Update.md
├── Tools/ ← TypeScript CLI tools (always present)
│ └── Generate.ts
└── ApiReference.md ← context files loaded on demand

SKILL.md stays minimal. Complexity lives in workflows and context files that load when actually needed.

Layer 2 — Workflow Routing

This is the most immediately useful layer. Claude Code routes to a skill. PAI routes within a skill.

The routing table in every SKILL.md dispatches sub-tasks to specific workflow files:

| Workflow | Trigger | File |
|-------------|----------------------------|---------------------------|
| **Create** | "write a post" | `Workflows/Create.md` |
| **Publish** | "publish", "deploy" | `Workflows/Publish.md` |
| **Header** | "create header image" | `Workflows/Header.md` |

“Write a post” and “publish the site” both activate the same skill, but they route to completely different procedures. Without this, a skill that handles multiple operations becomes one giant file Claude has to navigate by itself.
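The routing table above behaves like a small dispatch map. In PAI the model reads the table from SKILL.md and picks the workflow; the Python below is a hedged stand-in that uses plain substring matching, with a hypothetical `route` function, to show the same lookup.

```python
from typing import Dict, List, Optional

# Workflow file -> trigger phrases, mirroring the routing table.
ROUTES: Dict[str, List[str]] = {
    "Workflows/Create.md": ["write a post"],
    "Workflows/Publish.md": ["publish", "deploy"],
    "Workflows/Header.md": ["create header image"],
}


def route(request: str, routes: Dict[str, List[str]] = ROUTES) -> Optional[str]:
    """Pick the workflow file whose trigger phrase appears in the request."""
    text = request.lower()
    for workflow_file, phrases in routes.items():
        if any(phrase in text for phrase in phrases):
            return workflow_file
    return None


create = route("Please write a post about hooks")
publish = route("Publish the site")
```

Both requests activate the same skill, but `create` and `publish` resolve to different procedure files.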

Layer 3 — The Personalization Layer

Every system skill in PAI checks for user overrides before executing:

~/.claude/skills/PAI/USER/SKILLCUSTOMIZATIONS/{SkillName}/
├── EXTEND.yaml ← merge strategy (append | override | deep_merge)
└── PREFERENCES.md ← user-specific behavior

The system skill stays generic and shareable. My preferences — color palettes for the Art skill, voice configurations for the Agents skill, output format defaults for Research — live separately and merge in at runtime. The skill author never needs to know about my preferences. I never need to fork the skill to add my own.
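The three merge strategies named in EXTEND.yaml can be sketched in Python. The exact semantics PAI applies are not spelled out in this post, so the behavior below is my interpretation: `override` replaces top-level keys, `append` concatenates lists, and `deep_merge` recurses into nested mappings. The function names are hypothetical.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge user values into base without mutating either."""
    merged = dict(base)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_customization(
    base: Dict[str, Any], user: Dict[str, Any], strategy: str
) -> Dict[str, Any]:
    """Combine generic skill defaults with user preferences."""
    if strategy == "override":
        return {**base, **user}
    if strategy == "deep_merge":
        return deep_merge(base, user)
    if strategy == "append":
        merged = dict(base)
        for key, value in user.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                merged[key] = merged[key] + value  # extend lists, keep defaults
            else:
                merged[key] = value
        return merged
    raise ValueError(f"unknown merge strategy: {strategy}")


defaults = {"palette": {"primary": "blue", "accent": "gray"}, "formats": ["md"]}
mine = {"palette": {"accent": "orange"}, "formats": ["pdf"]}
merged = apply_customization(defaults, mine, "deep_merge")
```

With `deep_merge`, my accent color replaces the default while the primary color from the shared skill survives, and the generic defaults are never edited in place.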

Layer 4 — System vs Personal Skill Separation

PAI enforces a hard naming convention that determines portability:

TitleCase (Research, Browser, OracleHCM) = system skills, no personal data, shareable via PAI Packs
_ALLCAPS (_BLOGGING, _MAQINA) = personal skills, private by convention, never exported

Claude Code has no concept of skill visibility. PAI makes it structural.
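Because the convention is purely lexical, a script can enforce it. The Python sketch below uses regular expressions that are my reading of the two naming rules; the function names and the exact patterns are assumptions, not PAI's actual export logic.

```python
import re
from typing import List

# Assumed patterns: TitleCase starts with a capital and has no underscore;
# personal skills start with an underscore followed by capitals or digits.
SYSTEM = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PERSONAL = re.compile(r"^_[A-Z0-9]+$")


def skill_kind(name: str) -> str:
    """Classify a skill directory name as system, personal, or unknown."""
    if PERSONAL.match(name):
        return "personal"
    if SYSTEM.match(name):
        return "system"
    return "unknown"


def exportable(names: List[str]) -> List[str]:
    """Only system skills are candidates for sharing."""
    return [name for name in names if skill_kind(name) == "system"]


shareable = exportable(["Research", "_BLOGGING", "Browser", "_MAQINA", "OracleHCM"])
```

Anything that does not match either pattern is reported as unknown and left out of the export.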

Layer 5 — CLI Tooling Convention

Every PAI skill has a Tools/ directory. When a workflow needs to do something repeatable — generate an image, manage a server, sync a repository — it calls a TypeScript CLI tool instead of embedding logic in the workflow markdown itself.

Tools use #!/usr/bin/env bun, expose configuration via flags, and have .help.md documentation files. This keeps workflows simple (intent routing) and tools encapsulated (execution). You can test a tool independently of its workflow.

Layer 6 — AI-Powered Hooks

PAI runs 17 event hooks that fire at specific moments: session start, prompt submission, pre-tool, post-tool, and others. The most important one for response quality is FormatReminder — it runs AI inference on your raw prompt before Claude even starts responding, classifies the depth required (FULL / ITERATION / MINIMAL), and injects that classification as authoritative context.

This is hooks doing real work, not just shell scripts appending text to prompts.

Layer 7 — The Algorithm

Every response PAI generates runs through a 7-phase problem-solving framework: OBSERVE → THINK → PLAN → BUILD → EXECUTE → VERIFY → LEARN.

This isn’t decorative structure. The OBSERVE phase reverse-engineers your actual intent. The THINK phase selects capabilities and validates skill choices against the problem. The VERIFY phase uses TaskCreate/TaskUpdate to track measurable success criteria. The LEARN phase captures what to improve next time.

Skills feed into this framework — they’re not parallel to it. When a skill activates, it executes inside the Algorithm, with its results held accountable to the ISC criteria created in OBSERVE.
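The key property here is that success criteria are written down in OBSERVE and checked in VERIFY. The Python below is a deliberately small sketch of that idea; the `run_algorithm` function, the state dictionary, and the example handlers are all hypothetical and much simpler than the real framework.

```python
from typing import Callable, Dict

PHASES = ["OBSERVE", "THINK", "PLAN", "BUILD", "EXECUTE", "VERIFY", "LEARN"]


def run_algorithm(request: str, handlers: Dict[str, Callable[[dict], None]]) -> dict:
    """Run every phase in a fixed order; phases without a handler are no-ops."""
    state: dict = {"request": request, "criteria": [], "results": {}, "log": []}
    for phase in PHASES:
        handler = handlers.get(phase)
        if handler is not None:
            handler(state)
        state["log"].append(phase)
    return state


def observe(state: dict) -> None:
    # Criteria are defined up front, before any work happens.
    state["criteria"] = [("output is non-empty", lambda s: bool(s.get("output")))]


def execute(state: dict) -> None:
    state["output"] = state["request"].upper()  # placeholder for real work


def verify(state: dict) -> None:
    # Each criterion from OBSERVE is checked against what was produced.
    state["results"] = {name: check(state) for name, check in state["criteria"]}


final = run_algorithm(
    "publish the post",
    {"OBSERVE": observe, "EXECUTE": execute, "VERIFY": verify},
)
```

The order of phases is fixed by the framework, so a skill plugs work into a phase without deciding when verification happens.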

Layer 8 — Agent Composition Patterns

PAI skills can spawn specialized subagents and compose them using named patterns:

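| Pattern | Shape | When |
|----------|-------------------------|---------------------------------|
| Pipeline | A → B → C | Sequential domain handoff |
| TDD Loop | Engineer ↔ QATester | Build-verify cycle |
| Fan-out | → [A, B, C] | Multiple perspectives needed |
| Gate | A → check → B or retry | Quality gate before progression |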

A skill that just loads into context can’t orchestrate parallel agents. A PAI skill that routes to a workflow that invokes a Fan-out pattern can research, build, and verify in parallel — with a spotcheck agent at the end synthesizing results.
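A Fan-out with a final synthesis step can be sketched with Python's standard thread pool. The agents here are plain functions standing in for subagents, and `fan_out` and `synthesize` are hypothetical names; how PAI actually spawns agents is not shown in this post.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fan_out(
    task: str,
    agents: Dict[str, Callable[[str], str]],
    synthesize: Callable[[Dict[str, str]], str],
) -> str:
    """Run every agent on the same task in parallel, then combine findings."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in agents.items()}
        # result() blocks until that agent finishes and re-raises its errors.
        findings = {name: future.result() for name, future in futures.items()}
    return synthesize(findings)


agents = {
    "researcher": lambda task: f"notes on {task}",
    "engineer": lambda task: f"build for {task}",
}
report = fan_out(
    "header image",
    agents,
    lambda findings: " | ".join(f"{k}: {v}" for k, v in sorted(findings.items())),
)
```

The synthesis function plays the role of the spotcheck agent: it sees every branch's output at once and produces a single result.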

Layer 9 — Dynamic Loading

Large skills use deferred loading. Only the SKILL.md loads on invocation. Reference documents, API guides, and style specs load only when the specific workflow that needs them runs. This actively manages token budget rather than blowing it on context that might not be needed.
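Deferred loading is essentially lazy evaluation with a cache. A minimal Python sketch, assuming each context file is represented by a loader function; the `DeferredContext` class and the `StyleGuide.md` name are hypothetical, and a real version would read files from the skill directory.

```python
from typing import Callable, Dict, List


class DeferredContext:
    """Load reference documents only when a workflow asks for them."""

    def __init__(self, loaders: Dict[str, Callable[[], str]]) -> None:
        self._loaders = loaders
        self._cache: Dict[str, str] = {}

    def get(self, name: str) -> str:
        # The loader runs on first access only; later calls reuse the cache.
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def loaded(self) -> List[str]:
        """Names of the documents that have actually been read so far."""
        return sorted(self._cache)


context = DeferredContext({
    "ApiReference.md": lambda: "...API notes...",
    "StyleGuide.md": lambda: "...style rules...",
})
context.get("ApiReference.md")  # only this document is read
```

A workflow that never touches the style guide never pays for it in context.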

The Feature Gap in One Table

| Feature | Claude Code (Native) | PAI System |
|--------------------|-------------------------------|-----------------------------------|
| Skill discovery | YAML `description` at startup | Same + `USE WHEN` intent parsing |
| Sub-routing | None | Workflow routing table |
| Personalization | None | `SKILLCUSTOMIZATIONS` layer |
| Skill visibility | All equal | System vs Personal convention |
| Tooling | None | TypeScript CLI tools |
| Hooks | Basic | 17 AI-powered hooks |
| Response structure | Free-form | Algorithm (7 phases, ISC, verify) |
| Agents | None | 15+ specialized subagent types |
| Memory | None | File-based cross-session memory |
| Dynamic loading | Full file loaded | Context files on demand |
| Portability | No convention | PAI Packs |

Why This Matters Practically

The single most useful shift in mental model: Claude Code skills are context. PAI skills are operational units.

When I ask my system to publish a blog post, the publishing skill doesn’t just remind Claude how publishing works. It dispatches to the Publish workflow, which runs image conversion, calls hugo, commits, pushes to GitHub, and triggers the Actions pipeline that deploys to Namecheap FTP — all as a structured procedure with steps that can fail, be verified, and be corrected independently.

That’s not context injection. That’s execution.

The 34 skills on my system aren’t 34 long markdown files. They’re 34 capabilities, each with its own routing logic, personalization layer, tooling, and agent integration. Claude Code’s mechanism made them possible. PAI’s framework made them reliable.

Where to Go from Here

If you’re new to PAI and want to understand the broader architecture this sits inside (the memory system, the agent tiers, how RAG ties everything together), the prior post, “RAG, Agents, and Skills: The Three Pillars Inside My Personal AI,” covers the full picture.

If you want to go deeper on the skill system itself, the canonical reference is ~/.claude/skills/PAI/SYSTEM/SKILLSYSTEM.md — it’s the document all skills are built against, and it explains every convention described here in precise detail.

PAI is open source at github.com/danielmiessler/PAI.

Connecting PAI to NotebookLM via MCP: Your Research Becomes a Live Knowledge Layer

Sun, 15 Mar 2026 00:00:00 +0000

I’ve been using Google’s NotebookLM for a while to manage research. Drop in a PDF, a few URLs, some YouTube transcripts, and suddenly I have a knowledge base I can interrogate with natural language. It answers questions grounded in what I gave it, with citations to the exact source, which makes hallucinations far less likely and far easier to catch.

The problem is it’s a separate tool. NotebookLM over here. PAI over there. My research couldn’t feed into my workflows, and my workflows didn’t know my research existed.

The Model Context Protocol changed that.

What MCP Actually Does (the short version)

The Model Context Protocol is a standard that lets AI systems connect to external tools and data sources through a defined interface — think of it as an API contract that any MCP-compatible client (like Claude Code) can speak without needing custom integration code for every new service.

When you wire an MCP server into Claude Code’s configuration, that server’s capabilities become available as tools inside every conversation. It’s not a plugin or a browser extension. It’s a live connection — authenticated, persistent, callable inside the same session where the Algorithm is running.
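Under the hood, MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying a tool name and an arguments object. The sketch below only builds such a message. The tool name `notebook_query` and its argument keys are assumptions for illustration; a real server advertises its own tools through `tools/list`.

```python
import json
from itertools import count

# Request ids should not be reused within a session.
_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` message of the kind an MCP client sends."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument keys, not the NotebookLM server's real ones.
request = tool_call_request(
    "notebook_query",
    {"notebook": "AI Governance", "query": "data lineage requirements"},
)
```

Claude Code handles this exchange itself; the sketch is only meant to show why no custom integration code is needed per service, since every server speaks the same envelope.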

For NotebookLM, this means the boundary between “my research” and “my AI workflow” effectively disappears.

The Setup

The integration runs through a local MCP server binary at /Users/dsa/.local/bin/notebooklm-mcp. Authentication works through a Chrome browser profile — the server captures your active NotebookLM session (cookies, CSRF token, session ID) and caches it so every subsequent request is already authenticated. One notebooklm-mcp-auth command handles the initial handshake; after that, sessions persist across restarts.

In Claude Code’s configuration, it’s registered as a named MCP server:

"notebooklm": {
 "command": "/Users/dsa/.local/bin/notebooklm-mcp"
}

That’s the entire wiring. Claude Code sees the server at startup, the PAI NotebookLM skill knows how to invoke it, and the connection is live in every session from that point forward.
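For anyone scripting this step, here is a small sketch of adding the entry to an existing configuration without disturbing other servers. It assumes the entry sits under a top-level `mcpServers` key, which is worth checking against your own Claude Code configuration file; the `register_mcp_server` helper and the `other` server are hypothetical.

```python
import json


def register_mcp_server(config: dict, name: str, command: str) -> dict:
    """Return a copy of `config` with one MCP server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # assumed top-level key
    servers[name] = {"command": command}
    updated["mcpServers"] = servers
    return updated


# A pre-existing configuration with an unrelated, made-up server.
existing = {"mcpServers": {"other": {"command": "/usr/local/bin/other-mcp"}}}

merged = register_mcp_server(
    existing, "notebooklm", "/Users/dsa/.local/bin/notebooklm-mcp"
)
config_text = json.dumps(merged, indent=2)  # ready to write back to disk
```

The function copies rather than mutates, so the original configuration stays available if the write needs to be rolled back.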

What the NotebookLM Skill Can Do

With the MCP bridge active, the NotebookLM skill exposes six workflows:

Workflow	What It Does
QueryNotebook	Ask a natural language question; get a citation-backed answer from your notebook sources
ListNotebooks	Show all notebooks with IDs and titles
CreateNotebook	Create a new notebook for a topic or project
AddSource	Add URLs, PDFs, YouTube videos, Google Drive files, or pasted text to a notebook
GenerateAudio	Create a podcast-style audio overview of a notebook’s contents
SyncSources	Refresh stale sources (Drive files, dynamic URLs)

The routing is intent-based, same as every other PAI skill. I don’t address the skill directly — I just describe what I need:

“What does my AI Governance notebook say about data lineage requirements?”

That hits the QueryNotebook workflow, fires the MCP query, and returns an answer with citations to the exact source sections that grounded it.

The Real Benefit: Grounded Answers Inside the Algorithm

Here’s what changes when NotebookLM is callable from inside PAI’s Algorithm.

In the standard PAI research flow, the THINK phase selects capabilities — often Research agents that go out to the web, synthesize content, and return findings. Those findings are model-generated. They’re high quality, but they’re inferences from training data and web retrieval. They can be wrong. They can drift from your actual source material.

NotebookLM answers don’t work that way. Every response is grounded in documents you explicitly added to that notebook. The model is constrained to those sources, so a claim they don’t support shows up as a missing or mismatched citation instead of passing unnoticed. When it tells you that a compliance framework requires a specific control, it points you to the exact paragraph in the exact document where that requirement lives.

When that kind of answer is callable from anywhere inside the Algorithm (as an input to ISC criteria, as evidence in the VERIFY phase, as a reference check in the EXECUTE phase), the entire workflow becomes more reliable. You’re not asking PAI to remember what a standard says. You’re asking it to look it up in the document you provided.

Scenarios Where This Changes Things

AI Governance Certification Study

I’m working through an AI Security & Governance certification — 8 modules, each with detailed technical and regulatory content. The study notes from each module live in my NotebookLM certification notebook.

When I’m reviewing or need to quiz myself, I don’t have to context-switch to the NotebookLM web UI. From inside PAI, I can ask:

“Query my AI Governance notebook: what are the key principles covered in module 3 around model risk management?”

The answer comes back cited to specific sections of the source material. I can follow up immediately within the same workflow. I can ask PAI to generate flashcard prompts based on the cited content. The research stays in NotebookLM where it lives. The workflow stays in PAI where it runs. The MCP bridge connects them without forcing me to copy-paste between tools.

Security Research Accumulation

Every time I add a research paper, a security advisory, or a threat report to a NotebookLM notebook, it becomes a queryable asset in PAI’s research layer. During an OSINT or reconnaissance workflow, instead of relying solely on real-time web retrieval, I can query my curated security research base for context that I’ve already vetted and accumulated.

“Does my security research notebook have anything on SSRF exploitation chains through cloud metadata endpoints?”

That’s my own research library answering me, not a model guessing.

Blog Content Drafting

For this blog — Augmented Resilience — I’m building a notebook that captures posts, ideas, and reader questions. Before drafting a new post, I can query:

“Does my Augmented Resilience notebook have any prior content on MCP integration?”

No more accidentally retreading ground I’ve already covered. No more losing track of connected ideas across posts. The notebook becomes an editorial memory that the Algorithm can access during the build phase.

The Audio Overview Feature Is Worth Its Own Mention

One capability that doesn’t have an obvious parallel in most AI tools: NotebookLM can generate a podcast-style audio overview of an entire notebook. Two AI voices discuss the material in a conversational format — synthesizing themes, surfacing key points, connecting ideas across sources.

Through the GenerateAudio workflow, I can trigger this from PAI:

“Generate an audio overview of my AI Governance notebook”

The result is a produced audio file I can listen to during a commute or while doing something else. It’s NotebookLM’s synthesis capability — which is genuinely impressive at extracting narrative threads from dense technical material — accessible through the same interface I use for everything else.

Knowledge That Compounds

The deeper benefit of this integration isn’t any single query — it’s the compounding effect of building curated notebooks over time and having them available in every PAI session.

Every source I add to NotebookLM becomes part of a retrieval layer that gets richer with every addition. The AI Governance notebook grows as I work through modules. The security research notebook grows as I read papers. The Oracle HCM notebook grows as I document fixes and configurations.

PAI already has a memory system for capturing what I do — completed work, learned patterns, quality signals. NotebookLM handles the complementary layer: the source material that grounds what I know. Together, they’re not two tools running side by side. They’re two layers of the same system — one remembering what I’ve done, the other grounding what I know.

MCP is just the wire between them.

RAG, Agents, and Skills: The Three Pillars Inside My Personal AI

Tue, 24 Feb 2026 00:00:00 +0000

This site — Augmented Resilience — didn’t get built the way most blogs do. There was no staring at blank Hugo config files, no manually hunting down Namecheap SSH docs, no scrambling to remember whether the deploy script needed the public/ folder cleaned before each build.

Instead, I described what I wanted. The AI knew my hosting setup (Namecheap shared hosting), my stack (Hugo with the re-terminal theme), my repo (GitHub, SSH-keyed), and my editor (Obsidian). When a build error surfaced — a theme name mismatch between hugo.toml and the actual directory — it was diagnosed and fixed before I had time to Google it. When the deploy script needed writing, it was scaffolded against my specific environment. When I accidentally left sensitive data in an early draft, it caught it before the commit.

None of that context lived in the prompt. It lived in the infrastructure.

The system behind it is called PAI (Personal AI Infrastructure) — an open-source framework I run locally on top of Claude Code. And the reason it could handle an entire site build end-to-end without constant hand-holding comes down to three architectural pillars: RAG, Agents, and Skills.

What Is PAI?

PAI is an open-source personal AI infrastructure system that runs on top of Claude Code. It’s not a SaaS product — it’s a framework you install on your own machine. The system is built around a central idea: AI systems need structure to be reliable. Like scaffolding supports construction, PAI provides the architectural patterns that make AI assistance consistent, contextual, and capable of compounding over time.

There are 34 skills installed on my system, 17 event hooks, 141 workflows, and a memory system that learns from every interaction. But none of that would matter without three core mechanisms working in concert.

The PAI statusline — live system stats showing version (v2.4), algorithm (ALG:v0.2.25), skill count (SK: 34), workflows (WF: 141), hooks (17), context usage (48%), memory signals (144 ratings), and a rolling quality score trend.

Pillar 1: RAG — Your Personal Knowledge Base In Every Response

RAG (Retrieval-Augmented Generation) is the pattern of retrieving relevant documents from a knowledge store and augmenting the AI’s prompt with that context before generating a response. In enterprise AI, this is how you get a chatbot that can answer questions about your internal policies without hallucinating.

In PAI, RAG is the engine that makes the AI feel like it knows you.

How It Works in PAI

When a session starts, PAI’s hook system loads a foundational context layer: my identity, my name, the current date, and the core behavioral rules (the Algorithm). This is the retrieval index — a lightweight map of everything the system knows how to find.

When I make a request, the system retrieves additional context on demand:

Skills frontmatter — Each of the 34 skills has a description field with a USE WHEN clause. These descriptions load at startup as a routing index. When my request matches a skill’s intent, the full skill content loads. This is retrieval — pulling in the right expertise document for the task.
USER/ context files — There’s a structured personal knowledge base living at ~/.claude/skills/PAI/USER/. It contains my resume, my TELOS life goals, my contacts, my projects, my tech stack preferences. When I ask a question where my professional background is relevant, that context gets retrieved and injected.
MEMORY/ directory — Every session, every correction, every insight gets captured in a structured memory system organized into WORK/, LEARNING/, SIGNALS/, and RESEARCH/ directories. Past work items, completed tasks, and quality signals from previous interactions can all be retrieved to inform the current one.
Hook-injected context — Event hooks fire at specific moments (session start, before each prompt, after tool use) and inject dynamic context — things like the current depth classification, relevant behavioral rules, or system state.
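The retrieval step described in the list above can be sketched with plain keyword overlap. This is a deliberately simple stand-in, not how PAI ranks context; the file names under `USER/` and `MEMORY/` and the document contents are invented for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(request: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the request; return the best names."""
    words = tokenize(request)
    scored = [(len(words & tokenize(body)), name) for name, body in documents.items()]
    relevant = [item for item in scored if item[0] > 0]
    ranked = sorted(relevant, key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked[:top_k]]


def build_prompt(request: str, documents: dict[str, str], always: str) -> str:
    """Foundational context first, retrieved context next, the request last."""
    retrieved = [documents[name] for name in retrieve(request, documents)]
    return "\n\n".join([always, *retrieved, request])


# Hypothetical knowledge base entries.
documents = {
    "USER/TechStack.md": "Hugo static site, GitHub repo, Namecheap hosting, Obsidian editor",
    "USER/Resume.md": "Oracle HCM consultant, security certifications",
    "MEMORY/WORK/deploy.md": "Deploy workflow pushes the Hugo build to Namecheap",
}

request = "Hugo build fails with a theme error on Namecheap"
selected = retrieve(request, documents)
prompt = build_prompt(request, documents, always="Core behavioral rules")
```

The resume never enters the prompt for this request, which is the whole point: context is pulled in because the request calls for it, not because it exists.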

Practical Scenario: Building Augmented Resilience

When I was setting up this site and ran into the Hugo theme mismatch error, here’s what PAI retrieved without me explaining any of it:

My tech stack preferences from the USER context — Hugo, GitHub, Namecheap, Obsidian as editor
The WebSavant skill loaded automatically (matched “Hugo site”, “deployment” intent), bringing with it Hugo-specific knowledge about theme configuration, hugo.toml structure, and build pipelines
My project context from MEMORY — the repo name, the hosting environment, decisions made in prior sessions about the deploy workflow

By the time I described the error, PAI already knew the environment it was debugging. It didn’t need me to explain what kind of hosting I had, which theme I was using, or what my folder structure looked like. The retrieval layer had already assembled that context before a single word of the solution was written.

That’s the difference RAG makes — not smarter AI, but contextually equipped AI.

Pillar 2: Skills — Domain Expertise That Activates Itself

If RAG is how PAI knows context, Skills are how PAI does work. A skill is a self-contained expertise module that activates automatically based on intent, routes to the right workflow, and executes a structured procedure.

Think of each skill as a senior specialist on call — and you never have to explicitly page them.

The Anatomy of a Skill

Every skill follows the same structure:

SkillName/
├── SKILL.md ← Routing layer (loads on invocation)
├── Workflows/ ← Step-by-step execution procedures
│   ├── Create.md
│   └── Update.md
└── Tools/ ← CLI automation scripts (TypeScript)
    └── Generate.ts

The SKILL.md file has two parts:

YAML frontmatter with a USE WHEN clause — this is how Claude Code knows when to activate the skill
Workflow routing table — once activated, this routes the request to the correct workflow file

The magic is in USE WHEN. Here’s a simplified example from the OracleHCM skill:

description: Expert Oracle HCM Cloud troubleshooting and guidance. USE WHEN user
 mentions Oracle HCM, HCM Cloud, HDL, HCM Data Loader, Journey, Checklist,
 workflow approvals, autocomplete rules, fast formulas, security profiles...

I never have to say “use the Oracle HCM skill.” I just describe my problem in natural language. The intent matching system routes it.
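To make the mechanism concrete, here is a toy version of trigger matching. In the real system the model reads the descriptions and decides; this sketch swaps in literal phrase matching so it can run on its own. The OracleHCM description is shortened from the example above, and the WebSavant description is invented.

```python
import re
from typing import Optional


def parse_triggers(description: str) -> list[str]:
    """Extract the comma-separated trigger phrases that follow 'USE WHEN'."""
    _, _, clause = description.partition("USE WHEN")
    clause = clause.replace("user mentions", "")
    return [t.strip().lower() for t in clause.split(",") if t.strip()]


def match_skill(request: str, skills: dict[str, str]) -> Optional[str]:
    """Return the first skill whose trigger phrase appears in the request."""
    text = request.lower()
    for name, description in skills.items():
        for trigger in parse_triggers(description):
            # Word boundaries keep short triggers from matching inside other words.
            if re.search(rf"\b{re.escape(trigger)}\b", text):
                return name
    return None


skills = {
    # Shortened from the frontmatter example in the text.
    "OracleHCM": (
        "Expert Oracle HCM Cloud troubleshooting and guidance. USE WHEN user "
        "mentions Oracle HCM, HCM Cloud, HDL, HCM Data Loader, fast formulas"
    ),
    # Hypothetical description, written for this sketch.
    "WebSavant": "Hugo site help. USE WHEN user mentions Hugo, SEO, deployment",
}

first = match_skill("My HDL import keeps failing on the worker file", skills)
second = match_skill("Set up SEO for my blog", skills)
third = match_skill("What is the weather today", skills)
```

Literal matching misses paraphrases that a model would catch, so treat this as a picture of the routing index rather than a replacement for it.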

Practical Scenario: Configuring SEO and GEO for augmentedresilience.com

Before the site went live, I needed proper SEO and GEO — Open Graph tags for social sharing, meta descriptions for search, canonical URLs, a sitemap, and schema.org structured data so AI-powered search engines like Perplexity, ChatGPT, and Claude could understand and cite the content accurately. None of that comes configured out of the box with the re-terminal theme. I said:

“Set up SEO and apply Generative Engine Optimization to augmentedresilience.com.”

The system:

Activated the WebSavant skill (matched “SEO” + “GEO” + “site” intent — no skill named, no flags set)
Routed to the SEO and AddSchema workflows inside that skill
Created the correct Hugo partial override at layouts/partials/extended_head.html — the exact injection point the re-terminal theme exposes without touching any theme files
Added the full Open Graph tag set (og:title, og:description, og:image, og:url, og:type) wired to Hugo’s page variables
Injected schema.org JSON-LD for every page type: WebSite and Person on the homepage, Article and BreadcrumbList on every post, and AboutPage on /about — giving AI crawlers a machine-readable knowledge graph of the site
Created robots.txt explicitly permitting GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers — with the sitemap URL wired in
Configured hugo.toml for canonical URL generation and enabled the built-in sitemap output

Without the skill, this is a day of Hugo documentation, schema.org spec-reading, and trial-and-error. With it, the full SEO and GEO stack was complete in a single pass — because the skill had already encoded where everything goes in Hugo’s directory structure, which schema types matter for which page contexts, and how to wire Hugo’s template variables into valid JSON-LD that AI search engines can actually parse.
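For readers who have not seen JSON-LD, this sketch shows the shape of the Article markup that ends up in a page head. On the site itself this is produced by a Hugo partial, not Python; the function name, the example URL path, and the author name here are placeholders.

```python
import json


def article_jsonld(title: str, url: str, author: str, published: str) -> str:
    """Render schema.org Article metadata as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": title,
        "url": url,
        "datePublished": published,  # ISO 8601 date string
        "author": {"@type": "Person", "name": author},
    }
    # Escape "</" so a title can never close the script element early.
    payload = json.dumps(data).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


# Placeholder values for illustration only.
snippet = article_jsonld(
    title="RAG, Agents, and Skills",
    url="https://augmentedresilience.com/posts/example/",
    author="Example Author",
    published="2026-02-24",
)
```

The same dictionary-then-serialize approach extends to the other types mentioned above, such as WebSite or BreadcrumbList, by changing `@type` and its properties.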

The 34 Skills I Have Installed

My current skill roster includes tools for Oracle HCM support, security recon, OSINT research, browser automation, art generation, document processing, code generation, red teaming, and more. Each one is a packaged capability that activates without friction.

The system is also designed to be extended — building a new skill means writing a SKILL.md with a USE WHEN clause, a workflow routing table, and the workflow files. The CreateSkill skill handles the scaffolding automatically.

Pillar 3: Agents — Parallel Specialized Brains

Skills handle individual domain expertise. Agents handle scale and specialization when a task is too complex for a single pass or requires multiple perspectives simultaneously.

PAI has a three-tier agent system:

Tier 1: Task Tool Subagents (Internal Workhorses)

These are pre-built specialist agents that skills and workflows invoke internally: Engineer, Architect, Explore, QATester, Pentester, ClaudeResearcher, GeminiResearcher, GrokResearcher, and others.

When I ask PAI to research something deeply, it doesn’t just run one search. It can fan out to multiple research agents simultaneously — Claude, Gemini, and Grok each investigating from different angles — then synthesize the results with a “spotcheck” agent that verifies consistency.

This is parallel processing that would take me hours of manual work, running in minutes.

Tier 2: Named Agents (Persistent Specialists)

Named agents are recurring characters with rich backstories, persistent identities, and unique voices via ElevenLabs text-to-speech. They build relationship continuity across sessions.

My installed named agents include:

Serena Blackwood — Architect. Long-term system design decisions.
Marcus Webb — Engineer. Strategic technical leadership.
Rook Blackburn — Pentester. Security testing with a distinct personality.
Ava Sterling — Researcher (Claude). Strategic deep-dive analysis.
Alex Rivera — Researcher (Gemini). Multi-perspective comprehensive analysis.

When Rook runs a security assessment, he doesn’t just return findings — he announces them in his own voice through my speakers. It sounds minor. It’s not. Distinct voices make it cognitively easier to understand who did what work and why you should trust it.

Tier 3: Custom Agents (On-Demand Compositions)

For tasks that don’t fit a named agent, PAI can compose agents dynamically from trait combinations:

Expertise traits: security, legal, finance, medical, research, technical, creative
Personality traits: skeptical, enthusiastic, analytical, contrarian, meticulous
Approach traits: thorough, rapid, systematic, adversarial, synthesizing

Each unique trait combination maps to a different ElevenLabs voice. A security + adversarial agent gets Callum’s edgy voice. An analytical + meticulous agent gets Charlotte’s precise cadence.

The trait system means I can spin up a custom agent for any edge case without writing a new agent from scratch.
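A minimal sketch of trait composition, using the trait lists above. The `CustomAgent` class and the prompt wording are assumptions made for illustration; voice selection is left out because the mapping from traits to voices is not spelled out here.

```python
from dataclasses import dataclass

# Trait vocabularies copied from the lists in the text.
TRAITS = {
    "expertise": {"security", "legal", "finance", "medical", "research",
                  "technical", "creative"},
    "personality": {"skeptical", "enthusiastic", "analytical", "contrarian",
                    "meticulous"},
    "approach": {"thorough", "rapid", "systematic", "adversarial",
                 "synthesizing"},
}


@dataclass(frozen=True)
class CustomAgent:
    """An on-demand agent defined entirely by one trait from each group."""
    expertise: str
    personality: str
    approach: str

    def __post_init__(self) -> None:
        # Reject anything outside the known vocabularies.
        for kind, allowed in TRAITS.items():
            value = getattr(self, kind)
            if value not in allowed:
                raise ValueError(f"unknown {kind} trait: {value}")

    def system_prompt(self) -> str:
        """Hypothetical prompt wording built from the three traits."""
        return (
            f"You are a {self.personality} {self.expertise} specialist. "
            f"Your approach is {self.approach}."
        )


agent = CustomAgent("security", "skeptical", "adversarial")
prompt = agent.system_prompt()
```

With seven expertise, five personality, and five approach traits, one trait from each group gives 175 possible agents without a single new agent definition.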

Practical Scenario: Pre-Launch Validation of Augmented Resilience

Before pushing the first real commit to augmentedresilience.com, I wasn’t going to just cross my fingers and run deploy.py. I asked PAI to validate the site was actually ready. What happened next wasn’t a single check — it was a parallel review board.

PAI spawned three agents simultaneously:

Rook Blackburn (Pentester) scanned the entire repo for credentials, API keys, and sensitive data I might have accidentally left in a config file or draft post — and announced his findings in his own voice through my speakers
A QA agent opened the Hugo local preview, walked every page, verified links weren’t broken, images loaded, and the deploy pipeline produced a clean public/ build
A Researcher agent audited the site’s meta tags, Open Graph data, and hugo.toml settings against SEO best practices for a new blog

A fourth spotcheck agent then reviewed all three outputs for conflicts — did Rook’s findings overlap with anything the QA agent flagged? Were there config issues that touched both SEO and security?

The result was a single consolidated pre-launch checklist. Two issues surfaced: a leftover draft post with personal notes that was marked draft: false, which means Hugo would have published it, and a missing og:image tag. Both were fixed before the first visitor ever landed.

The site you’re reading right now went live clean because three agents checked it in parallel before I touched the deploy button. That’s the difference between asking a question and deploying intelligence.

When All Three Work Together

The real power of PAI isn’t any single pillar — it’s the composition.

Here’s what happened when I needed to go from “Obsidian draft” to “live on augmentedresilience.com” without a manual process I’d eventually forget or skip.

RAG assembled the context before I described the problem. From MEMORY it already knew: Namecheap shared hosting doesn’t support native git pull — content has to be pushed via FTP through GitHub Actions. It knew the Obsidian vault was the content source, that images.py had to run before hugo to convert image links, and that Python was the right tool for the orchestration script. Not one of those constraints was in my prompt.

Skills handled the architecture. WebSavant recognized a Hugo deployment pipeline request and routed to a workflow already aware of the full delivery chain: sync posts from Obsidian → convert images → hugo build → git commit → git push → GitHub Actions → Namecheap FTP. It knew the sequence. It knew why each step had to happen in that order.

Agents built it. An Engineer agent wrote deploy.py — the script that runs the whole sequence in a single command. An Architect agent designed the GitHub Actions workflow that picks up after the push and handles the Namecheap delivery step automatically. Two agents, two distinct responsibilities, running the job that a solo developer would have spent an afternoon piecing together from Stack Overflow answers.

The result:

python3 deploy.py "Add new post"

That’s it. One command. Every post that has ever gone live on this site — including this one — passed through a pipeline that RAG, Skills, and Agents built together. It’s not running because I set it up manually. It’s running because three systems knew what the job required before I finished explaining it.
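The real deploy.py is not reproduced here, but the sequence it runs can be sketched as an ordered list of commands that stops at the first failure. Everything below is an assumption about its shape: the function names, the `git add -A` step, and the dry-run flag are illustrative, and the Obsidian sync step is omitted because its command is not given.

```python
import subprocess


def build_steps(message: str) -> list[list[str]]:
    """Commands in the order described above; each must succeed before the next."""
    return [
        ["python3", "images.py"],           # convert image links before building
        ["hugo"],                           # build the static site into public/
        ["git", "add", "-A"],               # assumed staging step
        ["git", "commit", "-m", message],
        ["git", "push"],                    # the push triggers GitHub Actions
    ]


def deploy(message: str, dry_run: bool = False) -> list[str]:
    """Run every step in order, stopping at the first failure."""
    ran = []
    for cmd in build_steps(message):
        ran.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the command fails,
            # so a broken build never reaches the commit or push.
            subprocess.run(cmd, check=True)
    return ran


# Dry run: list what would happen without touching the repo.
planned = deploy("Add new post", dry_run=True)
```

Ordering is the design choice that matters: because each step gates the next, a failed image conversion or Hugo build can never result in a pushed commit.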

The Compounding Effect

What makes this architecture meaningful over time isn’t any single interaction — it’s that the system gets better at helping you with every session.

The MEMORY system captures learnings. The SIGNALS directory tracks your implicit feedback. When something goes wrong, it logs the full context under LEARNING/FAILURES. When a workflow produces a response rated 9 or 10, that signal is captured too. The system adjusts.
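A file-based version of that loop might look like the sketch below. The directory names come from the text, but the file names, the JSON Lines format, and the failure threshold of 3 are assumptions chosen for the example.

```python
import json
import tempfile
from pathlib import Path


def record_signal(memory_root: Path, rating: int, context: str) -> Path:
    """Append a rating to SIGNALS; also log low ratings under LEARNING/FAILURES."""
    entry = json.dumps({"rating": rating, "context": context})

    signals = memory_root / "SIGNALS" / "ratings.jsonl"  # hypothetical file name
    signals.parent.mkdir(parents=True, exist_ok=True)
    with signals.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")

    if rating <= 3:  # assumed threshold for treating a response as a failure
        failures = memory_root / "LEARNING" / "FAILURES" / "log.jsonl"
        failures.parent.mkdir(parents=True, exist_ok=True)
        with failures.open("a", encoding="utf-8") as fh:
            fh.write(entry + "\n")
    return signals


def average_rating(signals: Path) -> float:
    """Mean of all recorded ratings, a crude quality trend."""
    lines = signals.read_text(encoding="utf-8").splitlines()
    ratings = [json.loads(line)["rating"] for line in lines]
    return sum(ratings) / len(ratings)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    record_signal(root, 9, "deploy workflow")
    path = record_signal(root, 2, "wrong theme name")
    trend = average_rating(path)
    failure_logged = (root / "LEARNING" / "FAILURES" / "log.jsonl").exists()
```

Plain append-only files keep the memory inspectable: any past signal can be read, searched, or corrected with ordinary tools.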

Generic AI starts fresh every time. PAI compounds.

I’m still early in this — the personal profile files still have template placeholders I haven’t filled in, and there are skills I’ve barely touched. But even at partial configuration, the system already thinks more like a senior colleague than a search engine.

That’s what RAG, Agents, and Skills make possible together: an AI that knows your context, activates the right expertise automatically, and scales parallel intelligence for complex work — all without you having to manage the machinery.