
Upgrading My PDF Converter to IBM's Docling

Sat, 02 May 2026 00:00:00 +0000

When My Own Tool Couldn’t Handle My Work

The error message was easy to dismiss: “RapidOCR returned empty result!” It appeared twice in the terminal, then silence: a blank .md file where a 40-page Oracle HCM implementation guide should have been. The PDF had come straight from Oracle’s support portal, the same format I use for every triage session. But this one stored its pages as images, and PyMuPDF4LLM had nothing to work with.

That was one category of failure. The other was quieter. For documents that did convert, I started noticing the tables were wrong — not corrupted, just structurally dissolved. An eligibility matrix that should have had six clearly labeled columns came back as a run of loosely connected text. Useful for nothing.

I had built this tool to serve my Oracle work. Then my Oracle work showed me exactly where it fell short.

The Problem with PyMuPDF4LLM

If you’ve followed this series, you know that PyMuPDF4LLM was a solid choice when I first built the converter. It handled text-based PDFs cleanly, installed without friction, and required almost no configuration. For research papers and simple documentation, it worked well.

But Oracle HCM documentation is a different category of document. Oracle’s guides are dense with tables: configuration reference grids, eligibility matrices, step-and-action setup tables. These are not decorative — they carry most of the meaning. When PyMuPDF4LLM dissolved those tables into unstructured text, it was silently degrading the most important parts of the document.

The image-based PDF problem was a hard wall. If a document was captured as page images rather than extractable text, the converter returned nothing. No partial output, no warning — just empty files.

Discovering Docling

IBM Research Zurich’s AI for Knowledge team open-sourced Docling in July 2024. The project has a specific focus: turning complex documents into structured, AI-ready output. In April 2025, IBM donated it to the Linux Foundation AI & Data, and it now powers data ingestion for Red Hat Enterprise Linux AI. As of this writing it has over 24,000 GitHub stars.

What makes Docling different is that it treats document conversion as a computer vision problem, not just a text extraction problem.

Layout analysis: Docling uses an RT-DETR-derived model trained on DocLayNet — IBM’s human-annotated dataset of real-world documents — to detect and classify every region on the page: tables, figures, headers, footers, section titles, body text. It knows the structure before it extracts any content.

Table reconstruction: This is where Docling earns its place for Oracle documentation. It uses a vision transformer called TableFormer that predicts row/column structure and header roles directly from the page image. The result is a proper Markdown table, not a stream of cell values.

Image-based PDFs: For documents stored as page images, Docling integrates OCR into its pipeline natively. The same converter handles text-based and image-based PDFs without any changes on your end.

The Switch

The API change was minimal. The old code:

import pymupdf4llm

md_text = pymupdf4llm.to_markdown(pdf_path)

The new code:

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert(pdf_path)
md_text = result.document.export_to_markdown()

Three lines instead of one, but the extra structure pays dividends: DocumentConverter can be initialized once and reused across an entire batch, which matters when processing a folder of 50 Oracle guides.
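A minimal sketch of that reuse, assuming the batch loop receives the conversion step as a plain callable (the `convert_folder` name and signature are hypothetical, not the tool's actual code). With Docling, the callable would wrap one `DocumentConverter` built before the loop, so the models load once per batch rather than once per file:

```python
from pathlib import Path
from typing import Callable


def convert_folder(src: Path, dst: Path, convert: Callable[[Path], str]) -> list[Path]:
    """Write one .md file per PDF in `src`, reusing a single `convert` callable.

    With Docling, `convert` could be built like this, outside the loop:
        converter = DocumentConverter()
        convert = lambda p: converter.convert(p).document.export_to_markdown()
    """
    dst.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for pdf in sorted(src.glob("*.pdf")):  # glob is case-sensitive on most systems
        out = dst / f"{pdf.stem}.md"
        out.write_text(convert(pdf), encoding="utf-8")
        written.append(out)
    return written
```

Passing the converter in, rather than constructing it inside the function, also makes the loop easy to test with a fake.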

A note on startup: The first time you run Docling, it downloads its ML models from Hugging Face. You will see this:

Loading weights: 100%|██████████| 770/770 [00:00<00:00, 1656.35it/s]

This is normal. The models cache locally after the first download and subsequent runs start immediately. If you see a warning about HF_TOKEN, that is also expected — Docling works without one, but setting a token removes the rate-limit warning:

echo 'export HF_TOKEN="hf_your_token_here"' >> ~/.zshrc

What Changed in Practice

Oracle documentation: Tables that previously collapsed into text now render as proper Markdown tables. A 6-column configuration reference comes back with headers intact and every row correctly aligned.

AI books: My knowledge base includes dense technical books on LLM engineering and machine learning. These have complex layouts — sidebars, multi-column sections, figures with captions. Docling’s layout model handles these significantly better than PyMuPDF4LLM’s heuristic approach.

Image-based PDFs: Documents that previously produced empty output now convert cleanly. The two-step workaround (ocrmypdf → pdf2md) is no longer necessary for most cases.

Two Other Improvements

While I was updating the engine, I added two things that were overdue:

DOCX support. The converter now handles Word documents using pandoc as a backend. The same pdf2md command works for both file types. This matters for Oracle support exports and study notes from my reMarkable.
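The post doesn't show how pdf2md hands Word files to pandoc. A minimal sketch, assuming dispatch on the file extension and GitHub-flavored Markdown output (`-t gfm`); the function names are hypothetical, and the real tool may pass different pandoc options:

```python
import subprocess
from pathlib import Path
from typing import Callable


def pandoc_command(docx: Path, out: Path) -> list[str]:
    # -f docx: read Word; -t gfm: write GitHub-flavored Markdown; -o: output path
    return ["pandoc", str(docx), "-f", "docx", "-t", "gfm", "-o", str(out)]


def convert_any(src: Path, out: Path, pdf_to_md: Callable[[Path], str]) -> None:
    """Route a file to the right backend based on its extension."""
    suffix = src.suffix.lower()
    if suffix == ".docx":
        subprocess.run(pandoc_command(src, out), check=True)  # raises if pandoc fails
    elif suffix == ".pdf":
        out.write_text(pdf_to_md(src), encoding="utf-8")
    else:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```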

Batch manifest. When processing a large folder, the converter now writes a manifest file tracking which files have been converted and their checksums. Re-running on the same folder skips files that haven’t changed. A --force flag overrides this when you need a fresh conversion.

pdf2md --batch ~/oracle-pdfs/ # skips already-converted
pdf2md --batch ~/oracle-pdfs/ --force # reconverts everything
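A minimal sketch of the manifest logic, assuming a JSON file in the source folder that maps each filename to its SHA-256 checksum. The filename `.pdf2md-manifest.json` and the helper names are hypothetical:

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = ".pdf2md-manifest.json"  # hypothetical manifest filename
SUPPORTED = {".pdf", ".docx"}


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):  # read 64 KiB at a time
            h.update(chunk)
    return h.hexdigest()


def load_manifest(folder: Path) -> dict[str, str]:
    path = folder / MANIFEST_NAME
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def files_to_convert(folder: Path, force: bool = False) -> list[Path]:
    """New or changed files; with force=True, every supported file."""
    manifest = load_manifest(folder)
    return [
        f
        for f in sorted(folder.iterdir())
        if f.is_file()
        and f.suffix.lower() in SUPPORTED
        and (force or manifest.get(f.name) != sha256_of(f))
    ]


def record_converted(folder: Path, converted: list[Path]) -> None:
    """Store the checksum of each file that converted successfully."""
    manifest = load_manifest(folder)
    for f in converted:
        manifest[f.name] = sha256_of(f)
    (folder / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
```

Hashing the content rather than comparing modification times means a file that is merely touched or re-downloaded unchanged is still skipped; `--force` maps naturally onto `force=True`.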

What’s Next

The web UI — which I added in the last post — has also been updated to use Docling. Drag a PDF onto it, click Convert, and the same deep-learning pipeline runs behind the scenes.

The next thing I want to add is direct output to the Obsidian inbox. Right now the flow is: convert → download ZIP → move to vault. A toggle that sends output directly to ~/projects/obsidian-vault/00-inbox/ would cut that manual step entirely.
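One way that toggle could work, as a sketch: copy each finished .md file into the inbox and add a numeric suffix rather than overwrite a note that is already there. The function name and the collision rule are assumptions; only the inbox path comes from this post:

```python
import shutil
from pathlib import Path

OBSIDIAN_INBOX = Path("~/projects/obsidian-vault/00-inbox").expanduser()


def deliver_to_inbox(md_files: list[Path], enabled: bool, inbox: Path = OBSIDIAN_INBOX) -> list[Path]:
    """Copy converted notes into the vault inbox when the toggle is on."""
    if not enabled:
        return []
    inbox.mkdir(parents=True, exist_ok=True)
    delivered = []
    for f in md_files:
        target = inbox / f.name
        n = 1
        while target.exists():  # never clobber a note already in the inbox
            target = inbox / f"{f.stem}-{n}{f.suffix}"
            n += 1
        shutil.copy2(f, target)  # copy2 keeps the file's timestamps
        delivered.append(target)
    return delivered
```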

The tool is doing what I originally wanted: converting my Oracle documentation and AI library into clean, searchable Markdown. Docling is what makes that reliable for the documents that actually matter.

Adding a Web UI to My PDF to Markdown Converter

Sat, 28 Mar 2026 00:00:00 +0000

The Promise I Made to Myself

In my last post about building the PDF to Markdown converter , I listed some “what’s next” ideas at the end. One of them was:

FastAPI wrapper: Create an HTTP API for web apps to use

Well, I did it. And I went a step further — I built a full drag-and-drop web UI on top of it.

The CLI still works exactly as before. This is an addition, not a replacement. But now when I want to convert a batch of PDFs without thinking about terminal commands, I just open a browser tab.

What the UI Does

The interface is intentionally minimal:

Drag-and-drop zone — drop one PDF or fifty onto it
Browse button — if you prefer clicking
Convert button — kicks off the conversion
Per-file progress bars — live updates as each file converts
Individual download — each completed file gets its own Download button
Download all as ZIP — one click to grab everything
Clear — resets the session and cleans up temp files server-side

Everything runs locally. Files go to a temp directory on your machine, get converted, and are served back to you. Nothing hits an external API.

The Stack

I kept it as simple as possible:

Backend: FastAPI + uvicorn

FastAPI was the obvious choice — it handles file uploads cleanly, has first-class async support, and the python-multipart library makes multi-file form handling trivial. The conversion logic is unchanged from the CLI: pymupdf4llm.to_markdown() doing the heavy lifting.

Progress updates: Server-Sent Events (SSE)

This is the part I found most interesting. When you hit Convert, the browser opens a persistent connection to /progress/{job_id} and receives a stream of JSON events — one every 400ms — until the job finishes. No polling loop, no WebSocket complexity. SSE is perfect for this: unidirectional, simple, and built into every modern browser.

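import asyncio
import json
from fastapi.responses import StreamingResponse

# `app` and the `jobs` dict are assumed to be defined elsewhere in app.py
@app.get("/progress/{job_id}")
async def progress(job_id: str):
    job = jobs[job_id]
    async def event_stream():
        while True:
            data = json.dumps({"progress": job["progress"], "done": job["done"]})
            yield f"data: {data}\n\n"  # one SSE frame: a data line plus a blank line
            if job["done"]:
                break
            await asyncio.sleep(0.4)  # 400 ms between updates

    return StreamingResponse(event_stream(), media_type="text/event-stream")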

On the frontend, consuming it takes only a few lines:

const eventSource = new EventSource(`/progress/${jobId}`);
eventSource.onmessage = e => {
  const { progress, done } = JSON.parse(e.data);
  // update the UI...
  if (done) eventSource.close(); // otherwise the browser reconnects when the stream ends
};
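To see exactly what travels over that connection without starting FastAPI, the server-side generator can be driven directly. In this sketch the job is a plain dict updated by a simulated worker coroutine, and the interval is shortened from 0.4 s so it runs quickly; `simulate` is illustrative, not part of app.py:

```python
import asyncio
import json


async def event_stream(job: dict, interval: float = 0.4):
    """Same loop as the server code above, with the job passed in explicitly."""
    while True:
        data = json.dumps({"progress": job["progress"], "done": job["done"]})
        yield f"data: {data}\n\n"
        if job["done"]:
            break
        await asyncio.sleep(interval)


async def simulate() -> list[dict]:
    job = {"progress": 0, "done": False}

    async def worker() -> None:  # stands in for the thread-pool conversion
        for pct in (50, 100):
            await asyncio.sleep(0.01)
            job["progress"] = pct
        job["done"] = True

    task = asyncio.create_task(worker())
    events = []
    async for frame in event_stream(job, interval=0.005):
        # Each SSE frame is "data: <payload>" followed by a blank line
        assert frame.startswith("data: ") and frame.endswith("\n\n")
        events.append(json.loads(frame[len("data: "):]))
    await task
    return events
```

The first event always reports progress 0 and the last one is the only event with `done` set, which is what lets the browser close the connection cleanly.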

Threading: The conversion itself is synchronous (PyMuPDF4LLM blocks while it processes pages). To keep the FastAPI event loop from freezing during conversion, each job runs in a ThreadPoolExecutor:

executor = ThreadPoolExecutor(max_workers=4)  # created once when the app starts
asyncio.get_running_loop().run_in_executor(executor, _convert_job, job_id, job_dir)

Four workers by default — enough to handle several simultaneous conversions without overloading the machine.

Frontend: Vanilla JS, no build step

I deliberately avoided React, Vue, or any framework. The whole UI is a single static/index.html file. It loads instantly, has no dependencies to install, and is easy to read and modify. For a local tool that one person uses, this is the right call.

Project Structure

Here’s what changed from the original CLI project:

pdf-to-markdown/
├── pdf2md ← original CLI (unchanged)
├── app.py ← FastAPI server (new)
├── static/
│ └── index.html ← drag-drop UI (new)
├── serve ← start script (new)
├── requirements.txt ← updated with FastAPI deps
└── venv/ ← existing venv, three new packages added

The serve script is just:

#!/usr/bin/env bash
cd "$(dirname "$0")"
source venv/bin/activate
uvicorn app:app --host 0.0.0.0 --port 8765 --reload

Run it once, open http://localhost:8765, and you have a working converter in your browser.

One Gotcha: PyMuPDF4LLM is Synchronous

This tripped me up briefly. pymupdf4llm.to_markdown() does not return a coroutine — it’s a blocking call that can take 10–30 seconds on a large document. If you call it directly in an async FastAPI route handler, you freeze the entire event loop while it runs. No other requests get handled. The SSE stream stops updating.

The fix is the ThreadPoolExecutor pattern above — push the blocking work off the event loop entirely. The async route returns immediately, the SSE stream keeps ticking, and the conversion runs in a thread pool where it belongs.
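The effect is easy to demonstrate. In this sketch, `time.sleep` stands in for a blocking `to_markdown()` call, and a heartbeat coroutine plays the role of the SSE stream: it can only tick while the event loop is free. All names here are illustrative:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # shared pool, as the post describes


def blocking_convert(seconds: float) -> str:
    time.sleep(seconds)  # stand-in for a synchronous to_markdown() call
    return "converted"


async def heartbeat(stop: asyncio.Event, ticks: list[float]) -> None:
    # Plays the role of the SSE stream: it only advances if the loop is free
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main(seconds: float = 0.3) -> tuple[str, int]:
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(stop, ticks))
    # The blocking call runs on a worker thread; this coroutine just awaits its result
    result = await loop.run_in_executor(executor, blocking_convert, seconds)
    stop.set()
    await beat
    return result, len(ticks)
```

Swap the `run_in_executor` line for a direct `blocking_convert(seconds)` call and the heartbeat records no ticks at all: the sleep holds the event loop until it returns, and by then `stop` is already set.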

The Download Endpoints

Three endpoints handle output:

GET /download/{job_id}/{filename} — single .md file
GET /download-all/{job_id} — all .md files as a ZIP
DELETE /job/{job_id} — clean up temp files

The ZIP is built in memory using Python’s zipfile module and streamed directly to the browser — no intermediate file on disk:

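import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for f in md_files:  # the job's .md outputs
        zf.write(f, f.name)  # store by bare filename, without the temp-dir path
buf.seek(0)  # rewind so the response reads from the start
return StreamingResponse(buf, media_type="application/zip", ...)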
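Because `{filename}` in the single-file download route comes from the URL, it is worth refusing anything that resolves outside the job's temp directory, as defense in depth. A minimal sketch of that check plus the DELETE cleanup, assuming a module-level dict from job ID to temp directory (`JOBS`, `new_job`, `safe_output_path`, and `delete_job` are hypothetical names):

```python
import shutil
import tempfile
from pathlib import Path

JOBS: dict[str, Path] = {}  # hypothetical registry: job_id -> temp directory


def new_job(job_id: str) -> Path:
    JOBS[job_id] = Path(tempfile.mkdtemp(prefix=f"pdf2md-{job_id}-"))
    return JOBS[job_id]


def safe_output_path(job_id: str, filename: str) -> Path:
    """Resolve a requested download, refusing anything outside the job directory."""
    job_dir = JOBS[job_id].resolve()
    candidate = (job_dir / filename).resolve()
    if candidate.parent != job_dir or candidate.suffix != ".md":
        raise PermissionError(f"refusing to serve {filename!r}")
    return candidate


def delete_job(job_id: str) -> None:
    """What DELETE /job/{job_id} could do: forget the job and remove its files."""
    job_dir = JOBS.pop(job_id, None)
    if job_dir is not None:
        shutil.rmtree(job_dir, ignore_errors=True)
```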

What This Unlocks

The CLI was already useful. The web UI adds a few things the CLI cannot easily do:

Non-terminal users. Anyone on my network can now use this converter by visiting http://my-machine:8765. No Python knowledge required.

Bulk drop workflows. Dragging 20 PDFs from Finder into a browser window and clicking Convert is significantly faster than constructing a --batch command with the right paths.

Visual feedback. The progress bars are not just cosmetic. For large PDFs that take 20–30 seconds, knowing the conversion is running (and roughly how far along it is) removes the anxiety of staring at a terminal cursor.

What’s Next

The original roadmap item was “FastAPI wrapper.” That’s done. The next one I’m eyeing:

Auto-feed to Obsidian inbox. Right now the flow is: convert in the web UI, download the ZIP, unzip, move to Obsidian. I’d like to add a toggle: “Send output directly to ~/projects/obsidian-vault/00-inbox/” — one less manual step.

That’s a small addition to the backend. Coming soon.

Running It

cd ~/projects/pdf-to-markdown
./serve
# Open http://localhost:8765

The first run installs nothing new — the three new packages (fastapi, uvicorn, python-multipart) are already in the venv. It just works.

Claude Code Has Skills. PAI Has a Skill System. Here's the Difference.

Sun, 15 Mar 2026 00:00:00 +0000


There’s a word that shows up in both Claude Code’s documentation and in PAI’s architecture: skills. And because they share the same word, and even the same file conventions, it’s easy to assume they’re roughly equivalent, with one just a slightly fancier version of the other.

They’re not. The relationship is closer to HTTP and a web framework. Claude Code’s skill mechanism is the protocol. PAI is the framework built on top of it.

Understanding that distinction changed how I think about what I’ve actually built on my machine.

Start Here: What Claude Code’s Skill Mechanism Actually Is

Before explaining what PAI adds, it’s worth being precise about what Claude Code provides natively — because it’s both more minimal and more elegant than most people realize.

Claude Code’s skill system works like this:

At startup, Claude Code scans every SKILL.md file it finds under ~/.claude/skills/ and reads the name and description from its YAML frontmatter
The description field in each skill’s YAML frontmatter determines when that skill activates — it’s pure intent matching. Anthropic caps this at 1024 characters.
When a skill matches your request, the Skill tool injects the full SKILL.md content into Claude’s context window
Claude follows the instructions in that file

That’s the entire mechanism. It’s a context injection system with a routing layer.

The USE WHEN clause in a skill description is the key piece. Here’s a simplified example from my OracleHCM skill:

---
name: OracleHCM
description: Expert Oracle HCM Cloud troubleshooting. USE WHEN user mentions
 Oracle HCM, HCM Cloud, HDL, HCM Data Loader, Journey, Checklist, workflow
 approvals, autocomplete rules, fast formulas, security profiles...
---

When I describe an Oracle HCM problem in natural language, Claude Code matches my intent against that description and loads the skill. I never have to say “use the Oracle HCM skill.” The intent matching handles it.

Elegant. Minimal. And — on its own — surprisingly limited.

The Gap Between “Context Injection” and “Operational Capability”

Imagine a skill that’s just a long markdown file. When it loads, Claude reads the instructions and tries to follow them. If the instructions are clear and the task is simple, that works fine. But for anything complex — something that involves multiple steps, personalized behavior, CLI tooling, external APIs, or parallel agents — a single markdown file loaded into context starts to break down.

The instructions get long. They can’t be personalized without making the skill personal (and therefore un-shareable). There’s no way to dispatch to a sub-procedure. There’s no tooling layer. There’s no way to say “if the user wants to create a blog post, follow this procedure; if they want to publish, follow that one.”

This is the gap PAI fills.

What PAI Builds on Top: Nine Layers

PAI’s SKILLSYSTEM.md defines a canonical structure that every skill must follow. It’s not a suggestion — it’s enforced by convention and by the CreateSkill skill that scaffolds new ones. Here’s what each layer adds.

Layer 1 — Canonical Structure

Claude Code just needs a SKILL.md. PAI requires a specific directory layout:

SkillName/
├── SKILL.md ← minimal routing only (40-50 lines)
├── Workflows/ ← execution procedures, one per task
│ ├── Create.md
│ └── Update.md
├── Tools/ ← TypeScript CLI tools (always present)
│ └── Generate.ts
└── ApiReference.md ← context files loaded on demand

SKILL.md stays minimal. Complexity lives in workflows and context files that load when actually needed.

Layer 2 — Workflow Routing

This is the most immediately useful layer. Claude Code routes to a skill. PAI routes within a skill.

The routing table in every SKILL.md dispatches sub-tasks to specific workflow files:

| Workflow | Trigger | File |
|-------------|----------------------------|---------------------------|
| **Create** | "write a post" | `Workflows/Create.md` |
| **Publish** | "publish", "deploy" | `Workflows/Publish.md` |
| **Header** | "create header image" | `Workflows/Header.md` |

“Write a post” and “publish the site” both activate the same skill, but they route to completely different procedures. Without this, a skill that handles multiple operations becomes one giant file Claude has to navigate by itself.
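In PAI it is Claude that reads this table and picks the workflow. As a toy illustration of the dispatch only, here is the same table parsed in Python with plain substring matching on the quoted triggers; the substring match is a stand-in for the model's intent matching, not how PAI does it:

```python
import re
from typing import Optional

ROUTING_TABLE = """\
| Workflow | Trigger | File |
|-------------|----------------------------|---------------------------|
| **Create** | "write a post" | `Workflows/Create.md` |
| **Publish** | "publish", "deploy" | `Workflows/Publish.md` |
| **Header** | "create header image" | `Workflows/Header.md` |
"""


def parse_routes(table: str) -> list[tuple[list[str], str]]:
    routes = []
    for line in table.strip().splitlines()[2:]:  # skip the header and separator rows
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        triggers = re.findall(r'"([^"]+)"', cells[1])  # every quoted trigger phrase
        routes.append((triggers, cells[2].strip("`")))
    return routes


def route(request: str, routes: list[tuple[list[str], str]]) -> Optional[str]:
    text = request.lower()
    for triggers, path in routes:
        if any(t in text for t in triggers):
            return path
    return None  # no workflow matched; the skill's default behavior applies
```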

Layer 3 — The Personalization Layer

Every system skill in PAI checks for user overrides before executing:

~/.claude/skills/PAI/USER/SKILLCUSTOMIZATIONS/{SkillName}/
├── EXTEND.yaml ← merge strategy (append | override | deep_merge)
└── PREFERENCES.md ← user-specific behavior

The system skill stays generic and shareable. My preferences — color palettes for the Art skill, voice configurations for the Agents skill, output format defaults for Research — live separately and merge in at runtime. The skill author never needs to know about my preferences. I never need to fork the skill to add my own.
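The post names the three EXTEND.yaml strategies but not their exact rules. One plausible reading, written as a sketch over plain dictionaries (these semantics are assumptions, not PAI's documented behavior): `override` replaces top-level keys, `append` concatenates lists and otherwise lets the user value win, and `deep_merge` recurses into nested mappings:

```python
from copy import deepcopy


def deep_merge(base: dict, user: dict) -> dict:
    merged = deepcopy(base)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # recurse into nested maps
        else:
            merged[key] = deepcopy(value)  # user value wins on conflict
    return merged


def apply_customization(base: dict, user: dict, strategy: str) -> dict:
    if strategy == "override":
        return {**deepcopy(base), **deepcopy(user)}  # shallow: top-level keys replaced
    if strategy == "append":
        merged = deepcopy(base)
        for key, value in user.items():
            if isinstance(value, list) and isinstance(merged.get(key), list):
                merged[key] = merged[key] + deepcopy(value)  # extend, don't replace
            else:
                merged[key] = deepcopy(value)
        return merged
    if strategy == "deep_merge":
        return deep_merge(base, user)
    raise ValueError(f"unknown merge strategy: {strategy}")
```

With a hypothetical Art-skill default of `{"palette": ["navy", "teal"], "style": {"font": "serif", "size": 12}}` and a user file of `{"palette": ["orange"], "style": {"size": 14}}`, only `deep_merge` keeps the default font while taking the user's size, and only `append` keeps all three colors. The system skill's own dictionary is never modified.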

Layer 4 — System vs Personal Skill Separation

PAI enforces a hard naming convention that determines portability:

TitleCase (Research, Browser, OracleHCM) = system skills, no personal data, shareable via PAI Packs
_ALLCAPS (_BLOGGING, _MAQINA) = personal skills, private by convention, never exported

Claude Code has no concept of skill visibility. PAI makes it structural.
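Because the convention is purely lexical, a packaging step can enforce it mechanically. A sketch of such a check (the regexes and function names are illustrative, not taken from PAI's tooling):

```python
import re

SYSTEM_RE = re.compile(r"^[A-Z][A-Za-z0-9]*$")   # TitleCase: Research, OracleHCM
PERSONAL_RE = re.compile(r"^_[A-Z][A-Z0-9_]*$")  # _ALLCAPS: _BLOGGING, _MAQINA


def skill_visibility(name: str) -> str:
    if PERSONAL_RE.match(name):
        return "personal"
    if SYSTEM_RE.match(name):
        return "system"
    raise ValueError(f"{name!r} follows neither naming convention")


def exportable(names: list[str]) -> list[str]:
    """Skills that may go into a shareable pack: system skills only."""
    return [n for n in names if skill_visibility(n) == "system"]
```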

Layer 5 — CLI Tooling Convention

Every PAI skill has a Tools/ directory. When a workflow needs to do something repeatable — generate an image, manage a server, sync a repository — it calls a TypeScript CLI tool instead of embedding logic in the workflow markdown itself.

Tools use #!/usr/bin/env bun, expose configuration via flags, and have .help.md documentation files. This keeps workflows simple (intent routing) and tools encapsulated (execution). You can test a tool independently of its workflow.

Layer 6 — AI-Powered Hooks

PAI runs 17 event hooks that fire at specific moments: session start, prompt submission, pre-tool, post-tool, and others. The most important one for response quality is FormatReminder — it runs AI inference on your raw prompt before Claude even starts responding, classifies the depth required (FULL / ITERATION / MINIMAL), and injects that classification as authoritative context.

This is hooks doing real work, not just shell scripts appending text to prompts.
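For readers who haven't written one: a UserPromptSubmit hook in Claude Code is a command that receives the event as JSON on stdin (including the user's `prompt`), and what it prints to stdout on exit code 0 is added to Claude's context. A minimal Python sketch of that shape; the keyword heuristic in `classify_depth` is only a placeholder for the AI inference FormatReminder performs, and none of this is FormatReminder's actual code:

```python
import json
import sys
from typing import TextIO


def classify_depth(prompt: str) -> str:
    """Placeholder heuristic; FormatReminder uses AI inference for this step."""
    text = prompt.strip().lower()
    if len(text.split()) <= 3:
        return "MINIMAL"
    if text.startswith(("now ", "also ", "and ", "tweak ", "change ")):
        return "ITERATION"
    return "FULL"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    event = json.load(stdin)  # Claude Code sends the hook input as JSON
    depth = classify_depth(event.get("prompt", ""))
    # For UserPromptSubmit, stdout on exit code 0 is added to Claude's context
    print(f"Depth classification for this prompt: {depth}", file=stdout)
```

Registered as the hook's command, the script would end with a plain `main()` call.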

Layer 7 — The Algorithm

Every response PAI generates runs through a 7-phase problem-solving framework: OBSERVE → THINK → PLAN → BUILD → EXECUTE → VERIFY → LEARN.

This isn’t decorative structure. The OBSERVE phase reverse-engineers your actual intent. The THINK phase selects capabilities and validates skill choices against the problem. The VERIFY phase uses TaskCreate/TaskUpdate to track measurable success criteria. The LEARN phase captures what to improve next time.

Skills feed into this framework — they’re not parallel to it. When a skill activates, it executes inside the Algorithm, with its results held accountable to the ISC criteria created in OBSERVE.
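A toy model of that control flow, to show where criteria enter and where they are checked: criteria are fixed before any work starts, EXECUTE changes state, VERIFY scores every criterion, and LEARN keeps the ones that failed. Everything here (class names, the dict-based state) is illustrative; in PAI the phases are carried out by the model, not by a Python loop:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Optional


class Phase(Enum):
    OBSERVE = auto()
    THINK = auto()
    PLAN = auto()
    BUILD = auto()
    EXECUTE = auto()
    VERIFY = auto()
    LEARN = auto()


@dataclass
class Criterion:
    description: str
    check: Callable[[dict], bool]
    passed: Optional[bool] = None


@dataclass
class Run:
    criteria: list[Criterion]
    state: dict = field(default_factory=dict)
    phases: list[Phase] = field(default_factory=list)
    lessons: list[str] = field(default_factory=list)


def run_algorithm(criteria: list[Criterion], execute: Callable[[dict], None]) -> Run:
    run = Run(criteria)  # criteria exist before any work starts, as in OBSERVE
    for phase in Phase:  # members iterate in definition order
        run.phases.append(phase)
        if phase is Phase.EXECUTE:
            execute(run.state)
        elif phase is Phase.VERIFY:
            for c in run.criteria:
                c.passed = bool(c.check(run.state))
        elif phase is Phase.LEARN:
            run.lessons = [c.description for c in run.criteria if not c.passed]
    return run
```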

Layer 8 — Agent Composition Patterns

PAI skills can spawn specialized subagents and compose them using named patterns:

Pattern	Shape	When
Pipeline	A → B → C	Sequential domain handoff
TDD Loop	Engineer ↔ QATester	Build-verify cycle
Fan-out	→ [A, B, C]	Multiple perspectives needed
Gate	A → check → B or retry	Quality gate before progression

A skill that just loads into context can’t orchestrate parallel agents. A PAI skill that routes to a workflow that invokes a Fan-out pattern can research, build, and verify in parallel — with a spotcheck agent at the end synthesizing results.
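The four patterns are simple to express once an agent is treated as an async function from a task to a result. A sketch with asyncio, where the "agents" are any coroutines rather than real subagents and all names are illustrative:

```python
import asyncio
from typing import Awaitable, Callable, Optional

Agent = Callable[[str], Awaitable[str]]  # here, an agent is any async task -> result function


async def pipeline(task: str, *agents: Agent) -> str:
    """A -> B -> C: each agent's output becomes the next agent's input."""
    for agent in agents:
        task = await agent(task)
    return task


async def fan_out(task: str, *agents: Agent) -> list[str]:
    """-> [A, B, C]: run all agents on the same task concurrently."""
    return list(await asyncio.gather(*(agent(task) for agent in agents)))


async def gate(task: str, agent: Agent, check: Callable[[str], bool], retries: int = 3) -> str:
    """A -> check -> B or retry: only a result that passes the check moves on."""
    for _ in range(retries):
        result = await agent(task)
        if check(result):
            return result
    raise RuntimeError(f"quality gate failed after {retries} attempts")


async def tdd_loop(
    task: str,
    engineer: Agent,
    tester: Callable[[str], Awaitable[Optional[str]]],
    max_rounds: int = 5,
) -> str:
    """Engineer <-> QATester: rebuild with the tester's feedback until it passes."""
    work = await engineer(task)
    for _ in range(max_rounds):
        feedback = await tester(work)  # None means the checks passed
        if feedback is None:
            return work
        work = await engineer(f"{task}\nFix: {feedback}")
    raise RuntimeError("TDD loop did not converge")
```

`asyncio.gather` returns results in the order the agents were passed, so a synthesizing step after `fan_out` always knows which perspective produced which answer.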

Layer 9 — Dynamic Loading

Large skills use deferred loading. Only the SKILL.md loads on invocation. Reference documents, API guides, and style specs load only when the specific workflow that needs them runs. This actively manages token budget rather than blowing it on context that might not be needed.
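A minimal sketch of deferred loading, assuming the directory layout from Layer 1: nothing beyond SKILL.md is read until a workflow asks for it, and each file is read at most once. `LazySkill` is a hypothetical helper, and character count is only a rough proxy for tokens:

```python
from pathlib import Path


class LazySkill:
    """Reads SKILL.md on invocation; other files only when a workflow needs them."""

    def __init__(self, root: Path):
        self.root = root
        self.loaded: dict[str, str] = {}  # relative path -> contents already in context

    def _load(self, rel: str) -> str:
        if rel not in self.loaded:  # each file is read at most once
            self.loaded[rel] = (self.root / rel).read_text(encoding="utf-8")
        return self.loaded[rel]

    def invoke(self) -> str:
        return self._load("SKILL.md")

    def run_workflow(self, workflow: str, needs: list[str]) -> str:
        parts = [self.invoke(), self._load(f"Workflows/{workflow}.md")]
        parts += [self._load(name) for name in needs]
        return "\n\n".join(parts)

    def context_chars(self) -> int:
        """Rough proxy for the token cost of everything loaded so far."""
        return sum(len(text) for text in self.loaded.values())
```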

The Feature Gap in One Table

Feature	Claude Code (Native)	PAI System
Skill discovery	YAML `description` at startup	Same + `USE WHEN` intent parsing
Sub-routing	None	Workflow routing table
Personalization	None	`SKILLCUSTOMIZATIONS` layer
Skill visibility	All equal	System vs Personal convention
Tooling	None	TypeScript CLI tools
Hooks	Basic	17 AI-powered hooks
Response structure	Free-form	Algorithm (7 phases, ISC, verify)
Agents	None	15+ specialized subagent types
Memory	None	File-based cross-session memory
Dynamic loading	Full file loaded	Context files on demand
Portability	No convention	PAI Packs

Why This Matters Practically

The single most useful shift in mental model: Claude Code skills are context. PAI skills are operational units.

When I ask my system to publish a blog post, the publishing skill doesn’t just remind Claude how publishing works. It dispatches to the Publish workflow, which runs image conversion, calls hugo, commits, pushes to GitHub, and triggers the Actions pipeline that deploys to Namecheap FTP — all as a structured procedure with steps that can fail, be verified, and be corrected independently.

That’s not context injection. That’s execution.
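The difference shows up in how failure is handled. A sketch of a step runner that records each step's outcome and stops at the first failure so that step can be fixed and rerun; the command list is an assumption about what such a workflow might call (the image-conversion step is left out), not the contents of the actual Publish workflow:

```python
import subprocess
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    ok: bool
    output: str


def run_steps(steps: list[tuple[str, list[str]]]) -> list[StepResult]:
    """Run steps in order, recording each outcome; stop at the first failure."""
    results: list[StepResult] = []
    for name, cmd in steps:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append(StepResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
        if proc.returncode != 0:
            break  # later steps depend on this one; fix it and rerun from here
    return results


# Hypothetical command list for a Hugo + GitHub publish (image conversion omitted).
PUBLISH_STEPS = [
    ("build", ["hugo", "--minify"]),
    ("stage", ["git", "add", "-A"]),
    ("commit", ["git", "commit", "-m", "Publish new post"]),
    ("push", ["git", "push"]),  # a GitHub Actions workflow can deploy from the push
]
```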

The 34 skills on my system aren’t 34 long markdown files. They’re 34 capabilities, each with their own routing logic, personalization layer, tooling, and agent integration. Claude Code’s mechanism made them possible. PAI’s framework made them reliable.

Where to Go from Here

If you’re new to PAI and want to understand the broader architecture this sits inside (the memory system, the agent tiers, how RAG ties everything together), the prior post, “RAG, Agents, and Skills: The Three Pillars Inside My Personal AI,” covers the full picture.

If you want to go deeper on the skill system itself, the canonical reference is ~/.claude/skills/PAI/SYSTEM/SKILLSYSTEM.md — it’s the document all skills are built against, and it explains every convention described here in precise detail.

PAI is open source at github.com/danielmiessler/PAI.

RAG, Agents, and Skills: The Three Pillars Inside My Personal AI

Tue, 24 Feb 2026 00:00:00 +0000


This site — Augmented Resilience — didn’t get built the way most blogs do. There was no staring at blank Hugo config files, no manually hunting down Namecheap SSH docs, no scrambling to remember whether the deploy script needed the public/ folder cleaned before each build.

Instead, I described what I wanted. The AI knew my hosting setup (Namecheap shared hosting), my stack (Hugo with the re-terminal theme), my repo (GitHub, SSH-keyed), and my editor (Obsidian). When a build error surfaced — a theme name mismatch between hugo.toml and the actual directory — it was diagnosed and fixed before I had time to Google it. When the deploy script needed writing, it was scaffolded against my specific environment. When I accidentally left sensitive data in an early draft, it caught it before the commit.

None of that context lived in the prompt. It lived in the infrastructure.

The system behind it is called PAI (Personal AI Infrastructure) — an open-source framework I run locally on top of Claude Code. And the reason it could handle an entire site build end-to-end without constant hand-holding comes down to three architectural pillars: RAG, Agents, and Skills.

What Is PAI?

PAI is an open-source personal AI infrastructure system that runs on top of Claude Code. It’s not a SaaS product — it’s a framework you install on your own machine. The system is built around a central idea: AI systems need structure to be reliable. Like scaffolding supports construction, PAI provides the architectural patterns that make AI assistance consistent, contextual, and capable of compounding over time.

My system currently has 34 skills, 17 event hooks, 141 workflows, and a memory system that learns from every interaction. But none of that would matter without three core mechanisms working in concert.

The PAI statusline — live system stats showing version (v2.4), algorithm (ALG:v0.2.25), skill count (SK: 34), workflows (WF: 141), hooks (17), context usage (48%), memory signals (144 ratings), and a rolling quality score trend.

Pillar 1: RAG — Your Personal Knowledge Base In Every Response

RAG (Retrieval-Augmented Generation) is the pattern of retrieving relevant documents from a knowledge store and augmenting the AI’s prompt with that context before generating a response. In enterprise AI, this is how you get a chatbot that can answer questions about your internal policies without hallucinating.

In PAI, RAG is the engine that makes the AI feel like it knows you.

How It Works in PAI

When a session starts, PAI’s hook system loads a foundational context layer: my identity, my name, the current date, and the core behavioral rules (the Algorithm). This is the retrieval index — a lightweight map of everything the system knows how to find.

When I make a request, the system retrieves additional context on demand:

Skills frontmatter — Each of the 34 skills has a description field with a USE WHEN clause. These descriptions load at startup as a routing index. When my request matches a skill’s intent, the full skill content loads. This is retrieval — pulling in the right expertise document for the task.
USER/ context files — There’s a structured personal knowledge base living at ~/.claude/skills/PAI/USER/. It contains my resume, my TELOS life goals, my contacts, my projects, my tech stack preferences. When I ask a question where my professional background is relevant, that context gets retrieved and injected.
MEMORY/ directory — Every session, every correction, every insight gets captured in a structured memory system organized into WORK/, LEARNING/, SIGNALS/, and RESEARCH/ directories. Past work items, completed tasks, and quality signals from previous interactions can all be retrieved to inform the current one.
Hook-injected context — Event hooks fire at specific moments (session start, before each prompt, after tool use) and inject dynamic context — things like the current depth classification, relevant behavioral rules, or system state.
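Stripped to its core, the retrieve-then-augment pattern looks like the sketch below. It scores Markdown files under the given folders by plain word overlap with the request and prepends the best matches. That scoring is a toy stand-in (PAI's retrieval runs through skills, hooks, and the model, and production RAG usually uses embeddings), and the function names are hypothetical:

```python
import re
from pathlib import Path


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, roots: list[Path], k: int = 3) -> list[Path]:
    """Rank .md files under `roots` by how many query words they share."""
    q = tokens(query)
    scored = []
    for root in roots:
        for f in root.rglob("*.md"):
            overlap = len(q & tokens(f.read_text(encoding="utf-8")))
            if overlap:
                scored.append((-overlap, str(f), f))  # best first, ties broken by path
    scored.sort()
    return [f for _, _, f in scored[:k]]


def augment(query: str, roots: list[Path], k: int = 3) -> str:
    """Prepend the retrieved files to the request, like a RAG prompt."""
    sections = [f"## {p.name}\n{p.read_text(encoding='utf-8')}" for p in retrieve(query, roots, k)]
    return "\n\n".join(sections + [f"## Request\n{query}"])
```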

Practical Scenario: Building Augmented Resilience

When I was setting up this site and ran into the Hugo theme mismatch error, here’s what PAI retrieved without me explaining any of it:

My tech stack preferences from the USER context — Hugo, GitHub, Namecheap, Obsidian as editor
The WebSavant skill loaded automatically (matched “Hugo site”, “deployment” intent), bringing with it Hugo-specific knowledge about theme configuration, hugo.toml structure, and build pipelines
My project context from MEMORY — the repo name, the hosting environment, decisions made in prior sessions about the deploy workflow

By the time I described the error, PAI already knew the environment it was debugging. It didn’t need me to explain what kind of hosting I had, which theme I was using, or what my folder structure looked like. The retrieval layer had already assembled that context before a single word of the solution was written.

That’s the difference RAG makes — not smarter AI, but contextually equipped AI.

Pillar 2: Skills — Domain Expertise That Activates Itself

If RAG is how PAI knows context, Skills are how PAI does work. A skill is a self-contained expertise module that activates automatically based on intent, routes to the right workflow, and executes a structured procedure.

Think of each skill as a senior specialist on call — and you never have to explicitly page them.

The Anatomy of a Skill

Every skill follows the same structure:

SkillName/
├── SKILL.md ← Routing layer (loads on invocation)
├── Workflows/ ← Step-by-step execution procedures
│ ├── Create.md
│ └── Update.md
└── Tools/ ← CLI automation scripts (TypeScript)
  └── Generate.ts

The SKILL.md file has two parts:

YAML frontmatter with a USE WHEN clause — this is how Claude Code knows when to activate the skill
Workflow routing table — once activated, this routes the request to the correct workflow file

The magic is in USE WHEN. Here’s a simplified example from the OracleHCM skill:

description: Expert Oracle HCM Cloud troubleshooting and guidance. USE WHEN user
 mentions Oracle HCM, HCM Cloud, HDL, HCM Data Loader, Journey, Checklist,
 workflow approvals, autocomplete rules, fast formulas, security profiles...

I never have to say “use the Oracle HCM skill.” I just describe my problem in natural language. The intent matching system routes it.

Practical Scenario: Configuring SEO and GEO for augmentedresilience.com

Before the site went live, I needed proper SEO and GEO — Open Graph tags for social sharing, meta descriptions for search, canonical URLs, a sitemap, and schema.org structured data so AI-powered search engines like Perplexity, ChatGPT, and Claude could understand and cite the content accurately. None of that comes configured out of the box with the re-terminal theme. I said:

“Set up SEO and apply Generative Engine Optimization to augmentedresilience.com.”

The system:

Activated the WebSavant skill (matched “SEO” + “GEO” + “site” intent — no skill named, no flags set)
Routed to the SEO and AddSchema workflows inside that skill
Created the correct Hugo partial override at layouts/partials/extended_head.html — the exact injection point the re-terminal theme exposes without touching any theme files
Added the full Open Graph tag set (og:title, og:description, og:image, og:url, og:type) wired to Hugo’s page variables
Injected schema.org JSON-LD for every page type: WebSite and Person on the homepage, Article and BreadcrumbList on every post, and AboutPage on /about — giving AI crawlers a machine-readable knowledge graph of the site
Created robots.txt explicitly permitting GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers — with the sitemap URL wired in
Configured hugo.toml for canonical URL generation and enabled the built-in sitemap output

Without the skill, this is a day of Hugo documentation, schema.org spec-reading, and trial-and-error. With it, the full SEO and GEO stack was complete in a single pass — because the skill had already encoded where everything goes in Hugo’s directory structure, which schema types matter for which page contexts, and how to wire Hugo’s template variables into valid JSON-LD that AI search engines can actually parse.
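The Hugo partial itself isn't shown here. To illustrate the shape of what it emits, here is a Python sketch that builds the Article and BreadcrumbList JSON-LD for one post and a robots.txt that allows the named AI crawlers. The post URL, author name, section name, and helper names are placeholders, and on the real site this output comes from Hugo templates, not Python:

```python
import json

SITE = "https://augmentedresilience.com"


def article_jsonld(title: str, url: str, date: str, author: str, section: str) -> str:
    """Article + BreadcrumbList for one post, as a JSON-LD array."""
    graph = [
        {
            "@context": "https://schema.org",
            "@type": "Article",
            "headline": title,
            "datePublished": date,  # ISO 8601 date
            "author": {"@type": "Person", "name": author},
            "mainEntityOfPage": url,
        },
        {
            "@context": "https://schema.org",
            "@type": "BreadcrumbList",
            "itemListElement": [
                {"@type": "ListItem", "position": 1, "name": "Home", "item": f"{SITE}/"},
                {"@type": "ListItem", "position": 2, "name": section, "item": f"{SITE}/{section.lower()}/"},
                {"@type": "ListItem", "position": 3, "name": title, "item": url},
            ],
        },
    ]
    # Goes inside a script element with type="application/ld+json" in the page head
    return json.dumps(graph, indent=2)


def robots_txt(bots: list[str], sitemap: str = f"{SITE}/sitemap.xml") -> str:
    """Explicitly allow the named crawlers (and everyone else), then point to the sitemap."""
    blocks = [f"User-agent: {bot}\nAllow: /" for bot in [*bots, "*"]]
    return "\n\n".join(blocks) + f"\n\nSitemap: {sitemap}\n"
```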

The 34 Skills I Have Installed

My current skill roster includes tools for Oracle HCM support, security recon, OSINT research, browser automation, art generation, document processing, code generation, red teaming, and more. Each one is a packaged capability that activates without friction.

The system is also designed to be extended — building a new skill means writing a SKILL.md with a USE WHEN clause, a workflow routing table, and the workflow files. The CreateSkill skill handles the scaffolding automatically.

Pillar 3: Agents — Parallel Specialized Brains

Skills handle individual domain expertise. Agents handle scale and specialization when a task is too complex for a single pass or requires multiple perspectives simultaneously.

PAI has a three-tier agent system:

Tier 1: Task Tool Subagents (Internal Workhorses)

These are pre-built specialist agents that skills and workflows invoke internally: Engineer, Architect, Explore, QATester, Pentester, ClaudeResearcher, GeminiResearcher, GrokResearcher, and others.

When I ask PAI to research something deeply, it doesn’t just run one search. It can fan out to multiple research agents simultaneously — Claude, Gemini, and Grok each investigating from different angles — then synthesize the results with a “spotcheck” agent that verifies consistency.

This is parallel processing that would take me hours of manual work, running in minutes.

Tier 2: Named Agents (Persistent Specialists)

Named agents are recurring characters with rich backstories, persistent identities, and unique voices via ElevenLabs text-to-speech. They build relationship continuity across sessions.

My installed named agents include:

Serena Blackwood — Architect. Long-term system design decisions.
Marcus Webb — Engineer. Strategic technical leadership.
Rook Blackburn — Pentester. Security testing with a distinct personality.
Ava Sterling — Researcher (Claude). Strategic deep-dive analysis.
Alex Rivera — Researcher (Gemini). Multi-perspective comprehensive analysis.

When Rook runs a security assessment, he doesn’t just return findings — he announces them in his own voice through my speakers. It sounds minor. It’s not. Distinct voices make it cognitively easier to understand who did what work and why you should trust it.

Tier 3: Custom Agents (On-Demand Compositions)

For tasks that don’t fit a named agent, PAI can compose agents dynamically from trait combinations:

Expertise traits: security, legal, finance, medical, research, technical, creative
Personality traits: skeptical, enthusiastic, analytical, contrarian, meticulous
Approach traits: thorough, rapid, systematic, adversarial, synthesizing

Each unique trait combination maps to a different ElevenLabs voice. A security + adversarial agent gets Callum’s edgy voice. An analytical + meticulous agent gets Charlotte’s precise cadence.

The trait system means I can spin up a custom agent for any edge case without writing a new agent from scratch.
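One way to model the trait-to-voice lookup is a dictionary keyed by frozensets, so the order you list traits in doesn’t matter. This is a sketch under assumptions: only the Callum and Charlotte pairings come from this post; `compose_agent` and the fallback voice name are hypothetical, and PAI’s actual mapping may work differently.

```python
# Only these two pairings come from the post; everything else is a placeholder
VOICE_MAP: dict[frozenset, str] = {
    frozenset({"security", "adversarial"}): "Callum",
    frozenset({"analytical", "meticulous"}): "Charlotte",
}
DEFAULT_VOICE = "default-voice"  # hypothetical fallback


def compose_agent(*traits: str) -> dict:
    """Build a custom agent spec and pick a voice for its trait combination."""
    combo = frozenset(traits)
    # frozenset equality ignores order, so ("adversarial", "security") matches too
    return {"traits": sorted(combo), "voice": VOICE_MAP.get(combo, DEFAULT_VOICE)}
```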

Practical Scenario: Pre-Launch Validation of Augmented Resilience

Before pushing the first real commit to augmentedresilience.com, I wasn’t going to just cross my fingers and run deploy.py. I asked PAI to validate the site was actually ready. What happened next wasn’t a single check — it was a parallel review board.

PAI spawned three agents simultaneously:

Rook Blackburn (Pentester) scanned the entire repo for credentials, API keys, and sensitive data I might have accidentally left in a config file or draft post — and announced his findings in his own voice through my speakers
A QA agent opened the Hugo local preview, walked every page, verified links weren’t broken, images loaded, and the deploy pipeline produced a clean public/ build
A Researcher agent audited the site’s meta tags, Open Graph data, and hugo.toml settings against SEO best practices for a new blog
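Rook’s credential sweep is, at its core, pattern matching over every text file in the repo. Here is a minimal regex-based sketch of that idea; the rules and the `scan_repo` name are illustrative assumptions, not what PAI’s Pentester agent actually runs, and dedicated secret scanners use much larger rule sets.

```python
import re
from pathlib import Path

# Illustrative rules only; real secret scanners ship many more
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "Generic secret assignment": re.compile(
        r"(?i)\b(api_key|apikey|secret|token|password)\s*[:=]\s*['\"]?[^\s'\"]{8,}"
    ),
}
SKIP_DIRS = {".git"}  # git internals are not part of the published content


def scan_repo(root: str) -> list[tuple[str, int, str]]:
    """Return (relative file, line number, rule name) for each suspicious line."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary files such as images can't be read as text
        for lineno, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append((rel.as_posix(), lineno, rule))
    return findings
```

Hugo’s public/ folder is deliberately scanned too, since in this setup it gets committed and deployed.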

A fourth spotcheck agent then reviewed all three outputs for conflicts — did Rook’s findings overlap with anything the QA agent flagged? Were there config issues that touched both SEO and security?

The result was a single consolidated pre-launch checklist. Two issues surfaced: a leftover draft post with personal notes still marked draft: false, and a missing og:image tag. Both fixed before the first visitor ever landed.
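A missing og:image tag is the kind of gap a meta-tag audit can catch mechanically by walking Hugo’s public/ output. A minimal sketch using only the standard library’s html.parser; the class and function names are hypothetical, and it only checks that the tag exists with a non-empty content value, not that the image URL actually resolves.

```python
from html.parser import HTMLParser
from pathlib import Path


class OgImageFinder(HTMLParser):
    """Sets found=True when the page has <meta property="og:image" content="...">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image" and attrs.get("content"):
            self.found = True


def pages_missing_og_image(public_dir: str = "public") -> list[str]:
    """List built HTML pages (relative to public_dir) that lack an og:image tag."""
    missing = []
    for page in sorted(Path(public_dir).rglob("*.html")):
        finder = OgImageFinder()
        finder.feed(page.read_text(encoding="utf-8", errors="replace"))
        if not finder.found:
            missing.append(page.relative_to(public_dir).as_posix())
    return missing
```

Run it after `hugo` builds the site; an empty list means every page declares an og:image.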

The site you’re reading right now went live clean because three agents checked it in parallel before I touched the deploy button. That’s the difference between asking a question and deploying intelligence.

When All Three Work Together

The real power of PAI isn’t any single pillar — it’s the composition.

Here’s what happened when I needed to go from “Obsidian draft” to “live on augmentedresilience.com” without a manual process I’d eventually forget or skip.

RAG assembled the context before I described the problem. From MEMORY it already knew: Namecheap shared hosting doesn’t support native git pull — content has to be pushed via FTP through GitHub Actions. It knew the Obsidian vault was the content source, that images.py had to run before hugo to convert image links, and that Python was the right tool for the orchestration script. Not one of those constraints was in my prompt.

Skills handled the architecture. WebSavant recognized a Hugo deployment pipeline request and routed to a workflow already aware of the full delivery chain: sync posts from Obsidian → convert images → hugo build → git commit → git push → GitHub Actions → Namecheap FTP. It knew the sequence. It knew why each step had to happen in that order.

Agents built it. An Engineer agent wrote deploy.py — the script that runs the whole sequence in a single command. An Architect agent designed the GitHub Actions workflow that picks up after the push and handles the Namecheap delivery step automatically. Two agents, two distinct responsibilities, running the job that a solo developer would have spent an afternoon piecing together from Stack Overflow answers.

The result:

python3 deploy.py "Add new post"

That’s it. One command. Every post that has ever gone live on this site — including this one — passed through a pipeline that RAG, Skills, and Agents built together. It’s not running because I set it up manually. It’s running because three systems knew what the job required before I finished explaining it.

The Compounding Effect

What makes this architecture meaningful over time isn’t any single interaction — it’s that the system gets better at helping you with every session.

The MEMORY system captures learnings. The SIGNALS directory tracks your implicit feedback. When something goes wrong, it logs the full context under LEARNING/FAILURES. When a workflow earns a 9 or 10 rating, that signal is captured too. The system adjusts.

Generic AI starts fresh every time. PAI compounds.

I’m still early in this — the personal profile files still have template placeholders I haven’t filled in, and there are skills I’ve barely touched. But even at partial configuration, the system already thinks more like a senior colleague than a search engine.

That’s what RAG, Agents, and Skills make possible together: an AI that knows your context, activates the right expertise automatically, and scales parallel intelligence for complex work — all without you having to manage the machinery.

When Your PDF Workflow Breaks - Building a Markdown Converter with Claude Code

Wed, 18 Feb 2026 00:00:00 +0000

The Problem: PDFs Are Knowledge Prisons

You know that feeling when you download a brilliant research paper, only to realize you can’t easily feed it into your AI workflow? Or when you want to add documentation to your knowledge base, but it’s locked in a format that doesn’t play well with version control or LLM tools?

Yeah, I was there last week.

I had just downloaded a fascinating 1.3MB research paper on Generative Engine Optimization and wanted to process it with my AI tools. But PDFs are terrible for this. They’re designed for printing, not for processing. What I needed was Markdown—clean, portable, AI-friendly Markdown.

So I built a converter. And with Claude Code as my copilot through the PAI (Personal AI Infrastructure) system, the whole thing took less than 30 minutes.

Here’s how it went down.

Why Markdown is Better Than PDF for LLMs

Before diving into the build, let’s answer the obvious question: why bother converting? Can’t LLMs just read PDFs directly?

Technically, yes. But the results are significantly worse, and the reasons are fundamental to how PDFs work.

PDFs Are Layout-First, Not Structure-First

PDFs were designed to describe where things appear on a page, not what they mean. As Steven Howard explains in “Why PDFs Fail Under LLM Parsing”:

“Table cells with wrapped text insert hard line breaks that fragment token continuity and break logical row recognition. Headers and footers simply add noise to the context when used with LLMs. Sentences are split with arbitrary CR/LFs making it very difficult to find paragraph boundaries.”

This architectural mismatch — a format designed for printing being fed into a system designed for understanding — causes cascading problems downstream.

The Token Efficiency Problem

Every token your LLM processes costs money and consumes context window space. PDF extraction wastes both.

According to analysis from MarkdownConverters, Markdown uses up to 70% fewer tokens than extracted PDF text for the same content. The culprit: PDF extraction introduces formatting artifacts, metadata noise, headers/footers, and encoding remnants that all consume tokens without adding semantic value.

To put that in practical terms: a PDF that would use 10,000 tokens might only need 3,000 tokens when properly converted to Markdown. At scale, this compounds dramatically.
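The arithmetic behind that example is a simple percentage. To check the ratio on your own files, here is a rough sketch; it assumes the common rule of thumb of about four characters per token for English text, which is only an estimate (exact counts depend on each model’s tokenizer), and both function names are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts depend on the model's tokenizer."""
    return round(len(text) / chars_per_token)


def percent_reduction(before_tokens: int, after_tokens: int) -> float:
    """How much smaller the Markdown version is, as a percentage of the original."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return round(100 * (before_tokens - after_tokens) / before_tokens, 1)


# The example from the text: 10,000 tokens down to 3,000
print(percent_reduction(10_000, 3_000))  # → 70.0
```

To compare a real pair, pass `estimate_tokens(pdf_text)` and `estimate_tokens(markdown_text)` into `percent_reduction`.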

The RAG Performance Problem

If you’re building Retrieval Augmented Generation (RAG) systems — using documents as a knowledge base for AI — document format directly impacts answer quality.

The research here is compelling:

Academic validation: A 2024 paper on arXiv (Revolutionizing RAG with Enhanced PDF Structure Recognition) found that “the low accuracy of PDF parsing significantly impacts the effectiveness of professional knowledge-based QA.”
Industry validation: NVIDIA’s technical blog documents how their NeMo Retriever pipeline converts extracted content to Markdown specifically because it “preserves row/column relationships in an LLM-native format, significantly reducing numeric hallucination” and reduces incorrect answers by 50% (NVIDIA: Approaches to PDF Data Extraction for Information Retrieval).
Chunking quality: Analysis from Towards Data Science shows that Markdown’s heading structure (#, ##, ###) produces semantically meaningful chunks, while PDF-based chunking relies on arbitrary page breaks and heuristics.
Retrieval failure rates: Unstructured.io’s research on contextual chunking — tested across 5,563 question-answer pairs — showed an 84% reduction in retrieval failure rates when using structure-aware chunking (the kind Markdown enables natively).
Real-world outcomes: The 2025 Semrush AI Index, cited by the Webex Developers Blog, found that 72% of top AI-indexed articles used Markdown or Markdown-like structures, achieving 34% higher retrieval accuracy across ChatGPT, Perplexity, and Gemini.
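To see why heading structure helps chunking, here is a minimal heading-aware chunker: each chunk is one section, tagged with the chain of headings above it as context. It’s an illustrative sketch (the function name and output shape are my own), not the method used in any of the studies cited above.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per section, tagged with its heading path."""
    chunks = []
    path: list[str] = []   # current chain of headings, e.g. ["Guide", "Setup"]
    lines: list[str] = []

    def flush():
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"headings": " > ".join(path), "text": body})
        lines.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            lines.append(line)
    flush()
    return chunks
```

Every chunk carries its own context (for example “Guide > Setup > Eligibility”), which a page-break split can’t provide. One limitation: `#` comment lines inside fenced code blocks would be mistaken for headings.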

The Bottom Line

Metric	Impact
Token reduction	Up to 70% fewer tokens vs PDF extraction
Incorrect answers in RAG	50% reduction (NVIDIA NeMo)
Retrieval failure rates	84% reduction (Unstructured.io)
Retrieval accuracy	34% higher (Semrush AI Index 2025)

Markdown isn’t just more convenient — it’s meaningfully better for AI. Converting your document libraries is one of the highest-ROI steps you can take before building any LLM-powered workflow.

The First Failure: When Bleeding-Edge Python Bites Back

I’m running Python 3.14.2—the latest release, barely a few weeks old. Modern, shiny, cutting-edge. Perfect, right?

Not quite.

My first instinct was to use marker-pdf, a high-performance converter optimized for scientific papers and books. It looked perfect on paper (pun intended). But when I tried to install it:

Building wheel for Pillow (pyproject.toml): finished with status 'error'

Ugh.

Turns out, marker-pdf depends on Pillow (the Python imaging library), and Pillow hasn’t built binary wheels for Python 3.14 yet. I could have downgraded Python. I could have fought with source compilation. But why?

This is where working with Claude Code really shines. Instead of going down a rabbit hole trying to force marker-pdf to work, Claude suggested pivoting to PyMuPDF4LLM—a mature, actively maintained library specifically designed for AI/LLM workflows.

And it just worked.

The Solution: PyMuPDF4LLM

PyMuPDF4LLM turned out to be exactly what I needed:

Works flawlessly with Python 3.14 (no compilation errors)
Fast and accurate conversion
Built specifically for feeding documents into LLMs
Clean, simple API
Actively maintained by the PyMuPDF team

The installation was literally:

pip install pymupdf4llm tqdm

Five seconds later, I was ready to go.

Building the Tool: First Principles Thinking

As someone new to the CLI world, I’ve been learning to think through project structure from first principles. Where should this live? How should it be organized?

With Claude’s guidance, I chose /Users/dsa/projects/pdf-to-markdown/ for a few key reasons:

Separation of Concerns: Tool projects should be separate from my main workspace
Discoverability: Clear, descriptive naming means I’ll find it again in 6 months
Reusability: This structure works both as a CLI tool AND as a library I could import later

The project structure ended up simple but complete:

pdf-to-markdown/
├── README.md # Documentation
├── venv/ # Isolated Python environment
├── input/ # Test PDFs
├── output/ # Generated markdown
├── pdf2md # CLI wrapper script
└── requirements.txt # Dependencies

The Code: A Simple but Powerful CLI

I wanted a tool I could actually use—something with a clean command-line interface that handles the common cases elegantly. Working with Claude through PAI, we created a Python script that does exactly that:

#!/usr/bin/env python3
"""
PDF to Markdown Converter
A simple CLI tool to convert PDF files to Markdown using PyMuPDF4LLM
"""

import os
import sys
from pathlib import Path
from typing import Optional

import pymupdf
import pymupdf4llm
from tqdm import tqdm


def convert_pdf_to_markdown(pdf_path: str, output_path: Optional[str] = None) -> str:
    """Convert a PDF file to Markdown format and return the output path."""
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")

    # Read the page count for the progress bar; the with-block closes the file
    with pymupdf.open(pdf_path) as doc:
        page_count = doc.page_count

    print(f"Converting: {pdf_path}")
    with tqdm(total=page_count, unit="page", desc="Processing", colour="blue") as bar:
        # to_markdown() converts the whole file in one call,
        # so the bar jumps straight to 100% when it returns
        md_text = pymupdf4llm.to_markdown(pdf_path, page_chunks=False)
        bar.n = page_count
        bar.refresh()

    if output_path is None:
        # Smart default: same name and folder, with .pdf swapped for .md
        output_path = Path(pdf_path).with_suffix('.md')

    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(md_text)

    print(f"✓ Done: {output_path} ({len(md_text):,} characters)")
    return str(output_path)


def batch_convert(input_dir: str, output_dir: Optional[str] = None) -> None:
    """Convert all PDFs in a directory to Markdown."""
    input_path = Path(input_dir)
    if not input_path.is_dir():
        raise NotADirectoryError(f"Not a directory: {input_dir}")

    pdfs = sorted(input_path.glob("*.pdf"))
    if not pdfs:
        print(f"No PDF files found in: {input_dir}")
        return

    # Default output folder: "output/" next to the input folder
    out_dir = Path(output_dir) if output_dir else input_path.parent / "output"
    out_dir.mkdir(parents=True, exist_ok=True)

    total = len(pdfs)
    succeeded = 0
    failed = 0

    print(f"\nBatch mode: {total} PDF(s) found in '{input_dir}'")
    print(f"Output folder: {out_dir}\n")

    for i, pdf_path in enumerate(pdfs, start=1):
        print(f"[{i}/{total}] {pdf_path.name}")
        output_path = out_dir / pdf_path.with_suffix('.md').name
        try:
            convert_pdf_to_markdown(str(pdf_path), str(output_path))
            succeeded += 1
        except Exception as e:
            # One bad PDF shouldn't stop the rest of the batch
            print(f"  ✗ Failed: {e}")
            failed += 1
        print()

    print("─" * 40)
    print(f"Batch complete: {succeeded} converted, {failed} failed")
    print(f"Output folder: {out_dir}")


def main():
    """Main CLI entry point"""
    args = sys.argv[1:]

    if not args:
        print("Usage:")
        print("  pdf2md <input.pdf> [output.md]                   # Convert a single PDF")
        print("  pdf2md --batch <folder/>                         # Convert all PDFs in a folder")
        print("  pdf2md --batch <folder/> --output <out_folder/>  # Batch with custom output dir")
        print("\nExamples:")
        print("  pdf2md document.pdf               # Creates document.md")
        print("  pdf2md document.pdf custom.md     # Creates custom.md")
        print("  pdf2md --batch input/             # Converts all PDFs in input/")
        print("  pdf2md --batch ~/documents/pdfs/ --output ~/knowledge-base/docs/")
        sys.exit(1)

    try:
        if args[0] == "--batch":
            if len(args) < 2:
                sys.exit("Error: --batch needs a folder, e.g. pdf2md --batch input/")
            input_dir = args[1]
            output_dir = None
            if "--output" in args:
                idx = args.index("--output")
                if idx + 1 >= len(args):
                    sys.exit("Error: --output needs a folder path")
                output_dir = args[idx + 1]
            batch_convert(input_dir, output_dir)
        else:
            pdf_path = args[0]
            output_path = args[1] if len(args) > 1 else None
            convert_pdf_to_markdown(pdf_path, output_path)
    except (FileNotFoundError, NotADirectoryError) as e:
        # Show a one-line message instead of a Python traceback
        sys.exit(f"Error: {e}")


if __name__ == "__main__":
    main()

What I love about this code:

Smart defaults: If you don’t specify an output path, it just replaces .pdf with .md
Progress bars: tqdm gives you a blue progress bar with page count
Batch mode: --batch processes an entire folder at once, with optional --output target
Helpful errors: Clear messages when things go wrong
Flexible usage: Works with relative paths, absolute paths, custom output names

Make it executable:

chmod +x pdf2md

And now it’s a proper command-line tool.

The Moment of Truth: Testing with Real Data

Theory is great. But does it actually work?

I grabbed that 1.3MB research paper on Generative Engine Optimization and ran:

python pdf2md input/test.pdf output/test.md

The output:

Converting input/test.pdf to Markdown...
Processing: 100%|████████████████| 12/12 [00:02<00:00, 5.8 pages/s]
✓ Done: output/test.md (73,463 characters)

1.3MB PDF → 74KB of clean Markdown in seconds.

I opened the output file, and there it was—perfectly formatted markdown:

## **GEO: Generative Engine Optimization**

Pranjal Aggarwal [∗]
Indian Institute of Technology Delhi
New Delhi, India
pranjal2041@gmail.com

Ashwin Kalyan
Independent
Seattle, USA
asaavashwin@gmail.com
...

Headers, formatting, structure—all preserved. No manual cleanup needed.

Success.

What This Unlocks

Now that I have PDFs converting to Markdown reliably, a whole world of possibilities opens up:

AI Workflows

Feed research papers and documentation directly into Claude or other LLMs
Build RAG (Retrieval Augmented Generation) pipelines backed by your document library
Process technical documentation at scale without losing structure

Knowledge Management

Import PDFs into your Obsidian vault automatically
Version control document content (because it’s now plain text in git)
Full-text search across your entire converted document library

Automation Ideas

Watch folder that auto-converts any dropped PDFs
Batch process entire directories of reports, papers, or manuals
Feed converted markdown directly into a vector database
API wrapper to convert PDFs via HTTP requests
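The watch-folder idea can start as a simple polling loop before reaching for a file-watching library. A minimal sketch, assuming a `convert(pdf_path, md_path)` callback such as the `convert_pdf_to_markdown` function from the script above; the helper names are hypothetical, and the loop treats any PDF without a matching .md in the output folder as new.

```python
import time
from pathlib import Path
from typing import Callable


def convert_new_pdfs(watch_dir: str, output_dir: str,
                     convert: Callable[[str, str], object]) -> list[str]:
    """Convert every PDF in watch_dir that has no .md counterpart in output_dir yet."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(Path(watch_dir).glob("*.pdf")):
        target = out / pdf.with_suffix(".md").name
        if not target.exists():
            convert(str(pdf), str(target))
            converted.append(pdf.name)
    return converted


def watch(watch_dir: str, output_dir: str, convert, interval: float = 5.0) -> None:
    """Poll the folder forever; stop with Ctrl+C."""
    while True:
        for name in convert_new_pdfs(watch_dir, output_dir, convert):
            print(f"Converted {name}")
        time.sleep(interval)
```

Usage would look like `watch("input", "output", convert_pdf_to_markdown)`. One caveat of polling: a large PDF still being copied into the folder could be picked up half-written, so a real version might wait until the file size stops changing.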

Lessons Learned (Especially for CLI Beginners)

1. Virtual Environments Are Non-Negotiable

Every Python project should live in its own virtual environment. Always:

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

This keeps dependencies isolated and projects reproducible.
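A quick way to confirm a script is actually running inside the virtual environment (and will see the packages installed there) is to compare `sys.prefix` with `sys.base_prefix`: inside a venv they differ, outside they are the same. A small sketch; the helper name is my own.

```python
import sys


def in_virtualenv() -> bool:
    """True under a venv: sys.prefix points at the venv, base_prefix at the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        print("Warning: no virtual environment active. Run: source venv/bin/activate")
    print(f"Python in use: {sys.executable}")
```

Dropping a check like this at the top of a script such as pdf2md turns a confusing `ModuleNotFoundError` into a clear reminder to activate the environment.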

2. Bleeding-Edge Isn’t Always Better

Python 3.14 is awesome, but the package ecosystem doesn’t always catch up with a brand-new release right away. Mature tooling (like PyMuPDF) that “just works” can beat a more ambitious library that won’t install. Don’t be afraid to pivot when something doesn’t work.

3. Test With Real Data

I didn’t test with “hello.pdf” containing two sentences. I tested with a 1.3MB research paper. Real data reveals real issues (or in this case, confirms it works beautifully).

4. Document As You Build

Writing the README alongside the code made the project immediately understandable. Future-me will thank present-me.

5. Claude Code + PAI = Superpowers

Working with Claude through the PAI infrastructure meant I had a senior developer helping me think through:

Project structure (first principles)
Library selection (when to pivot)
Code organization (clean, maintainable)
Real-world usage patterns

This wasn’t just coding faster—it was learning better patterns while building.

Usage Examples

Basic Conversion

# Activate environment first (always!)
source venv/bin/activate

# Convert a PDF
python pdf2md document.pdf

# Custom output name
python pdf2md research.pdf my-notes.md

# Full paths
python pdf2md ~/Downloads/paper.pdf ~/Documents/notes.md

Batch Processing

Convert an entire folder of PDFs:

source venv/bin/activate

# Convert all PDFs in a folder (output goes to output/ by default)
python pdf2md --batch ~/documents/pdfs/

# Convert to a specific knowledge base directory
python pdf2md --batch ~/documents/pdfs/ --output ~/knowledge-base/docs/

Add to PATH (Optional)

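To use pdf2md from anywhere, add the project folder to your PATH. Because the script starts with #!/usr/bin/env python3, it runs with whichever python3 comes first on your PATH, so activate the venv first or the pymupdf4llm import will fail: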

# Add to ~/.zshrc
export PATH="/Users/dsa/projects/pdf-to-markdown:$PATH"

# Then run from anywhere
pdf2md ~/Downloads/paper.pdf ~/Documents/paper.md

What’s Next?

This tool works great as-is, but there are some exciting enhancements on the roadmap:

Immediate Improvements

Better layout analysis: Install pymupdf_layout for improved structure detection on complex documents
Recursive batch mode: Process nested folder structures, not just flat directories
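Recursive batch mode mostly comes down to swapping `glob()` for `rglob()` and mirroring each PDF’s relative folder path in the output directory. A sketch of that path mapping, assuming you want the output tree to mirror the input tree; `plan_recursive_batch` is a hypothetical name, and each pair it returns could be handed to `convert_pdf_to_markdown`.

```python
from pathlib import Path


def plan_recursive_batch(input_dir: str, output_dir: str) -> list[tuple[Path, Path]]:
    """Pair every PDF under input_dir (any depth) with a mirrored .md path in output_dir."""
    src_root, dst_root = Path(input_dir), Path(output_dir)
    pairs = []
    for pdf in sorted(src_root.rglob("*.pdf")):
        # e.g. input/oracle/hcm/guide.pdf -> output/oracle/hcm/guide.md
        relative = pdf.relative_to(src_root).with_suffix(".md")
        pairs.append((pdf, dst_root / relative))
    return pairs
```

Before writing each file, create its folder with `md_path.parent.mkdir(parents=True, exist_ok=True)`, since nested output directories won’t exist yet.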

Future Integrations

RAG pipeline: Auto-feed converted markdown into a vector database
Obsidian plugin: Detect PDFs in vault and convert automatically
FastAPI wrapper: Create an HTTP API for web apps to use
Electron/Tauri app: Build a desktop GUI for non-technical users

The Bigger Picture: Why This Matters

This project is tiny—roughly 100 lines of Python, 30 minutes of work. But it represents something bigger:

The ability to build tools that solve your actual problems.

I had a workflow friction (PDFs don’t work well with AI tools). I built a solution. Now that friction is gone, and I can focus on higher-level work.

And the data is clear: converting your document library to Markdown isn’t a nice-to-have. It’s a multiplier on every AI workflow that follows. Up to 70% fewer tokens consumed. 84% fewer retrieval failures. 50% fewer incorrect answers. These aren’t marginal improvements—they’re transformational.

Working with Claude Code through PAI accelerated all of this. It’s like having a patient senior developer sitting next to you, suggesting better approaches, catching errors before they happen, and explaining why certain patterns work.

Resources

PyMuPDF4LLM Docs: https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/
PyMuPDF GitHub: https://github.com/pymupdf/PyMuPDF

Citations: Markdown vs PDF for LLMs

Why PDFs Fail Under LLM Parsing — Steven Howard, Untethered AI: https://untetheredai.substack.com/p/why-pdfs-fail-under-llm-parsing
PDF vs Markdown for AI: Token Efficiency — MarkdownConverters: https://markdownconverters.com/blog/pdf-vs-markdown-ai-tokens
Revolutionizing RAG with Enhanced PDF Structure Recognition — arXiv:2401.12599 (2024): https://arxiv.org/abs/2401.12599
Approaches to PDF Data Extraction for Information Retrieval — NVIDIA Technical Blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/
Improved RAG Document Processing With Markdown — Dr. Leon Eversberg, Towards Data Science: https://medium.com/data-science/improved-rag-document-processing-with-markdown-426a2e0dd82b
Contextual Chunking: Boost Your RAG Retrieval Accuracy — Unstructured.io: https://unstructured.io/blog/contextual-chunking-in-unstructured-platform-boost-your-rag-retrieval-accuracy
Boosting AI Performance: The Power of LLM-Friendly Content in Markdown — Webex Developers Blog: https://developer.webex.com/blog/boosting-ai-performance-the-power-of-llm-friendly-content-in-markdown

Happy converting!

Deploying a Hugo Site to Namecheap with PAI

Sun, 15 Feb 2026 00:00:00 +0000

I recently deployed my Hugo blog to Namecheap shared hosting, using Obsidian as my content editor and Claude Code with PAI (Personal AI) as my copilot. Here’s a walkthrough of every step, from fixing build errors to setting up a fully automated pipeline that goes from Obsidian to live site in a single command.

The Starting Point

I created a Hugo blog project called Augmented Resilience using the re-terminal theme, with a Namecheap shared hosting account and a GitHub repository. I used Claude Code in the VS Code editor and leveraged Daniel Miessler’s Personal AI Infrastructure. The goal: get the site live at augmentedresilience.com with a push-to-deploy workflow.

For context, the Personal AI Infrastructure System (PAI) from Daniel Miessler (see resources below) is an open-source framework that wraps around Claude Code and turns it into a structured problem-solving system. Instead of just chatting with an AI, PAI runs every request through a 7-phase algorithm (observe, think, plan, build, execute, verify, learn) so nothing gets skipped. It maintains persistent memory across sessions (so it remembers my project structure, preferences, and past decisions), automatically selects specialized agents for different tasks (security review, architecture, engineering), and enforces verification criteria before declaring anything “done.” For this project, PAI handled everything from debugging Hugo build errors to writing the deploy script to catching sensitive data I accidentally left in this blog post before it went live. It wasn’t just an AI assistant; it was the entire workflow engine. I found it easier to run PAI inside VS Code, since I’m still getting used to the command line.

Step 1: Fixing the Hugo Build

The first issue was a build error:

module "hugo-theme-re-terminal" not found

The problem was a mismatch between the theme name in hugo.toml and the actual directory name. The theme was installed as a git submodule at themes/re-terminal/, but the config referenced hugo-theme-re-terminal.

Fix: Change the theme name in hugo.toml:

theme = "re-terminal"

After that, hugo built the site successfully, generating the public/ folder with all the static files.

Step 2: Setting Up the GitHub Repository

I initialized the repo and connected it to GitHub:

git init
git remote add origin git@github.com:dsacosta/Augmented-Resilience.git
git add .
git commit -m "my first commit"
git branch -M main  # rename the branch in case git init created "master"
git push origin main

One gotcha: I initially typed orgin instead of origin in the remote add command. Typos happen — double-check your remote names with git remote -v.

Step 3: Connecting Namecheap to GitHub via SSH

This was the trickiest part. Namecheap shared hosting needs an SSH key to clone from a private GitHub repo. Here’s what worked:

Generate an SSH Key on Namecheap

Log into cPanel on Namecheap
Go to SSH Access → Manage SSH Keys → Generate a New Key
Generate an RSA key (I used the default settings)

Remove the Passphrase

This is critical. cPanel’s Git Version Control runs non-interactively, so it can’t prompt for a passphrase. I opened cPanel Terminal and ran:

ssh-keygen -p -f ~/.ssh/id_rsa

Enter the old passphrase, then press Enter twice for no new passphrase.

Add the Public Key to GitHub

On Namecheap’s cPanel Terminal, run: cat ~/.ssh/id_rsa.pub
Copy the output
Go to your GitHub repo → Settings → Deploy Keys → Add deploy key
Paste the public key and save

Verify the Connection

From cPanel Terminal:

ssh -T git@github.com

You should see: Hi dsacosta/Augmented-Resilience! You've successfully authenticated...

Step 4: Clone the Repo on Namecheap

In cPanel, go to Git Version Control → Create
Toggle Clone a Repository on
Enter the clone URL: git@github.com:dsacosta/Augmented-Resilience.git
Set the repository path (I used /home/yourusername/your-repo)
Click Create

Important: Don’t clone directly into public_html or your domain folder — it likely already has files and will error out. Clone to a separate directory and use deployment to copy files over.

Step 5: Auto-Deployment with .cpanel.yml

cPanel supports automatic deployment tasks via a .cpanel.yml file in the repo root. This file tells cPanel what to do after each pull:

---
deployment:
  tasks:
    - export DEPLOYPATH=/home/yourusername/yourdomain.com/
    - /bin/cp -R public/* $DEPLOYPATH

This copies everything from the public/ folder (Hugo’s build output) into the live site directory.

After pushing this file to GitHub:

Go to Git Version Control → Manage your repo
Click the Pull or Deploy tab
Click Update from Remote to pull the latest
Click Deploy HEAD Commit to trigger the .cpanel.yml tasks

Your site should now be live.

Step 6: Fully Automated Deploys with GitHub Actions

To eliminate the manual “pull and deploy” step in cPanel, I set up a GitHub Actions workflow that SSHs into Namecheap and triggers the pull automatically on every push.

Generate a Deploy Key

On your local machine, generate a key pair with no passphrase:

ssh-keygen -t ed25519 -C "github-actions-deploy" -f ~/.ssh/deploy_key -N ""

Add the public key to Namecheap:

# In cPanel Terminal on Namecheap:
echo "ssh-ed25519 AAAA...your-key-here github-actions-deploy" >> ~/.ssh/authorized_keys

Add Secrets to GitHub

Go to your repo → Settings → Secrets and variables → Actions and add:

Secret	Value
`NC_HOST`	`augmentedresilience.com`
`NC_USER`	Your cPanel username
`NC_PORT`	Your SSH port (check cPanel)
`NC_SSH_KEY`	The full private key (including BEGIN/END lines)

Create the Workflow

Add .github/workflows/deploy.yml to your repo:

name: Deploy to Namecheap

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.NC_HOST }}
          username: ${{ secrets.NC_USER }}
          key: ${{ secrets.NC_SSH_KEY }}
          port: ${{ secrets.NC_PORT }}
          script: |
            cd ~/your-repo && git pull origin main && /bin/cp -R public/* ~/yourdomain.com/

Now every push to main automatically deploys to your live site.

Step 7: One-Command Deploy Script

Five manual commands every time you publish? That’s not a workflow — that’s a chore. I had Claude write a Python script that handles everything in one shot:

#!/usr/bin/env python3
"""One-command deploy: Obsidian → Hugo → GitHub → Live site."""

import os
import shlex
import subprocess
import sys
from datetime import datetime

# expanduser() resolves "~": neither subprocess's cwd= argument nor a
# double-quoted path inside a shell command expands it on its own
PROJECT_DIR = os.path.expanduser("~/Documents/Augmented-Resilience")
OBSIDIAN_POSTS = os.path.expanduser("~/projects/obsidian-vault/30-projects/augmented-resilience-posts")
HUGO_POSTS = os.path.join(PROJECT_DIR, "content", "posts")


def run(cmd, description, cwd=PROJECT_DIR):
    """Run a shell command, print a status banner, and exit if it fails."""
    print(f"\n{'='*50}")
    print(f" {description}")
    print(f"{'='*50}")
    result = subprocess.run(cmd, shell=True, cwd=cwd)
    if result.returncode != 0:
        print(f"\n FAILED: {description}")
        sys.exit(1)
    return result


def main():
    msg = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else f"Site update {datetime.now().strftime('%Y-%m-%d %H:%M')}"

    # Trailing slashes make rsync copy the folder's contents into content/posts
    # (not a nested folder); --delete removes posts no longer in Obsidian
    src = shlex.quote(OBSIDIAN_POSTS + "/")
    dst = shlex.quote(HUGO_POSTS + "/")
    run(f"rsync -av --delete {src} {dst}", "Syncing posts from Obsidian")
    run(f"python3 {shlex.quote(os.path.join(PROJECT_DIR, 'images.py'))}", "Processing images")
    run("hugo", "Building site with Hugo")
    run("git add .", "Staging changes")

    # git diff --cached --quiet exits 0 when nothing is staged
    result = subprocess.run("git diff --cached --quiet", shell=True, cwd=PROJECT_DIR)
    if result.returncode == 0:
        print("\n No changes to commit. Site is up to date.")
        return

    # shlex.quote() keeps quotes or $ in the message from breaking the command
    run(f"git commit -m {shlex.quote(msg)}", "Committing")
    run("git push origin main", "Pushing to GitHub")

    print(f"\n{'='*50}")
    print(" DEPLOYED! Your site will be live shortly.")
    print(f"{'='*50}\n")


if __name__ == "__main__":
    main()

Save this as deploy.py in your project root. Now the entire workflow is:

# Default timestamped commit message
python3 deploy.py

# Or with a custom message
python3 deploy.py "Add new blog post about deployment"

The script runs every step in sequence — syncs from Obsidian, converts image links, builds with Hugo, commits, and pushes. If any step fails, it stops immediately so you don’t push a broken build. Combined with the GitHub Actions workflow from Step 6, pushing triggers the auto-deploy to Namecheap. One command, fully live.

Lessons Learned

SSH passphrases break cPanel automation. Always remove the passphrase from keys used by cPanel’s Git Version Control.
Theme names must match directory names. Hugo looks for the theme in themes/<theme-name>/, so the theme value in your config must match exactly.
Don’t clone into the live site directory. Clone to a separate folder and use .cpanel.yml to copy the built files over.
GitHub Actions + SSH is the cleanest auto-deploy for shared hosting. No webhooks, no cron jobs — just a simple SSH action that runs on every push.
Claude Code with PAI made this possible in a single session. From debugging build errors to SSH key troubleshooting to writing GitHub Actions workflows, having an AI pair programmer turned what could have been hours of Stack Overflow rabbit holes into a smooth, guided process.

Tools Used

Hugo — Static site generator
re-terminal theme — Hugo theme
Namecheap — Shared hosting with cPanel
GitHub Actions — CI/CD automation
Claude Code — AI pair programmer
Personal AI Infrastructure (PAI) — Workflow engine
Obsidian — Content authoring
VS Code - Development environment; used as the workspace for Claude Code sessions

Resources

I started a blog…..in 2024 (why you should too) — The YouTube video that inspired me to start this blog
Building a Personal AI Infrastructure (PAI) — Daniel Miessler’s guide to building your own Personal AI system