Cli on

Stop Prompt Engineering. Start Building Infrastructure.

Sun, 19 Apr 2026 00:00:00 +0000


Last week I opened a terminal, typed six words, and watched PAI spend the next three minutes processing a set of handwritten study notes exported from my reMarkable tablet. It converted the file format, extracted key concepts, generated structured review questions, cross-referenced my existing knowledge base, and saved everything to the correct directories in Obsidian — organized by module, tagged correctly, ready to use. I did not write a prompt. I did not explain what certification I was studying. I did not describe the output structure I wanted. I just named the module.

Eighteen months ago, that same task would have started with a paragraph explaining what the reMarkable export format was, what the certification covered, how I organized notes in Obsidian, what level of detail I wanted in the summary, and what format the quiz questions should follow — and another paragraph if I wanted the output saved to a specific location. Every single session. From scratch.

That gap is the entire argument for building an AI harness instead of staying in a chat window.

The Chat Window Tax

Prompt engineering emerged as a discipline because LLMs are stateless by default. Every conversation starts with an empty context window. If you want the model to know who you are, what you work on, how you like your outputs formatted, and which approach you prefer for recurring problems — you have to tell it. Every time.

That is a tax. Not a feature. A tax.

The people who got good at prompt engineering got skilled at paying that tax efficiently — writing shorter context dumps, using system prompts in API playgrounds, building prompt libraries they paste from. It helped. But it never made the tax go away. It just made each payment slightly cheaper.

In 2026, paying that tax is a choice. The tools exist to stop paying it entirely.

What a Harness Actually Does

A harness is infrastructure wrapped around your AI runtime. In my case, that is PAI — Personal AI Infrastructure — running on top of Claude Code in the terminal. The architecture has three layers.

Memory is persistent context that survives across sessions. PAI knows my role (HRIS analyst), my platform (Oracle HCM Cloud), my Oracle triage methodology, my blog’s writing conventions, my active projects, and my preferences for output formats. None of that gets re-entered. It gets loaded automatically at session start.
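Here is a minimal sketch of what “loaded automatically at session start” can look like. It assumes a Claude Code SessionStart hook, whose standard output Claude Code adds to the new session’s context, and a hypothetical ~/.pai/memory/ folder of markdown files. The folder, the file layout, and the script are illustrations, not PAI’s actual implementation.

```python
#!/usr/bin/env python3
"""Hypothetical session-start hook: print persistent memory files so the
runtime can add them to the new session's context."""
import sys
from pathlib import Path

# Hypothetical location; PAI's real memory layout may differ.
MEMORY_DIR = Path.home() / ".pai" / "memory"


def build_context(memory_dir: Path) -> str:
    """Concatenate the non-empty markdown files in memory_dir, sorted by name."""
    if not memory_dir.is_dir():
        return ""
    sections = []
    for md_file in sorted(memory_dir.glob("*.md")):
        body = md_file.read_text(encoding="utf-8").strip()
        if body:
            # Label each file so role notes and project notes stay distinguishable
            sections.append(f"## {md_file.stem}\n\n{body}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    context = build_context(MEMORY_DIR)
    if context:
        sys.stdout.write(context + "\n")
```

Registered as a SessionStart hook command, a script like this runs before the first prompt. Adding a fact to memory then means editing a markdown file, not retyping it.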

Skills are pre-built, parameterized workflows. When I say “process my study notes,” a skill handles that — reading from the right directory, converting the format, saving to the right Obsidian path, cross-referencing the knowledge base. The skill is the prompt, written once, tested, improved over time. I do not craft it fresh every time.
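To illustrate “the skill is the prompt, written once,” here is a sketch of a skill as a template with named slots. Remembered values fill most of the slots, and the request supplies only what changes, such as the module number. The skill name, template wording, and memory values are hypothetical; Claude Code and PAI store skills in their own formats, not as Python objects.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Skill:
    """A prompt written once, with named slots filled at call time."""
    name: str
    template: Template

    def render(self, memory: dict, **params: str) -> str:
        # Per-call parameters override remembered defaults;
        # a missing slot raises KeyError instead of producing a half-filled prompt
        return self.template.substitute({**memory, **params})


# Hypothetical skill and memory values, for illustration only
PROCESS_STUDY_NOTES = Skill(
    name="process-study-notes",
    template=Template(
        "Convert the $source_format export for $certification module $module "
        "to markdown, extract key concepts, write review questions in "
        "$quiz_format, and save to $vault/$certification/Module $module/."
    ),
)

MEMORY = {
    "source_format": "reMarkable",
    "certification": "Example Cert",
    "quiz_format": "Q/A pairs",
    "vault": "~/Obsidian",
}
```

With this shape, PROCESS_STUDY_NOTES.render(MEMORY, module="4") expands a six-word request into the full instruction, and improving the skill means editing the template once.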

The Algorithm is a structured execution framework. When the work is complex — multi-step, multi-file, non-trivial — PAI runs through a defined process: observe, think, plan, build, execute, verify, learn. The output is consistent because the process is consistent.
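A sketch of the “consistent process” idea follows. The seven phases run in a fixed order, each handler receives the state the previous one produced, and the run stops before LEARN if VERIFY fails. The phase names come from the paragraph above. The handler interface, the `verified` flag, and the stop-on-failure rule are assumptions for illustration, not PAI’s code.

```python
from enum import Enum
from typing import Callable, Dict, List


class Phase(Enum):
    OBSERVE = "observe"
    THINK = "think"
    PLAN = "plan"
    BUILD = "build"
    EXECUTE = "execute"
    VERIFY = "verify"
    LEARN = "learn"


# A handler takes the current state and returns the updated state
Handler = Callable[[dict], dict]


def run_algorithm(task: dict, handlers: Dict[Phase, Handler]) -> List[str]:
    """Run every phase in definition order and return the phase log."""
    log: List[str] = []
    state = dict(task)
    for phase in Phase:  # Enum iteration follows definition order
        handler = handlers.get(phase)
        if handler is None:
            raise ValueError(f"No handler registered for {phase.value}")
        state = handler(state)
        log.append(phase.value)
        if phase is Phase.VERIFY and not state.get("verified", False):
            # Assumed policy: don't record a lesson from an unverified result
            log.append("verify-failed")
            break
    return log
```

Because the order lives in code rather than in a prompt, two runs of the same task always pass through the same checkpoints.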

Taken together, these three things mean the model is never starting from zero. It arrives at each session already oriented.

The Token Economy Hidden Inside the Infrastructure

There is a practical angle to this that does not get talked about enough: token consumption.

Every message in a chat session burns tokens — your context, the model’s reasoning, the output, and whatever you paste in to re-establish state. The longer and more complex the session, the faster you burn toward usage limits. When you are re-explaining your role, your project, and your preferences at the start of each conversation, you are spending tokens on re-orientation, not on actual work.

A harness changes the math.

PAI loads persistent context at session start through hooks. That context still costs tokens, but it is compact and curated, written once and injected by the runtime, instead of a long ad-hoc dump you retype and then clarify over several turns in every conversation. The model arrives oriented. The working token budget goes toward the task.

More importantly, PAI externalizes logic that would otherwise live inside the conversation. The skills are pre-written workflows. The Algorithm is a structured execution framework. The session hooks handle routing and context injection. A significant portion of what would normally require the model to think its way through — “what directory does this go in?”, “what format does this certification use?”, “what’s the right next step in this process?” — is already answered in scripts and configuration files that run before the model responds.
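One way to answer “which workflow is this?” before the model responds is plain pattern matching. This sketch assumes a hypothetical route table that maps phrasings to skill names and pulls parameters such as the module number out of the prompt. PAI’s actual routing hooks may work differently.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical route table: pattern -> skill name. Named groups become parameters.
ROUTES = [
    (re.compile(r"process (?:my )?study notes for module (?P<module>\d+)", re.I),
     "process-study-notes"),
    (re.compile(r"convert (?P<path>\S+\.pdf)", re.I), "pdf-to-markdown"),
]


def route(prompt: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Resolve a prompt to (skill, params) with plain code, no model call."""
    for pattern, skill in ROUTES:
        match = pattern.search(prompt)
        if match:
            return skill, match.groupdict()
    return None  # unmatched prompts fall through to the model as usual
```

For example, route("Process my study notes for module 4.") returns ("process-study-notes", {"module": "4"}) without spending any model tokens on the decision.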

That is not just more efficient. It changes your usage ceiling. When the model is not spending context budget on re-orientation or derivable decisions, more of each session goes toward meaningful work. You hit limits later, do more per session, and run longer chains of complex tasks without interruption.

Prompt engineering optimizes the prompt. Infrastructure optimizes the budget.

CLI vs. Chat: It Is Architecture, Not Preference

This is the part that took me a while to articulate. The preference for CLI over chat window is not aesthetic — it is structural.

A chat window is a conversation interface. Conversations are ephemeral. They have no persistent state, no programmable hooks, no way to inject context at session start, no way to trigger workflows, no way to store outputs in structured memory. The UX is polished. The architecture is a dead end for anything requiring continuity.

A CLI is a programmable runtime. Session start hooks can load context files. Commands can trigger skills. Outputs can write back to memory. Different agents can be spawned with different contexts and run in parallel. The AI operates inside an environment you built, not inside a box you are renting.
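As a sketch of spawning agents in parallel, assume the Claude Code CLI is installed as `claude`, whose `-p` flag runs a single prompt non-interactively and prints the reply. The helper names are hypothetical, and the command is a parameter so the same code can drive any CLI.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

# Assumes the Claude Code CLI is on PATH; `-p` runs one prompt and exits
DEFAULT_CMD = ("claude", "-p")


def run_agent(prompt: str, cmd: Sequence[str] = DEFAULT_CMD) -> str:
    """Run one headless session and return its trimmed stdout."""
    result = subprocess.run(
        [*cmd, prompt], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def run_parallel(prompts: List[str], cmd: Sequence[str] = DEFAULT_CMD) -> List[str]:
    """Run independent sessions concurrently; replies keep the input order."""
    with ThreadPoolExecutor(max_workers=len(prompts) or 1) as pool:
        return list(pool.map(lambda p: run_agent(p, cmd), prompts))
```

Threads are enough here because each worker just waits on a subprocess; the heavy lifting happens in the separate CLI processes.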

That difference compounds. A chat window is equally capable on day one and day three hundred. A harness gets more capable every time you add a skill, improve the memory, or refine the algorithm.

Before and After: The Same Problem, Two Environments

Chat window, eight months ago:

“I have study notes from a certification I’m working through, exported as a Word document from my tablet. I organize my notes in Obsidian under a folder structure by certification and module number. I need you to convert the content to clean markdown, extract the key concepts as a structured summary, generate quiz questions with answers, and format everything to match my existing note structure. The certification is [name], this is module [N], and here’s an example of how my other notes look: [paste example]…”

Then the session ended. Next time I had notes to process — same context dump, from scratch.

With PAI, today:

“Process my study notes for module 4.”

PAI already knows the certification, the Obsidian directory structure, the naming conventions, the quiz format, and which knowledge base to cross-reference. Processing starts immediately. The notes land in the right place in the right format.

The eight-month gap between those two experiences is not better prompting. It is infrastructure.

2026: Where the Power Users Went

The practitioners who were deep into prompt engineering two years ago have largely moved on — not to better prompts, but to better systems. They are building skills, writing memory schemas, wiring session hooks, running structured execution algorithms on complex work. The prompt engineer persona is being quietly replaced by the AI infrastructure builder.

This is not about being technical. It is about thinking one level up. Instead of asking how to get a better response to this prompt, you ask what a system would need to know to handle this reliably, every time.

Your Knowledge Doesn’t Live in the Model

One of the less obvious benefits of building infrastructure rather than relying on chat conversations: your knowledge is not locked to any LLM.

When everything lives in a chat window, switching models means starting over. Your context, your conversation history, your accumulated session knowledge — gone. The model you were using knew who you were because you kept telling it. A different model knows nothing.

With PAI, the knowledge lives in files you own. The memory is markdown on your machine. The skills are scripts in a directory. The algorithm is a structured process your runtime executes. None of it is stored inside Claude, or any other model. The AI is the engine, not the warehouse.

That distinction matters more than it sounds. LLMs are evolving fast. A model that is the best choice today may not be the best choice in six months. If your entire working context is entangled with one provider’s chat history, migration is painful. If your context lives in a portable, file-based system, switching the underlying model is a configuration change — not a rebuild.
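A sketch of why switching models can be a configuration change: memory stays in files, every provider sits behind the same small adapter signature, and a JSON config names which adapter to use. The provider names and stub adapters are hypothetical placeholders; a real adapter would call that vendor’s SDK.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Every adapter takes (memory, prompt) and returns the model's reply
Adapter = Callable[[str, str], str]
ADAPTERS: Dict[str, Adapter] = {}


def register(name: str) -> Callable[[Adapter], Adapter]:
    """Decorator that files an adapter under a provider name."""
    def wrap(fn: Adapter) -> Adapter:
        ADAPTERS[name] = fn
        return fn
    return wrap


@register("provider-a")
def provider_a(memory: str, prompt: str) -> str:
    # Hypothetical stub; a real adapter would call provider A's SDK here
    return f"[A] {prompt} (with {len(memory)} chars of memory)"


@register("provider-b")
def provider_b(memory: str, prompt: str) -> str:
    # Hypothetical stub for a second provider
    return f"[B] {prompt} (with {len(memory)} chars of memory)"


def ask(config_path: Path, memory: str, prompt: str) -> str:
    """Send the same file-based memory to whichever provider the config names."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return ADAPTERS[config["provider"]](memory, prompt)
```

Editing one string in the config switches providers; the memory files and the calling code don’t change.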

I run PAI on Claude today because it is the best fit for how I work right now. But the memory schema, the skill library, the algorithm — all of it would transfer to a different model without losing a session’s worth of context. That portability is a deliberate design choice, and it is one of the most underappreciated properties of building on open infrastructure rather than inside a walled chat product.

Credit Where It’s Due

PAI did not emerge from a vacuum. A significant part of the thinking behind it — the idea that AI should be augmenting structured, intentional human systems rather than replacing ad-hoc conversations — traces directly to the work of Daniel Miessler.

Daniel has been articulating the case for AI infrastructure thinking longer than most. His Fabric project, his writing on augmented intelligence, and his broader framing of what it means to build systems that extend human capability rather than just answer questions — all of it shaped how PAI was conceived and how it continues to evolve.

The shift from “better prompts” to “better systems” is not a new idea. It just needed enough tooling to become practical. Daniel saw that early.

Where to Start

PAI is open-source. Claude Code is Anthropic’s official CLI; it is free to install and runs on a paid Claude plan or API credits. The distance between using AI in a chat window and running it inside a harness is smaller than it looks, and the compounding return starts from the first session where PAI remembers something you did not have to re-enter.

If you are still re-explaining yourself every time you open a new tab, that is the problem worth solving.

When Your PDF Workflow Breaks - Building a Markdown Converter with Claude Code

Wed, 18 Feb 2026 00:00:00 +0000

The Problem: PDFs Are Knowledge Prisons

You know that feeling when you download a brilliant research paper, only to realize you can’t easily feed it into your AI workflow? Or when you want to add documentation to your knowledge base, but it’s locked in a format that doesn’t play well with version control or LLM tools?

Yeah, I was there last week.

I had just downloaded a fascinating 1.3MB research paper on Generative Engine Optimization and wanted to process it with my AI tools. But PDFs are terrible for this. They’re designed for printing, not for processing. What I needed was Markdown—clean, portable, AI-friendly Markdown.

So I built a converter. And with Claude Code as my copilot through the PAI (Personal AI Infrastructure) system, the whole thing took less than 30 minutes.

Here’s how it went down.

Why Markdown is Better Than PDF for LLMs

Before diving into the build, let’s answer the obvious question: why bother converting? Can’t LLMs just read PDFs directly?

Technically, yes. But the results are significantly worse, and the reasons are fundamental to how PDFs work.

PDFs Are Layout-First, Not Structure-First

PDFs were designed to describe where things appear on a page, not what they mean. As Steven Howard explains in “Why PDFs Fail Under LLM Parsing”:

“Table cells with wrapped text insert hard line breaks that fragment token continuity and break logical row recognition. Headers and footers simply add noise to the context when used with LLMs. Sentences are split with arbitrary CR/LFs making it very difficult to find paragraph boundaries.”

This architectural mismatch — a format designed for printing being fed into a system designed for understanding — causes cascading problems downstream.
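To make Howard’s points concrete, here is a small heuristic sketch of the cleanup raw PDF text needs: drop lines that repeat on every page (running headers and footers) and re-join hard-wrapped lines into paragraphs. It is an illustration, not a method from his article. It takes a list of per-page strings, such as the plain text PyMuPDF returns from page.get_text() for each page.

```python
from collections import Counter
from typing import List


def clean_extracted_pages(pages: List[str]) -> str:
    """Heuristic cleanup for raw per-page PDF text."""
    if not pages:
        return ""

    # 1. Lines found on every page are treated as running headers/footers.
    #    Exact matches only: "Page 3 of 12" style footers would need a regex.
    seen_on = Counter()
    for page in pages:
        seen_on.update({line.strip() for line in page.splitlines() if line.strip()})
    repeated = (
        {line for line, n in seen_on.items() if n == len(pages)}
        if len(pages) > 1
        else set()
    )

    # 2. Re-join hard-wrapped lines; a blank line ends a paragraph.
    #    Paragraphs may continue across page boundaries.
    paragraphs: List[str] = []
    current: List[str] = []
    for page in pages:
        for raw in page.splitlines():
            line = raw.strip()
            if line in repeated:
                continue
            if not line:
                if current:
                    paragraphs.append(" ".join(current))
                    current = []
                continue
            if current and current[-1].endswith("-"):
                # Merge a word hyphenated across the break ("para-" + "graph").
                # This also merges genuine hyphenated compounds; it's a heuristic.
                current[-1] = current[-1][:-1] + line
            else:
                current.append(line)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Even this much has to guess at what the layout meant, which is exactly the work a structure-aware converter does better.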

The Token Efficiency Problem

Every token your LLM processes costs money and consumes context window space. PDF extraction wastes both.

According to analysis from MarkdownConverters, Markdown uses up to 70% fewer tokens than extracted PDF text for the same content. The culprit: PDF extraction introduces formatting artifacts, metadata noise, headers/footers, and encoding remnants that all consume tokens without adding semantic value.

To put that in practical terms: a PDF that would use 10,000 tokens might only need 3,000 tokens when properly converted to Markdown. At scale, this compounds dramatically.
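The arithmetic behind that example is (10,000 - 3,000) / 10,000 = 70%. To check your own documents, here is a rough sketch that estimates tokens with the common rule of thumb of about four characters per token for English text. That ratio is an approximation, not a tokenizer, so treat the result as a ballpark and use your model’s own tokenizer for exact counts. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count using ~4 characters per token (English rule of thumb)."""
    if not text:
        return 0
    return max(1, round(len(text) / 4))


def percent_saved(before: str, after: str) -> float:
    """Estimated percentage of tokens saved by `after` relative to `before`."""
    b, a = estimate_tokens(before), estimate_tokens(after)
    return 0.0 if b == 0 else 100.0 * (b - a) / b
```

Passing the text of a plain PDF extraction as `before` and the converted Markdown as `after` gives a quick per-file estimate.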

The RAG Performance Problem

If you’re building Retrieval Augmented Generation (RAG) systems — using documents as a knowledge base for AI — document format directly impacts answer quality.

The research here is compelling:

Academic validation: A 2024 paper on arXiv (“Revolutionizing RAG with Enhanced PDF Structure Recognition”) found that “the low accuracy of PDF parsing significantly impacts the effectiveness of professional knowledge-based QA.”
Industry validation: NVIDIA’s technical blog documents how their NeMo Retriever pipeline converts extracted content to Markdown specifically because it “preserves row/column relationships in an LLM-native format, significantly reducing numeric hallucination” — and reduces incorrect answers by 50%. (NVIDIA: “Approaches to PDF Data Extraction for Information Retrieval”)
Chunking quality: Analysis from Towards Data Science shows that Markdown’s heading structure (#, ##, ###) produces semantically meaningful chunks, while PDF-based chunking relies on arbitrary page breaks and heuristics.
Retrieval failure rates: Unstructured.io’s research on contextual chunking — tested across 5,563 question-answer pairs — showed an 84% reduction in retrieval failure rates when using structure-aware chunking (the kind Markdown enables natively).
Real-world outcomes: The 2025 Semrush AI Index, cited by the Webex Developers Blog, found that 72% of top AI-indexed articles used Markdown or Markdown-like structures, achieving 34% higher retrieval accuracy across ChatGPT, Perplexity, and Gemini.
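Heading-based chunking is simple to sketch. This function is a minimal illustration, not the method from the Towards Data Science article: it splits Markdown at ATX headings (#, ##, ...), keeps the full heading path with each chunk so a retrieved chunk still says where it came from, and ignores # lines inside fenced code blocks. Setext headings (text underlined with === or ---) are not handled.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")  # ATX headings only
FENCE = "`" * 3  # a code-fence marker, built here to keep this block readable


def chunk_by_headings(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading path, body) chunks for a RAG index."""
    chunks: List[Tuple[str, str]] = []
    stack: List[Tuple[int, str]] = []  # (level, title) of headings still open
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append((" > ".join(title for _, title in stack), text))
        body.clear()

    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # "# comments" inside code are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new heading closes any open heading at the same or deeper level
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

A PDF has no equivalent signal to split on, which is why PDF-based chunking falls back to page breaks and heuristics.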

The Bottom Line

Metric	Impact
Token reduction	Up to 70% fewer tokens vs PDF extraction (MarkdownConverters)
Incorrect answers in RAG	50% reduction (NVIDIA NeMo)
Retrieval failure rates	84% reduction (Unstructured.io)
Retrieval accuracy	34% higher (Semrush AI Index 2025)

Markdown isn’t just more convenient — it’s meaningfully better for AI. Converting your document libraries is one of the highest-ROI steps you can take before building any LLM-powered workflow.

The First Failure: When Bleeding-Edge Python Bites Back

I’m running Python 3.14.2—the latest release, barely a few weeks old. Modern, shiny, cutting-edge. Perfect, right?

Not quite.

My first instinct was to use marker-pdf, a high-performance converter optimized for scientific papers and books. It looked perfect on paper (pun intended). But when I tried to install it:

Building wheel for Pillow (pyproject.toml): finished with status 'error'

Ugh.

Turns out, marker-pdf depends on Pillow (the Python imaging library), and the Pillow version pip tried to install had no prebuilt binary wheel for Python 3.14, so pip fell back to compiling it from source, and that build failed. I could have downgraded Python. I could have fought with source compilation. But why?

This is where working with Claude Code really shines. Instead of going down a rabbit hole trying to force marker-pdf to work, Claude suggested pivoting to PyMuPDF4LLM—a mature, actively maintained library specifically designed for AI/LLM workflows.

And it just worked.

The Solution: PyMuPDF4LLM

PyMuPDF4LLM turned out to be exactly what I needed:

Works flawlessly with Python 3.14 (no compilation errors)
Fast and accurate conversion
Built specifically for feeding documents into LLMs
Clean, simple API
Actively maintained by the PyMuPDF team

The installation was literally:

pip install pymupdf4llm

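Five seconds later, I was ready to go. (The script also imports tqdm for its progress bar; if it isn’t already in your environment, add it with pip install tqdm.)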

Building the Tool: First Principles Thinking

As someone new to the CLI world, I’ve been learning to think through project structure from first principles. Where should this live? How should it be organized?

With Claude’s guidance, I chose /Users/dsa/projects/pdf-to-markdown/ for a few key reasons:

Separation of Concerns: Tool projects should be separate from my main workspace
Discoverability: Clear, descriptive naming means I’ll find it again in 6 months
Reusability: This structure works both as a CLI tool AND as a library I could import later

The project structure ended up simple but complete:

pdf-to-markdown/
├── README.md # Documentation
├── venv/ # Isolated Python environment
├── input/ # Test PDFs
├── output/ # Generated markdown
├── pdf2md # CLI wrapper script
└── requirements.txt # Dependencies

The Code: A Simple but Powerful CLI

I wanted a tool I could actually use—something with a clean command-line interface that handles the common cases elegantly. Working with Claude through PAI, we created a Python script that does exactly that:

#!/usr/bin/env python3
"""
PDF to Markdown Converter
A simple CLI tool to convert PDF files to Markdown using PyMuPDF4LLM
"""

import sys
from pathlib import Path
from typing import Optional

import pymupdf
import pymupdf4llm
from tqdm import tqdm


def convert_pdf_to_markdown(pdf_path: str, output_path: Optional[str] = None) -> str:
    """Convert a PDF file to Markdown and return the output path."""
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")

    # Read the page count up front so the progress bar knows its total
    doc = pymupdf.open(pdf_path)
    page_count = doc.page_count
    doc.close()

    print(f"Converting {pdf_path} to Markdown...")
    with tqdm(total=page_count, unit="page", desc="Processing", colour="blue") as bar:
        # to_markdown() converts the whole file in one call,
        # so the bar fills in a single step once it returns
        md_text = pymupdf4llm.to_markdown(pdf_path, page_chunks=False)
        bar.update(page_count)

    # Smart default: same name and folder as the PDF, with a .md suffix
    if output_path is None:
        output_path = str(Path(pdf_path).with_suffix(".md"))

    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(md_text, encoding="utf-8")

    print(f"✓ Done: {output_path} ({len(md_text):,} characters)")
    return str(output_path)


def batch_convert(input_dir: str, output_dir: Optional[str] = None) -> None:
    """Convert all PDFs in a directory to Markdown."""
    input_path = Path(input_dir)
    if not input_path.is_dir():
        raise NotADirectoryError(f"Not a directory: {input_dir}")

    pdfs = sorted(input_path.glob("*.pdf"))
    if not pdfs:
        print(f"No PDF files found in: {input_dir}")
        return

    # Default output: an "output" folder next to the input folder
    out_dir = Path(output_dir) if output_dir else input_path.parent / "output"
    out_dir.mkdir(parents=True, exist_ok=True)

    total = len(pdfs)
    succeeded = 0
    failed = 0

    print(f"\nBatch mode: {total} PDF(s) found in '{input_dir}'")
    print(f"Output folder: {out_dir}\n")

    for i, pdf_path in enumerate(pdfs, start=1):
        print(f"[{i}/{total}] {pdf_path.name}")
        output_path = out_dir / pdf_path.with_suffix(".md").name
        try:
            convert_pdf_to_markdown(str(pdf_path), str(output_path))
            succeeded += 1
        except Exception as e:  # keep going if one file is corrupt
            print(f"  ✗ Failed: {e}")
            failed += 1
        print()

    print("─" * 40)
    print(f"Batch complete: {succeeded} converted, {failed} failed")
    print(f"Output folder: {out_dir}")


def main() -> None:
    """Main CLI entry point."""
    args = sys.argv[1:]

    if not args:
        print("Usage:")
        print("  pdf2md <input.pdf> [output.md]                   # Convert a single PDF")
        print("  pdf2md --batch <folder/>                         # Convert all PDFs in a folder")
        print("  pdf2md --batch <folder/> --output <out_folder/>  # Batch with custom output dir")
        print("\nExamples:")
        print("  pdf2md document.pdf                # Creates document.md")
        print("  pdf2md document.pdf custom.md      # Creates custom.md")
        print("  pdf2md --batch input/              # Converts all PDFs in input/")
        print("  pdf2md --batch ~/documents/pdfs/ --output ~/knowledge-base/docs/")
        sys.exit(1)

    try:
        if args[0] == "--batch":
            if len(args) < 2:
                sys.exit("Error: --batch needs a folder, e.g. pdf2md --batch input/")
            output_dir = None
            if "--output" in args:
                idx = args.index("--output")
                if idx + 1 >= len(args):
                    sys.exit("Error: --output needs a folder path")
                output_dir = args[idx + 1]
            batch_convert(args[1], output_dir)
        else:
            output_path = args[1] if len(args) > 1 else None
            convert_pdf_to_markdown(args[0], output_path)
    except (FileNotFoundError, NotADirectoryError) as e:
        # A clear one-line message instead of a traceback
        sys.exit(f"Error: {e}")


if __name__ == "__main__":
    main()

What I love about this code:

Smart defaults: If you don’t specify an output path, it just replaces .pdf with .md
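Progress bars: tqdm gives you a blue progress bar with the page count (it fills in one step when conversion finishes, since to_markdown() processes the whole file in a single call)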
Batch mode: --batch processes an entire folder at once, with optional --output target
Helpful errors: Clear messages when things go wrong
Flexible usage: Works with relative paths, absolute paths, custom output names
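The hand-rolled sys.argv parsing works, but argparse from the standard library adds --help output and consistent error messages for free. Here is a sketch of the same interface: the flag names match the script, while the validation rules (exactly one of a PDF path or --batch; --output only with --batch) are my reading of the intended behavior. A main() built on this would call parse() and then dispatch to batch_convert or convert_pdf_to_markdown.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="pdf2md", description="Convert PDF files to Markdown."
    )
    parser.add_argument("pdf", nargs="?", help="PDF to convert (single-file mode)")
    parser.add_argument("output_md", nargs="?", help="output .md path (optional)")
    parser.add_argument("--batch", metavar="FOLDER", help="convert every PDF in FOLDER")
    parser.add_argument("--output", metavar="OUT_FOLDER", help="output folder for --batch")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse and validate arguments; argparse exits with status 2 on errors."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if bool(args.pdf) == bool(args.batch):
        parser.error("provide exactly one of: a PDF path, or --batch FOLDER")
    if args.output and not args.batch:
        parser.error("--output only applies to --batch mode")
    return args
```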

Make it executable:

chmod +x pdf2md

And now it’s a proper command-line tool.

The Moment of Truth: Testing with Real Data

Theory is great. But does it actually work?

I grabbed that 1.3MB research paper on Generative Engine Optimization and ran:

python pdf2md input/test.pdf output/test.md

The output:

Converting input/test.pdf to Markdown...
Processing: 100%|████████████████| 12/12 [00:02<00:00, 5.8 pages/s]
✓ Done: output/test.md (73,463 characters)

1.3MB PDF → 74KB of clean Markdown in seconds.

I opened the output file, and there it was—perfectly formatted markdown:

## **GEO: Generative Engine Optimization**

Pranjal Aggarwal [∗]
Indian Institute of Technology Delhi
New Delhi, India
pranjal2041@gmail.com

Ashwin Kalyan
Independent
Seattle, USA
asaavashwin@gmail.com
...

Headers, formatting, structure—all preserved. No manual cleanup needed.

Success.

What This Unlocks

Now that I have PDFs converting to Markdown reliably, a whole world of possibilities opens up:

AI Workflows

Feed research papers and documentation directly into Claude or other LLMs
Build RAG (Retrieval Augmented Generation) pipelines backed by your document library
Process technical documentation at scale without losing structure

Knowledge Management

Import PDFs into your Obsidian vault automatically
Version control document content (because it’s now plain text in git)
Full-text search across your entire converted document library

Automation Ideas

Watch folder that auto-converts any dropped PDFs
Batch process entire directories of reports, papers, or manuals
Feed converted markdown directly into a vector database
API wrapper to convert PDFs via HTTP requests
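Here is a minimal sketch of the watch-folder idea using only the standard library: poll the folder every few seconds and hand each new PDF to a callback. The function names are hypothetical, and polling is a deliberate simplification; an event-based library such as watchdog would react faster. It also processes PDFs already present on the first pass, and a file still being copied could be picked up early, so a sturdier version would wait until the file size stops changing.

```python
import time
from pathlib import Path
from typing import Callable, Set


def poll_once(folder: Path, seen: Set[Path], handle: Callable[[Path], None]) -> int:
    """Call handle() for each PDF in folder not seen before; return how many."""
    new = 0
    for pdf in sorted(folder.glob("*.pdf")):
        if pdf not in seen:
            seen.add(pdf)
            handle(pdf)
            new += 1
    return new


def watch(folder: Path, handle: Callable[[Path], None], interval: float = 2.0) -> None:
    """Poll folder forever (stop with Ctrl+C)."""
    seen: Set[Path] = set()
    while True:
        poll_once(folder, seen, handle)
        time.sleep(interval)
```

Wiring it to the converter could look like watch(Path("input"), lambda p: convert_pdf_to_markdown(str(p), str(Path("output") / p.with_suffix(".md").name))).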

Lessons Learned (Especially for CLI Beginners)

1. Virtual Environments Are Non-Negotiable

Every Python project should live in its own virtual environment. Always:

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

This keeps dependencies isolated and projects reproducible.

2. Bleeding-Edge Isn’t Always Better

Python 3.14 is awesome, but the package ecosystem takes time to catch up with a new release. When a dependency won’t build, a mature library that “just works” (like PyMuPDF4LLM) beats fighting the toolchain. Don’t be afraid to pivot when something doesn’t work.

3. Test With Real Data

I didn’t test with “hello.pdf” containing two sentences. I tested with a 1.3MB research paper. Real data reveals real issues (or in this case, confirms it works beautifully).

4. Document As You Build

Writing the README alongside the code made the project immediately understandable. Future-me will thank present-me.

5. Claude Code + PAI = Superpowers

Working with Claude through the PAI infrastructure meant I had a senior developer helping me think through:

Project structure (first principles)
Library selection (when to pivot)
Code organization (clean, maintainable)
Real-world usage patterns

This wasn’t just coding faster—it was learning better patterns while building.

Usage Examples

Basic Conversion

# Activate environment first (always!)
source venv/bin/activate

# Convert a PDF
python pdf2md document.pdf

# Custom output name
python pdf2md research.pdf my-notes.md

# Full paths
python pdf2md ~/Downloads/paper.pdf ~/Documents/notes.md

Batch Processing

Convert an entire folder of PDFs:

source venv/bin/activate

# Convert all PDFs in a folder (output goes to an output/ folder next to the input folder by default)
python pdf2md --batch ~/documents/pdfs/

# Convert to a specific knowledge base directory
python pdf2md --batch ~/documents/pdfs/ --output ~/knowledge-base/docs/

Add to PATH (Optional)

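To use pdf2md from anywhere, add its folder to your PATH. The script’s shebang (#!/usr/bin/env python3) runs whichever python3 comes first on your PATH, so activate the venv before calling it (or point the shebang at venv/bin/python); otherwise the pymupdf4llm import will fail: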

# Add to ~/.zshrc
export PATH="/Users/dsa/projects/pdf-to-markdown:$PATH"

# Then run from anywhere
pdf2md ~/Downloads/paper.pdf ~/Documents/paper.md

What’s Next?

This tool works great as-is, but there are some exciting enhancements on the roadmap:

Immediate Improvements

Better layout analysis: Install pymupdf_layout for improved structure detection on complex documents
Recursive batch mode: Process nested folder structures, not just flat directories
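For recursive batch mode, the main change is mirroring the folder structure so that a/notes.pdf and b/notes.pdf don’t both become output/notes.md. Here is a sketch of that planning step; the function name is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def plan_recursive(input_dir: Path, output_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair every PDF under input_dir (any depth) with a .md path under
    output_dir that mirrors its relative folder."""
    plan: List[Tuple[Path, Path]] = []
    for pdf in sorted(input_dir.rglob("*.pdf")):
        relative = pdf.relative_to(input_dir).with_suffix(".md")
        plan.append((pdf, output_dir / relative))
    return plan
```

Each (src, dst) pair would then be converted after calling dst.parent.mkdir(parents=True, exist_ok=True), so nested output folders exist before writing.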

Future Integrations

RAG pipeline: Auto-feed converted markdown into a vector database
Obsidian plugin: Detect PDFs in vault and convert automatically
FastAPI wrapper: Create an HTTP API for web apps to use
Electron/Tauri app: Build a desktop GUI for non-technical users

The Bigger Picture: Why This Matters

This project is tiny—roughly 100 lines of Python, 30 minutes of work. But it represents something bigger:

The ability to build tools that solve your actual problems.

I had a workflow friction (PDFs don’t work well with AI tools). I built a solution. Now that friction is gone, and I can focus on higher-level work.

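And the cited research points one way: converting your document library to Markdown isn’t a nice-to-have. It’s a multiplier on every AI workflow that follows: up to 70% fewer tokens consumed, 84% fewer retrieval failures with structure-aware chunking (Unstructured.io), and 50% fewer incorrect answers in NVIDIA’s NeMo Retriever pipeline. Those numbers come from specific setups, but they aren’t marginal improvements; they’re transformational.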

Working with Claude Code through PAI accelerated all of this. It’s like having a patient senior developer sitting next to you, suggesting better approaches, catching errors before they happen, and explaining why certain patterns work.

Resources

PyMuPDF4LLM Docs: https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/
PyMuPDF GitHub: https://github.com/pymupdf/PyMuPDF

Citations: Markdown vs PDF for LLMs

Why PDFs Fail Under LLM Parsing — Steven Howard, Untethered AI: https://untetheredai.substack.com/p/why-pdfs-fail-under-llm-parsing
PDF vs Markdown for AI: Token Efficiency — MarkdownConverters: https://markdownconverters.com/blog/pdf-vs-markdown-ai-tokens
Revolutionizing RAG with Enhanced PDF Structure Recognition — arXiv:2401.12599 (2024): https://arxiv.org/abs/2401.12599
Approaches to PDF Data Extraction for Information Retrieval — NVIDIA Technical Blog: https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/
Improved RAG Document Processing With Markdown — Dr. Leon Eversberg, Towards Data Science: https://medium.com/data-science/improved-rag-document-processing-with-markdown-426a2e0dd82b
Contextual Chunking: Boost Your RAG Retrieval Accuracy — Unstructured.io: https://unstructured.io/blog/contextual-chunking-in-unstructured-platform-boost-your-rag-retrieval-accuracy
Boosting AI Performance: The Power of LLM-Friendly Content in Markdown — Webex Developers Blog: https://developer.webex.com/blog/boosting-ai-performance-the-power-of-llm-friendly-content-in-markdown

Happy converting!