<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Python on</title><link>https://augmentedresilience.com/tags/python/</link><description>Recent content in Python on</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sat, 02 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://augmentedresilience.com/tags/python/index.xml" rel="self" type="application/rss+xml"/><item><title>Upgrading My PDF Converter to IBM's Docling</title><link>https://augmentedresilience.com/posts/augmented-resilience-posts/upgrading-my-pdf-converter-to-ibm-docling/</link><pubDate>Sat, 02 May 2026 00:00:00 +0000</pubDate><guid>https://augmentedresilience.com/posts/augmented-resilience-posts/upgrading-my-pdf-converter-to-ibm-docling/</guid><description>&lt;h2 id="when-my-own-tool-couldnt-handle-my-work">When My Own Tool Couldn&amp;rsquo;t Handle My Work&lt;/h2>
&lt;p>The error message was easy to dismiss: &lt;code>RapidOCR returned empty result!&lt;/code> It appeared twice in the terminal, then nothing: a blank .md file where a 40-page Oracle HCM implementation guide should have been. The PDF had come straight from Oracle&amp;rsquo;s support portal, in the same format I use for every triage session. But this one stored its pages as images, so there was no text layer for PyMuPDF4LLM to extract.&lt;/p></description><content>&lt;h2 id="when-my-own-tool-couldnt-handle-my-work">When My Own Tool Couldn&amp;rsquo;t Handle My Work&lt;/h2>
&lt;p>The error message was easy to dismiss: &lt;code>RapidOCR returned empty result!&lt;/code> It appeared twice in the terminal, then nothing: a blank .md file where a 40-page Oracle HCM implementation guide should have been. The PDF had come straight from Oracle&amp;rsquo;s support portal, in the same format I use for every triage session. But this one stored its pages as images, so there was no text layer for PyMuPDF4LLM to extract.&lt;/p>
&lt;p>That was one category of failure. The other was quieter. For documents that did convert, I started noticing the tables were wrong — not corrupted, just structurally dissolved. An eligibility matrix that should have had six clearly labeled columns came back as a run of loosely connected text. Useful for nothing.&lt;/p>
&lt;p>I had built this tool to serve my Oracle work. Then my Oracle work showed me exactly where it fell short.&lt;/p>
&lt;hr>
&lt;h2 id="the-problem-with-pymupdf4llm">The Problem with PyMuPDF4LLM&lt;/h2>
&lt;p>If you&amp;rsquo;ve followed this series, you know that PyMuPDF4LLM was a solid choice when I first &lt;a href="https://augmentedresilience.com/posts/when-your-pdf-workflow-breaks-building-a-markdown-converter-with-claude-code/" target="_blank" rel="noopener noreferrer">built the converter&lt;/a>. It handled text-based PDFs cleanly, installed without friction, and required almost no configuration. For research papers and simple documentation, it worked well.&lt;/p>
&lt;p>But Oracle HCM documentation is a different category of document. Oracle&amp;rsquo;s guides are dense with tables: configuration reference grids, eligibility matrices, step-and-action setup tables. These are not decorative — they carry most of the meaning. When PyMuPDF4LLM dissolved those tables into unstructured text, it was silently degrading the most important parts of the document.&lt;/p>
&lt;p>The image-based PDF problem was a hard wall. If a document was captured as page images rather than extractable text, the converter returned nothing. No partial output, no clear error beyond that easy-to-miss OCR log line, just empty files.&lt;/p>
&lt;hr>
&lt;h2 id="discovering-docling">Discovering Docling&lt;/h2>
&lt;p>IBM Research Zurich&amp;rsquo;s AI for Knowledge team open-sourced &lt;a href="https://github.com/docling-project/docling" target="_blank" rel="noopener noreferrer">Docling&lt;/a>
in July 2024. The project has a specific focus: turning complex documents into structured, AI-ready output. In April 2025, IBM donated it to the Linux Foundation AI &amp;amp; Data, and it now powers data ingestion for Red Hat Enterprise Linux AI. As of this writing it has over 24,000 GitHub stars.&lt;/p>
&lt;p>What makes Docling different is that it treats document conversion as a computer vision problem, not just a text extraction problem.&lt;/p>
&lt;p>&lt;strong>Layout analysis:&lt;/strong> Docling uses an RT-DETR-derived model trained on DocLayNet — IBM&amp;rsquo;s human-annotated dataset of real-world documents — to detect and classify every region on the page: tables, figures, headers, footers, section titles, body text. It knows the structure before it extracts any content.&lt;/p>
&lt;p>&lt;strong>Table reconstruction:&lt;/strong> This is where Docling earns its place for Oracle documentation. It uses a vision transformer called TableFormer that predicts row/column structure and header roles directly from the page image. The result is a proper Markdown table, not a stream of cell values.&lt;/p>
&lt;p>&lt;strong>Image-based PDFs:&lt;/strong> For documents stored as page images, Docling integrates OCR into its pipeline natively. The same converter handles text-based and image-based PDFs without any changes on your end.&lt;/p>
&lt;hr>
&lt;h2 id="the-switch">The Switch&lt;/h2>
&lt;p>The API change was minimal. The old code:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> pymupdf4llm
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>md_text &lt;span style="color:#f92672">=&lt;/span> pymupdf4llm&lt;span style="color:#f92672">.&lt;/span>to_markdown(pdf_path)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The new code:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> docling.document_converter &lt;span style="color:#f92672">import&lt;/span> DocumentConverter
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>converter &lt;span style="color:#f92672">=&lt;/span> DocumentConverter()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result &lt;span style="color:#f92672">=&lt;/span> converter&lt;span style="color:#f92672">.&lt;/span>convert(pdf_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>md_text &lt;span style="color:#f92672">=&lt;/span> result&lt;span style="color:#f92672">.&lt;/span>document&lt;span style="color:#f92672">.&lt;/span>export_to_markdown()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Three lines instead of one, but the extra structure pays dividends: &lt;code>DocumentConverter&lt;/code> can be initialized once and reused across an entire batch, which matters when processing a folder of 50 Oracle guides.&lt;/p>
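&lt;p>To make the reuse point concrete, here is a minimal sketch of a batch loop that receives an already-built converter instead of creating one per file. The helper name &lt;code>convert_folder&lt;/code> and the output layout (one &lt;code>.md&lt;/code> per PDF, same file stem) are illustrative assumptions, not the actual &lt;code>pdf2md&lt;/code> code; the only Docling calls it relies on are the &lt;code>convert()&lt;/code> and &lt;code>export_to_markdown()&lt;/code> calls shown above.&lt;/p>

```python
from pathlib import Path
from typing import Any, Protocol


class MarkdownConverter(Protocol):
    """Anything shaped like Docling's converter: convert(source) -> result."""

    def convert(self, source: Any) -> Any: ...


def convert_folder(converter: MarkdownConverter, pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every PDF in pdf_dir using one shared converter instance."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        result = converter.convert(pdf_path)  # the same object is reused for every file
        md_text = result.document.export_to_markdown()
        md_path = out_dir / f"{pdf_path.stem}.md"
        md_path.write_text(md_text, encoding="utf-8")
        written.append(md_path)
    return written
```

&lt;p>With Docling installed, the call would look like &lt;code>convert_folder(DocumentConverter(), Path("~/oracle-pdfs").expanduser(), Path("~/oracle-md").expanduser())&lt;/code>. Passing the converter in, rather than constructing it inside the loop, should keep model setup to a one-time cost, and it also lets you test the loop with a stand-in object.&lt;/p>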
&lt;p>&lt;strong>A note on startup:&lt;/strong> The first time you run Docling, it downloads its ML models from Hugging Face. You will see this:&lt;/p>
&lt;pre tabindex="0">&lt;code>Loading weights: 100%|██████████| 770/770 [00:00&amp;lt;00:00, 1656.35it/s]
&lt;/code>&lt;/pre>&lt;p>This is normal. The models are cached locally after the first download, so later runs skip that step. If you see a warning about &lt;code>HF_TOKEN&lt;/code>, that is also expected. Docling works without one; setting a token just removes the rate-limit warning:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-zsh" data-lang="zsh">&lt;span style="display:flex;">&lt;span>echo &lt;span style="color:#e6db74">&amp;#39;export HF_TOKEN=&amp;#34;hf_your_token_here&amp;#34;&amp;#39;&lt;/span> &amp;gt;&amp;gt; ~/.zshrc
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="what-changed-in-practice">What Changed in Practice&lt;/h2>
&lt;p>&lt;strong>Oracle documentation:&lt;/strong> Tables that previously collapsed into text now render as proper Markdown tables. A 6-column configuration reference comes back with headers intact and every row correctly aligned.&lt;/p>
&lt;p>&lt;strong>AI books:&lt;/strong> My knowledge base includes dense technical books on LLM engineering and machine learning. These have complex layouts — sidebars, multi-column sections, figures with captions. Docling&amp;rsquo;s layout model handles these significantly better than PyMuPDF4LLM&amp;rsquo;s heuristic approach.&lt;/p>
&lt;p>&lt;strong>Image-based PDFs:&lt;/strong> Documents that previously produced empty output now convert cleanly. The two-step workaround (ocrmypdf → pdf2md) is no longer necessary in most cases.&lt;/p>
&lt;hr>
&lt;h2 id="two-other-improvements">Two Other Improvements&lt;/h2>
&lt;p>While I was updating the engine, I added two things that were overdue:&lt;/p>
&lt;p>&lt;strong>DOCX support.&lt;/strong> The converter now handles Word documents using pandoc as a backend. The same &lt;code>pdf2md&lt;/code> command works for both file types. This matters for Oracle support exports and study notes from my reMarkable.&lt;/p>
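&lt;p>Here is a minimal sketch of how the DOCX path could be wired in, assuming the converter shells out to the &lt;code>pandoc&lt;/code> binary and writes GitHub-flavored Markdown. The function names, the extension-based routing, and the &lt;code>--wrap=none&lt;/code> choice are illustrative assumptions; the pandoc options themselves (&lt;code>--from&lt;/code>, &lt;code>--to&lt;/code>, &lt;code>--wrap&lt;/code>, &lt;code>--output&lt;/code>, &lt;code>--extract-media&lt;/code>) are standard.&lt;/p>

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Optional


def build_pandoc_command(docx_path: Path, md_path: Path, media_dir: Optional[Path] = None) -> list[str]:
    """Build the argv for a DOCX to GitHub-flavored Markdown conversion."""
    cmd = [
        "pandoc",
        str(docx_path),
        "--from=docx",
        "--to=gfm",
        "--wrap=none",  # keep paragraphs on one line instead of hard-wrapping
        f"--output={md_path}",
    ]
    if media_dir is not None:
        cmd.append(f"--extract-media={media_dir}")  # save embedded images alongside
    return cmd


def convert_docx(docx_path: Path, md_path: Path) -> None:
    """Run pandoc, failing loudly if it is missing or exits non-zero."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(build_pandoc_command(docx_path, md_path), check=True)


def convert_file(path: Path, out_dir: Path, convert_pdf: Callable[[Path, Path], None]) -> Path:
    """Route one input file to the right backend based on its extension."""
    md_path = out_dir / f"{path.stem}.md"
    suffix = path.suffix.lower()
    if suffix == ".docx":
        convert_docx(path, md_path)
    elif suffix == ".pdf":
        convert_pdf(path, md_path)  # e.g. a Docling-based function
    else:
        raise ValueError(f"unsupported file type: {suffix}")
    return md_path
```

&lt;p>Keeping the command builder separate from the &lt;code>subprocess.run&lt;/code> call makes the argument list easy to check without pandoc installed.&lt;/p>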
&lt;p>&lt;strong>Batch manifest.&lt;/strong> When processing a large folder, the converter now writes a manifest file tracking which files have been converted and their checksums. Re-running on the same folder skips files that haven&amp;rsquo;t changed. A &lt;code>--force&lt;/code> flag overrides this when you need a fresh conversion.&lt;/p>
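&lt;p>The skip-unchanged behavior can be sketched with nothing but the standard library. This assumes a JSON manifest stored inside the batch folder and keyed by file name; the file name &lt;code>.pdf2md-manifest.json&lt;/code> and every helper name below are hypothetical, not taken from the real tool. The &lt;code>force&lt;/code> parameter plays the role of the &lt;code>--force&lt;/code> flag.&lt;/p>

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = ".pdf2md-manifest.json"  # hypothetical file name


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash file contents in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(folder: Path) -> dict[str, str]:
    manifest_path = folder / MANIFEST_NAME
    if manifest_path.exists():
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    return {}


def files_to_convert(folder: Path, force: bool = False) -> list[Path]:
    """Return PDFs that are new or changed since the last run (all of them if force)."""
    manifest = load_manifest(folder)
    return [
        pdf
        for pdf in sorted(folder.glob("*.pdf"))
        if force or manifest.get(pdf.name) != sha256_of(pdf)
    ]


def record_converted(folder: Path, converted: list[Path]) -> None:
    """Store the checksum of each successfully converted file."""
    manifest = load_manifest(folder)
    for pdf in converted:
        manifest[pdf.name] = sha256_of(pdf)
    (folder / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
```

&lt;p>Hashing content rather than comparing modification times means a re-downloaded but identical PDF is still skipped, while an edited file with the same name is picked up again.&lt;/p>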
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>pdf2md --batch ~/oracle-pdfs/ &lt;span style="color:#75715e"># skips already-converted&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>pdf2md --batch ~/oracle-pdfs/ --force &lt;span style="color:#75715e"># reconverts everything&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="whats-next">What&amp;rsquo;s Next&lt;/h2>
&lt;p>The web UI — which I added in the &lt;a href="https://augmentedresilience.com/posts/adding-a-web-ui-to-my-pdf-to-markdown-converter/" target="_blank" rel="noopener noreferrer">last post&lt;/a>
— has also been updated to use Docling. Drag a PDF onto it, click Convert, and the same deep-learning pipeline runs behind the scenes.&lt;/p>
&lt;p>The next thing I want to add is direct output to the Obsidian inbox. Right now the flow is: convert → download ZIP → move to vault. A toggle that sends output directly to &lt;code>~/projects/obsidian-vault/00-inbox/&lt;/code> would cut that manual step entirely.&lt;/p>
&lt;p>The tool is doing what I originally wanted: converting my Oracle documentation and AI library into clean, searchable Markdown. Docling is what makes that reliable for the documents that actually matter.&lt;/p></content></item><item><title>Adding a Web UI to My PDF to Markdown Converter</title><link>https://augmentedresilience.com/posts/augmented-resilience-posts/adding-a-web-ui-to-my-pdf-to-markdown-converter/</link><pubDate>Sat, 28 Mar 2026 00:00:00 +0000</pubDate><guid>https://augmentedresilience.com/posts/augmented-resilience-posts/adding-a-web-ui-to-my-pdf-to-markdown-converter/</guid><description>&lt;h2 id="the-promise-i-made-to-myself">The Promise I Made to Myself&lt;/h2>
&lt;p>In my last post about &lt;a href="https://augmentedresilience.com/posts/when-your-pdf-workflow-breaks-building-a-markdown-converter-with-claude-code/" target="_blank" rel="noopener noreferrer">building the PDF to Markdown converter&lt;/a>, I listed some &amp;ldquo;what&amp;rsquo;s next&amp;rdquo; ideas at the end. One of them was:&lt;/p>
&lt;blockquote>
&lt;p>FastAPI wrapper: Create an HTTP API for web apps to use&lt;/p>&lt;/blockquote>
&lt;p>Well, I did it. And I went a step further — I built a full drag-and-drop web UI on top of it.&lt;/p>
&lt;p>The CLI still works exactly as before. This is an addition, not a replacement. But now when I want to convert a batch of PDFs without thinking about terminal commands, I just open a browser tab.&lt;/p></description><content>&lt;h2 id="the-promise-i-made-to-myself">The Promise I Made to Myself&lt;/h2>
&lt;p>In my last post about &lt;a href="https://augmentedresilience.com/posts/when-your-pdf-workflow-breaks-building-a-markdown-converter-with-claude-code/" target="_blank" rel="noopener noreferrer">building the PDF to Markdown converter&lt;/a>, I listed some &amp;ldquo;what&amp;rsquo;s next&amp;rdquo; ideas at the end. One of them was:&lt;/p>
&lt;blockquote>
&lt;p>FastAPI wrapper: Create an HTTP API for web apps to use&lt;/p>&lt;/blockquote>
&lt;p>Well, I did it. And I went a step further — I built a full drag-and-drop web UI on top of it.&lt;/p>
&lt;p>The CLI still works exactly as before. This is an addition, not a replacement. But now when I want to convert a batch of PDFs without thinking about terminal commands, I just open a browser tab.&lt;/p>
&lt;p>&lt;img src="https://augmentedresilience.com/images/pdf-to-markdown-web-ui.png" alt="Screenshot of the PDF to Markdown converter web UI">&lt;/p>
&lt;hr>
&lt;h2 id="what-the-ui-does">What the UI Does&lt;/h2>
&lt;p>The interface is intentionally minimal:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Drag-and-drop zone&lt;/strong> — drop one PDF or fifty onto it&lt;/li>
&lt;li>&lt;strong>Browse button&lt;/strong> — if you prefer clicking&lt;/li>
&lt;li>&lt;strong>Convert button&lt;/strong> — kicks off the conversion&lt;/li>
&lt;li>&lt;strong>Per-file progress bars&lt;/strong> — live updates as each file converts&lt;/li>
&lt;li>&lt;strong>Individual download&lt;/strong> — each completed file gets its own Download button&lt;/li>
&lt;li>&lt;strong>Download all as ZIP&lt;/strong> — one click to grab everything&lt;/li>
&lt;li>&lt;strong>Clear&lt;/strong> — resets the session and cleans up temp files server-side&lt;/li>
&lt;/ul>
&lt;p>Everything runs locally. Files go to a temp directory on your machine, get converted, and are served back to you. Nothing hits an external API.&lt;/p>
&lt;hr>
&lt;h2 id="the-stack">The Stack&lt;/h2>
&lt;p>I kept it as simple as possible:&lt;/p>
&lt;p>&lt;strong>Backend:&lt;/strong> FastAPI + uvicorn&lt;/p>
&lt;p>FastAPI was the obvious choice — it handles file uploads cleanly, has first-class async support, and the &lt;code>python-multipart&lt;/code> library makes multi-file form handling trivial. The conversion logic is unchanged from the CLI: &lt;code>pymupdf4llm.to_markdown()&lt;/code> doing the heavy lifting.&lt;/p>
&lt;p>&lt;strong>Progress updates:&lt;/strong> Server-Sent Events (SSE)&lt;/p>
&lt;p>This is the part I found most interesting. When you hit Convert, the browser opens a persistent connection to &lt;code>/progress/{job_id}&lt;/code> and receives a stream of JSON events (one every 400ms) until the job finishes. No client-side polling, no WebSocket complexity. SSE is perfect for this: unidirectional, simple, and built into every modern browser.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> asyncio
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> fastapi.responses &lt;span style="color:#f92672">import&lt;/span> StreamingResponse
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Excerpt from the /progress/{job_id} route handler; job is the status dict for that job.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">async&lt;/span> &lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">event_stream&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#66d9ef">while&lt;/span> &lt;span style="color:#66d9ef">True&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        data &lt;span style="color:#f92672">=&lt;/span> json&lt;span style="color:#f92672">.&lt;/span>dumps({&lt;span style="color:#e6db74">&amp;#34;progress&amp;#34;&lt;/span>: job[&lt;span style="color:#e6db74">&amp;#34;progress&amp;#34;&lt;/span>], &lt;span style="color:#e6db74">&amp;#34;done&amp;#34;&lt;/span>: job[&lt;span style="color:#e6db74">&amp;#34;done&amp;#34;&lt;/span>]})
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#66d9ef">yield&lt;/span> &lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;data: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>data&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#ae81ff">\n\n&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#66d9ef">if&lt;/span> job[&lt;span style="color:#e6db74">&amp;#34;done&amp;#34;&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#66d9ef">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#66d9ef">await&lt;/span> asyncio&lt;span style="color:#f92672">.&lt;/span>sleep(&lt;span style="color:#ae81ff">0.4&lt;/span>)  &lt;span style="color:#75715e"># one update every 400ms&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">return&lt;/span> StreamingResponse(event_stream(), media_type&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;text/event-stream&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>On the frontend, consuming it takes only a few lines:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">eventSource&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">EventSource&lt;/span>(&lt;span style="color:#e6db74">`/progress/&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>&lt;span style="color:#a6e22e">jobId&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">`&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">eventSource&lt;/span>.&lt;span style="color:#a6e22e">onmessage&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">e&lt;/span> =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> { &lt;span style="color:#a6e22e">progress&lt;/span>, &lt;span style="color:#a6e22e">done&lt;/span> } &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">JSON&lt;/span>.&lt;span style="color:#a6e22e">parse&lt;/span>(&lt;span style="color:#a6e22e">e&lt;/span>.&lt;span style="color:#a6e22e">data&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e">// update the UI...
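&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#a6e22e">done&lt;/span>) &lt;span style="color:#a6e22e">eventSource&lt;/span>.&lt;span style="color:#a6e22e">close&lt;/span>(); &lt;span style="color:#75715e">// stop EventSource from auto-reconnecting
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>};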
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>Threading:&lt;/strong> The conversion itself is synchronous (PyMuPDF4LLM blocks while it processes pages). To keep the FastAPI event loop from freezing during conversion, each job runs in a &lt;code>ThreadPoolExecutor&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>asyncio&lt;span style="color:#f92672">.&lt;/span>get_running_loop()&lt;span style="color:#f92672">.&lt;/span>run_in_executor(executor, _convert_job, job_id, job_dir)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Four workers by default — enough to handle several simultaneous conversions without overloading the machine.&lt;/p>
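&lt;p>To see the SSE stream and the thread pool working together outside FastAPI, here is a self-contained, stdlib-only sketch. &lt;code>fake_convert_job&lt;/code> is a hypothetical stand-in that sleeps instead of converting, and the intervals are shortened for the demo; the point is that the async generator keeps emitting frames while the blocking job runs in a worker thread.&lt;/p>

```python
import asyncio
import json
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # same pool size the post describes


def fake_convert_job(job: dict, pages: int) -> None:
    """Hypothetical stand-in for a blocking converter: sleeps once per page."""
    for page in range(1, pages + 1):
        time.sleep(0.05)  # blocking call; would stall the event loop if run inline
        job["progress"] = round(100 * page / pages)
    job["done"] = True


async def event_stream(job: dict, interval: float = 0.02):
    """Yield one SSE frame per tick until the job reports done."""
    while True:
        data = json.dumps({"progress": job["progress"], "done": job["done"]})
        yield f"data: {data}\n\n"
        if job["done"]:
            break
        await asyncio.sleep(interval)


async def run_demo(pages: int = 5) -> list[str]:
    job = {"progress": 0, "done": False}
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(executor, fake_convert_job, job, pages)
    frames = [frame async for frame in event_stream(job)]
    await future  # re-raises any exception from the worker thread
    return frames


if __name__ == "__main__":
    for frame in asyncio.run(run_demo()):
        print(frame, end="")
```

&lt;p>If you replace the &lt;code>run_in_executor&lt;/code> call with a direct call to &lt;code>fake_convert_job&lt;/code>, the stream produces a single frame, emitted only after the whole job has finished. That is the frozen-event-loop symptom in miniature.&lt;/p>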
&lt;p>&lt;strong>Frontend:&lt;/strong> Vanilla JS, no build step&lt;/p>
&lt;p>I deliberately avoided React, Vue, or any framework. The whole UI is a single &lt;code>static/index.html&lt;/code> file. It loads instantly, has no dependencies to install, and is easy to read and modify. For a local tool that one person uses, this is the right call.&lt;/p>
&lt;hr>
&lt;h2 id="project-structure">Project Structure&lt;/h2>
&lt;p>Here&amp;rsquo;s what changed from the original CLI project:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>pdf-to-markdown/
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pdf2md — original CLI (unchanged)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> app.py — FastAPI server (new)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> static/
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>   index.html — drag-drop UI (new)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> serve — start script (new)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> requirements.txt — updated with FastAPI deps
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> venv/ — existing venv, three new packages added
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>serve&lt;/code> script is just:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">#!/usr/bin/env bash
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>cd &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">$(&lt;/span>dirname &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$0&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#66d9ef">)&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>source venv/bin/activate
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>uvicorn app:app --host 0.0.0.0 --port &lt;span style="color:#ae81ff">8765&lt;/span> --reload
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Run it once, open &lt;code>http://localhost:8765&lt;/code>, and you have a working converter in your browser.&lt;/p>
&lt;hr>
&lt;h2 id="one-gotcha-pymupdf4llm-is-synchronous">One Gotcha: PyMuPDF4LLM is Synchronous&lt;/h2>
&lt;p>This tripped me up briefly. &lt;code>pymupdf4llm.to_markdown()&lt;/code> does not return a coroutine — it&amp;rsquo;s a blocking call that can take 10–30 seconds on a large document. If you call it directly in an async FastAPI route handler, you freeze the entire event loop while it runs. No other requests get handled. The SSE stream stops updating.&lt;/p>
&lt;p>The fix is the &lt;code>ThreadPoolExecutor&lt;/code> pattern above — push the blocking work off the event loop entirely. The async route returns immediately, the SSE stream keeps ticking, and the conversion runs in a thread pool where it belongs.&lt;/p>
&lt;hr>
&lt;h2 id="the-download-endpoints">The Download Endpoints&lt;/h2>
&lt;p>Three endpoints handle output:&lt;/p>
&lt;pre tabindex="0">&lt;code>GET /download/{job_id}/{filename} — single .md file
GET /download-all/{job_id} — all .md files as a ZIP
DELETE /job/{job_id} — clean up temp files
&lt;/code>&lt;/pre>&lt;p>The ZIP is built in memory using Python&amp;rsquo;s &lt;code>zipfile&lt;/code> module and streamed directly to the browser — no intermediate file on disk:&lt;/p>
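&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> io
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> zipfile
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> fastapi.responses &lt;span style="color:#f92672">import&lt;/span> StreamingResponse
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>buf &lt;span style="color:#f92672">=&lt;/span> io&lt;span style="color:#f92672">.&lt;/span>BytesIO()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">with&lt;/span> zipfile&lt;span style="color:#f92672">.&lt;/span>ZipFile(buf, &lt;span style="color:#e6db74">&amp;#34;w&amp;#34;&lt;/span>, zipfile&lt;span style="color:#f92672">.&lt;/span>ZIP_DEFLATED) &lt;span style="color:#66d9ef">as&lt;/span> zf:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#66d9ef">for&lt;/span> f &lt;span style="color:#f92672">in&lt;/span> md_files:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        zf&lt;span style="color:#f92672">.&lt;/span>write(f, f&lt;span style="color:#f92672">.&lt;/span>name)  &lt;span style="color:#75715e"># store by bare filename, not the temp-dir path&lt;/span>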
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>buf&lt;span style="color:#f92672">.&lt;/span>seek(&lt;span style="color:#ae81ff">0&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">return&lt;/span> StreamingResponse(buf, media_type&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;application/zip&amp;#34;&lt;/span>, &lt;span style="color:#f92672">...&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="what-this-unlocks">What This Unlocks&lt;/h2>
&lt;p>The CLI was already useful. The web UI adds a few things the CLI cannot easily do:&lt;/p>
&lt;p>&lt;strong>Non-terminal users.&lt;/strong> Anyone on my network can now use this converter by visiting &lt;code>http://my-machine:8765&lt;/code>. No Python knowledge required.&lt;/p>
&lt;p>&lt;strong>Bulk drop workflows.&lt;/strong> Dragging 20 PDFs from Finder into a browser window and clicking Convert is significantly faster than constructing a &lt;code>--batch&lt;/code> command with the right paths.&lt;/p>
&lt;p>&lt;strong>Visual feedback.&lt;/strong> The progress bars are not just cosmetic. For large PDFs that take 20–30 seconds, knowing the conversion is running (and roughly how far along it is) removes the anxiety of staring at a terminal cursor.&lt;/p>
&lt;hr>
&lt;h2 id="whats-next">What&amp;rsquo;s Next&lt;/h2>
&lt;p>The original roadmap item was &amp;ldquo;FastAPI wrapper.&amp;rdquo; That&amp;rsquo;s done. The next one I&amp;rsquo;m eyeing:&lt;/p>
&lt;p>&lt;strong>Auto-feed to Obsidian inbox.&lt;/strong> Right now the flow is: convert in the web UI, download the ZIP, unzip, move to Obsidian. I&amp;rsquo;d like to add a toggle: &amp;ldquo;Send output directly to &lt;code>~/projects/obsidian-vault/00-inbox/&lt;/code>&amp;rdquo; — one less manual step.&lt;/p>
&lt;p>That&amp;rsquo;s a small addition to the backend. Coming soon.&lt;/p>
&lt;hr>
&lt;h2 id="running-it">Running It&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>cd ~/projects/pdf-to-markdown
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>./serve
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Open http://localhost:8765&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The first run installs nothing new — the three new packages (fastapi, uvicorn, python-multipart) are already in the venv. It just works.&lt;/p></content></item><item><title>Building an AI Conference Directory That Populates Itself</title><link>https://augmentedresilience.com/posts/augmented-resilience-posts/building-an-ai-conference-directory-that-populates-itself/</link><pubDate>Sat, 14 Mar 2026 00:00:00 +0000</pubDate><guid>https://augmentedresilience.com/posts/augmented-resilience-posts/building-an-ai-conference-directory-that-populates-itself/</guid><description>&lt;h2 id="the-problem-ai-conferences-are-everywhere-and-nowhere">The Problem: AI Conferences Are Everywhere and Nowhere&lt;/h2>
&lt;p>If you&amp;rsquo;ve ever tried to find a comprehensive list of upcoming AI conferences, you know the pain. There&amp;rsquo;s no single source. AAAI has their page. NeurIPS has theirs. ICML posts deadlines on OpenReview. Half the emerging summits only exist on LinkedIn event pages or buried in Reddit threads.&lt;/p>
&lt;p>I wanted a simple, searchable directory of AI conferences — one site where I could see what&amp;rsquo;s coming up, filter by topic, and get the key details. But I didn&amp;rsquo;t want to manually curate it. I&amp;rsquo;ve seen too many &amp;ldquo;awesome lists&amp;rdquo; on GitHub that are lovingly maintained for three months and then abandoned.&lt;/p></description><content>&lt;h2 id="the-problem-ai-conferences-are-everywhere-and-nowhere">The Problem: AI Conferences Are Everywhere and Nowhere&lt;/h2>
&lt;p>If you&amp;rsquo;ve ever tried to find a comprehensive list of upcoming AI conferences, you know the pain. There&amp;rsquo;s no single source. AAAI has their page. NeurIPS has theirs. ICML posts deadlines on OpenReview. Half the emerging summits only exist on LinkedIn event pages or buried in Reddit threads.&lt;/p>
&lt;p>I wanted a simple, searchable directory of AI conferences — one site where I could see what&amp;rsquo;s coming up, filter by topic, and get the key details. But I didn&amp;rsquo;t want to manually curate it. I&amp;rsquo;ve seen too many &amp;ldquo;awesome lists&amp;rdquo; on GitHub that are lovingly maintained for three months and then abandoned.&lt;/p>
&lt;p>What I wanted was a system that populates itself.&lt;/p>
&lt;p>So I built one. And with Claude Code running through my PAI system, the whole pipeline — from search to database to website — came together over a few focused sessions.&lt;/p>
&lt;p>Here&amp;rsquo;s the full story.&lt;/p>
&lt;hr>
&lt;h2 id="the-architecture-three-layers-zero-manual-data-entry">The Architecture: Three Layers, Zero Manual Data Entry&lt;/h2>
&lt;p>The final system has three layers, each handling a distinct responsibility:&lt;/p>
&lt;pre tabindex="0">&lt;code>SearXNG (search engine)
→ conference_tracker.py (discovery)
→ Airtable (database)
→ fetch-events.mjs (build-time fetch)
→ React + Vite site on Netlify
&lt;/code>&lt;/pre>&lt;p>Each layer is independently useful, loosely coupled, and replaceable. Let&amp;rsquo;s walk through them.&lt;/p>
&lt;hr>
&lt;h2 id="layer-1-the-tracker--finding-conferences-automatically">Layer 1: The Tracker — Finding Conferences Automatically&lt;/h2>
&lt;p>The foundation is a Python script called &lt;code>conference_tracker.py&lt;/code>. Its job is simple: search the web for AI conferences and store what it finds.&lt;/p>
&lt;h3 id="search-searxng-instead-of-google">Search: SearXNG Instead of Google&lt;/h3>
&lt;p>Rather than hitting the Google API (with its quotas and billing), I use &lt;a href="https://github.com/searxng/searxng" target="_blank" rel="noopener noreferrer">SearXNG&lt;/a>
— an open-source, self-hosted meta-search engine. It aggregates results from Google, Bing, DuckDuckGo, and others without needing API keys or paid quotas, though the upstream engines can still throttle or block a busy instance.&lt;/p>
&lt;p>The tracker runs a curated list of search queries defined in &lt;code>config.yaml&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">search_queries&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;AI conference 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;artificial intelligence conference 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;machine learning conference 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;NeurIPS 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;ICML 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;AAAI 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;AI summit 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;deep learning conference 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;computer vision conference 2026 CVPR&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#e6db74">&amp;#34;natural language processing conference 2026&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Each query returns up to 10 results. The tracker extracts the title, URL, and snippet from each result, deduplicates against what&amp;rsquo;s already in the database, and stores new finds.&lt;/p>
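&lt;p>As a rough sketch of that query step (not the tracker&amp;rsquo;s actual code), the helpers below build a SearXNG request that asks for JSON and trim the response to the first 10 results. The instance address is an assumption, the &lt;code>json&lt;/code> output format has to be enabled under &lt;code>search.formats&lt;/code> in the instance&amp;rsquo;s &lt;code>settings.yml&lt;/code>, and the network call is left out so the parsing can be tried offline.&lt;/p>

```python
from urllib.parse import urlencode

# Assumed address of a self-hosted SearXNG instance.
SEARXNG_URL = "http://localhost:8080/search"


def build_search_url(query, base_url=SEARXNG_URL):
    """Build a search URL that asks SearXNG for JSON instead of HTML."""
    return base_url + "?" + urlencode({"q": query, "format": "json"})


def parse_results(payload, limit=10):
    """Reduce a SearXNG JSON payload to title/url/snippet dicts."""
    results = []
    for item in payload.get("results", [])[:limit]:
        results.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            # SearXNG returns the result snippet under the "content" key.
            "snippet": item.get("content", ""),
        })
    return results
```

&lt;p>Fetching &lt;code>build_search_url(query)&lt;/code> with &lt;code>requests.get&lt;/code> and passing &lt;code>resp.json()&lt;/code> to &lt;code>parse_results&lt;/code> yields dicts with the same &lt;code>title&lt;/code>, &lt;code>url&lt;/code>, and &lt;code>snippet&lt;/code> keys that &lt;code>extract_conference_info&lt;/code> reads.&lt;/p>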
&lt;h3 id="storage-airtable-as-the-source-of-truth">Storage: Airtable as the Source of Truth&lt;/h3>
&lt;p>Why Airtable? Because it&amp;rsquo;s a real database with an API, but it also has a spreadsheet-like UI for manual review. When you&amp;rsquo;re building a pipeline that discovers data automatically, you want a way to eyeball the results and clean up noise — and Airtable is perfect for that.&lt;/p>
&lt;p>The tracker writes five fields per record: &lt;code>title&lt;/code>, &lt;code>websiteUrl&lt;/code>, &lt;code>description&lt;/code>, &lt;code>Source Query&lt;/code>, and &lt;code>Date Found&lt;/code>. That&amp;rsquo;s it. Just the raw discovery data. The structured details come later.&lt;/p>
&lt;p>The deduplication is URL-based — normalized and lowercased. If we&amp;rsquo;ve already stored &lt;code>neurips.cc/2026&lt;/code>, we don&amp;rsquo;t store it again even if it appears in a different search query.&lt;/p>
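&lt;p>The exact normalization rules are not shown, so this sketch assumes that the scheme, a leading &lt;code>www.&lt;/code>, query strings, fragments, and trailing slashes should all be ignored. Dropping query strings is a judgment call: it collapses tracking parameters, but it would also merge two pages that differ only by query. The function names are hypothetical.&lt;/p>

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to a lowercase host+path key for duplicate detection.

    Assumed rules: drop the scheme, a leading "www.", the query string,
    the fragment, and any trailing slash.
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc.removeprefix("www.")
    path = parts.path.rstrip("/")
    return host + path


def filter_new(results, seen_keys):
    """Keep only results whose normalized URL has not been stored yet."""
    fresh = []
    for result in results:
        key = normalize_url(result["url"])
        if key in seen_keys:
            continue
        seen_keys.add(key)  # also catches duplicates within the same run
        fresh.append(result)
    return fresh
```

&lt;p>Loading &lt;code>seen_keys&lt;/code> once from the URLs already in Airtable keeps each lookup cheap, and &lt;code>https://www.neurips.cc/2026/&lt;/code> and &lt;code>http://neurips.cc/2026&lt;/code> both reduce to the same key, &lt;code>neurips.cc/2026&lt;/code>.&lt;/p>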
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">extract_conference_info&lt;/span>(result, source_query):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;title&amp;#34;&lt;/span>: result[&lt;span style="color:#e6db74">&amp;#34;title&amp;#34;&lt;/span>][:&lt;span style="color:#ae81ff">200&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;websiteUrl&amp;#34;&lt;/span>: result[&lt;span style="color:#e6db74">&amp;#34;url&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;description&amp;#34;&lt;/span>: result[&lt;span style="color:#e6db74">&amp;#34;snippet&amp;#34;&lt;/span>][:&lt;span style="color:#ae81ff">1000&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;Source Query&amp;#34;&lt;/span>: source_query,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;Date Found&amp;#34;&lt;/span>: datetime&lt;span style="color:#f92672">.&lt;/span>now(timezone&lt;span style="color:#f92672">.&lt;/span>utc)&lt;span style="color:#f92672">.&lt;/span>strftime(&lt;span style="color:#e6db74">&amp;#34;%Y-%m-&lt;/span>&lt;span style="color:#e6db74">%d&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>After one run, we had 87 unique conference records. The real stuff — NeurIPS, ICML, CVPR, AAAI — alongside smaller but interesting events like the Quantum AI and NLP Conference, Deep Learning Indaba, and the Wharton Human-AI Research summit.&lt;/p>
&lt;hr>
&lt;h2 id="layer-2-the-website--react--vite-on-netlify">Layer 2: The Website — React + Vite on Netlify&lt;/h2>
&lt;p>The directory itself is a React app built with Vite and deployed on Netlify. It&amp;rsquo;s a single-page app with search, tag filtering, and individual event pages.&lt;/p>
&lt;p>The key architectural decision: &lt;strong>data is fetched at build time, not runtime.&lt;/strong> A prebuild script (&lt;code>fetch-events.mjs&lt;/code>) pulls conference data from the database and writes it to a &lt;code>data.ts&lt;/code> file that Vite bundles into the site. This means:&lt;/p>
&lt;ul>
&lt;li>No API keys exposed in the browser&lt;/li>
&lt;li>No CORS issues&lt;/li>
&lt;li>Instant page loads (data is already in the bundle)&lt;/li>
&lt;li>The site works even if Airtable is temporarily down&lt;/li>
&lt;/ul>
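&lt;p>To illustrate the idea, here is a Python sketch of turning fetched records into a module Vite can bundle. The project&amp;rsquo;s actual prebuild script is the JavaScript file &lt;code>fetch-events.mjs&lt;/code>; the helper names and the &lt;code>src/data.ts&lt;/code> output path here are assumptions.&lt;/p>

```python
import json


def render_data_module(events):
    """Serialize event records into the source text of a TypeScript module."""
    body = json.dumps(events, indent=2, ensure_ascii=False)
    return (
        "// Generated at build time. Do not edit by hand.\n"
        f"export const events = {body};\n"
    )


def write_data_module(events, path="src/data.ts"):
    """Write the generated module where the app imports it (path assumed)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(render_data_module(events))
```

&lt;p>Because JSON is also valid TypeScript expression syntax, a plain &lt;code>json.dumps&lt;/code> is enough; no template engine is needed, and the data ships inside the bundle with no runtime request.&lt;/p>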
&lt;p>The prebuild hook in &lt;code>package.json&lt;/code> makes this automatic:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;scripts&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;fetch-events&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;bun scripts/fetch-events.mjs&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;prebuild&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;bun scripts/fetch-events.mjs&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;#34;build&amp;#34;&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;vite build&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Every time Netlify builds the site, it automatically fetches the latest data from Airtable. Fresh data on every deploy.&lt;/p>
&lt;hr>
&lt;h2 id="the-middleman-problem-cutting-google-sheets">The Middleman Problem: Cutting Google Sheets&lt;/h2>
&lt;p>Here&amp;rsquo;s where the story gets interesting.&lt;/p>
&lt;p>The original pipeline had an extra step: Airtable → Google Sheets → website. The &lt;code>fetch-events.mjs&lt;/code> script was pulling from a published Google Sheet CSV. Why? Because when I first prototyped the site, I started with a spreadsheet. It was quick and easy.&lt;/p>
&lt;p>But once the conference tracker was writing directly to Airtable, Google Sheets became a middleman with no purpose. Data had to be synced from Airtable to Sheets (manually or via Zapier), and that sync was another thing that could break.&lt;/p>
&lt;p>The fix was straightforward: teach &lt;code>fetch-events.mjs&lt;/code> to talk directly to the Airtable API.&lt;/p>
&lt;h3 id="airtables-rest-api">Airtable&amp;rsquo;s REST API&lt;/h3>
&lt;p>The Airtable API is clean. A single GET request returns records as JSON:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">url&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">URL&lt;/span>(&lt;span style="color:#e6db74">`https://api.airtable.com/v0/&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>&lt;span style="color:#a6e22e">baseId&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">/&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>&lt;span style="color:#a6e22e">tableId&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">`&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">resp&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">fetch&lt;/span>(&lt;span style="color:#a6e22e">url&lt;/span>.&lt;span style="color:#a6e22e">toString&lt;/span>(), {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">headers&lt;/span>&lt;span style="color:#f92672">:&lt;/span> { &lt;span style="color:#a6e22e">Authorization&lt;/span>&lt;span style="color:#f92672">:&lt;/span> &lt;span style="color:#e6db74">`Bearer &lt;/span>&lt;span style="color:#e6db74">${&lt;/span>&lt;span style="color:#a6e22e">pat&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">`&lt;/span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">data&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">resp&lt;/span>.&lt;span style="color:#a6e22e">json&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">// data.records = [{ id, fields: { title, date, ... } }]
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The one gotcha: Airtable paginates at 100 records. You need to follow the &lt;code>offset&lt;/code> token:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">async&lt;/span> &lt;span style="color:#66d9ef">function&lt;/span> &lt;span style="color:#a6e22e">fetchFromAirtable&lt;/span>(&lt;span style="color:#a6e22e">pat&lt;/span>, &lt;span style="color:#a6e22e">baseId&lt;/span>, &lt;span style="color:#a6e22e">tableId&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">allRecords&lt;/span> &lt;span style="color:#f92672">=&lt;/span> [];
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">let&lt;/span> &lt;span style="color:#a6e22e">offset&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">do&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">url&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">new&lt;/span> &lt;span style="color:#a6e22e">URL&lt;/span>(&lt;span style="color:#e6db74">`https://api.airtable.com/v0/&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>&lt;span style="color:#a6e22e">baseId&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">/&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>&lt;span style="color:#a6e22e">tableId&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">`&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> (&lt;span style="color:#a6e22e">offset&lt;/span>) &lt;span style="color:#a6e22e">url&lt;/span>.&lt;span style="color:#a6e22e">searchParams&lt;/span>.&lt;span style="color:#a6e22e">set&lt;/span>(&lt;span style="color:#e6db74">&amp;#39;offset&amp;#39;&lt;/span>, &lt;span style="color:#a6e22e">offset&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">resp&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">fetch&lt;/span>(&lt;span style="color:#a6e22e">url&lt;/span>.&lt;span style="color:#a6e22e">toString&lt;/span>(), {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">headers&lt;/span>&lt;span style="color:#f92672">:&lt;/span> { &lt;span style="color:#a6e22e">Authorization&lt;/span>&lt;span style="color:#f92672">:&lt;/span> &lt;span style="color:#e6db74">`Bearer &lt;/span>&lt;span style="color:#e6db74">${&lt;/span>&lt;span style="color:#a6e22e">pat&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">`&lt;/span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">const&lt;/span> &lt;span style="color:#a6e22e">data&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">await&lt;/span> &lt;span style="color:#a6e22e">resp&lt;/span>.&lt;span style="color:#a6e22e">json&lt;/span>();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">allRecords&lt;/span>.&lt;span style="color:#a6e22e">push&lt;/span>(...&lt;span style="color:#a6e22e">data&lt;/span>.&lt;span style="color:#a6e22e">records&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">offset&lt;/span> &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#a6e22e">data&lt;/span>.&lt;span style="color:#a6e22e">offset&lt;/span> &lt;span style="color:#f92672">||&lt;/span> &lt;span style="color:#66d9ef">null&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> } &lt;span style="color:#66d9ef">while&lt;/span> (&lt;span style="color:#a6e22e">offset&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> &lt;span style="color:#a6e22e">allRecords&lt;/span>;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="graceful-fallback">Graceful Fallback&lt;/h3>
&lt;p>I kept the Google Sheets path as a fallback. The &lt;code>main()&lt;/code> function uses a priority chain:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Airtable&lt;/strong> — if &lt;code>AIRTABLE_PAT&lt;/code>, &lt;code>AIRTABLE_BASE_ID&lt;/code>, &lt;code>AIRTABLE_TABLE_ID&lt;/code> are set&lt;/li>
&lt;li>&lt;strong>Google Sheets&lt;/strong> — if &lt;code>GOOGLE_SHEET_CSV_URL&lt;/code> is set&lt;/li>
&lt;li>&lt;strong>Fallback events&lt;/strong> — hardcoded sample data so the build never fails&lt;/li>
&lt;/ol>
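&lt;p>Translated into Python for illustration (the real &lt;code>main()&lt;/code> lives in &lt;code>fetch-events.mjs&lt;/code>), the source selection could look like this. It only checks which environment variables are set; whether the real script also falls through when a configured source returns an error is not covered by this sketch.&lt;/p>

```python
import os

AIRTABLE_VARS = ("AIRTABLE_PAT", "AIRTABLE_BASE_ID", "AIRTABLE_TABLE_ID")


def choose_source(env=None):
    """Return which data source the build should use, in priority order."""
    env = os.environ if env is None else env
    # Airtable wins only when all three of its variables are non-empty.
    if all(env.get(name) for name in AIRTABLE_VARS):
        return "airtable"
    if env.get("GOOGLE_SHEET_CSV_URL"):
        return "google_sheets"
    return "fallback"  # hardcoded sample events keep the build green
```

&lt;p>Accepting a plain dict in place of &lt;code>os.environ&lt;/code> makes each branch of the chain easy to exercise without touching real credentials.&lt;/p>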
&lt;p>This means you can&amp;rsquo;t break the site by misconfiguring a data source. The build always succeeds.&lt;/p>
&lt;hr>
&lt;h2 id="layer-3-the-enrichment--ai-powered-data-extraction">Layer 3: The Enrichment — AI-Powered Data Extraction&lt;/h2>
&lt;p>This is where things got really interesting.&lt;/p>
&lt;p>After cutting Google Sheets, I had 87 conference records in Airtable. But they only had three useful fields: title, description, and URL. No dates. No locations. No tags. The site worked, but every event card was sparse — no way to filter by date or location, no tags to browse by topic.&lt;/p>
&lt;p>Filling in 87 records by hand? No thanks.&lt;/p>
&lt;h3 id="the-idea-visit-each-url-and-ask-ai-to-extract-the-data">The Idea: Visit Each URL and Ask AI to Extract the Data&lt;/h3>
&lt;p>The approach: for each conference record, fetch its web page, extract the text content, and use AI inference to pull out structured fields like date, location, organizer, and tags.&lt;/p>
&lt;p>I built an enrichment script — &lt;code>enrich_conferences.py&lt;/code> — that sits alongside the tracker in the same project.&lt;/p>
&lt;h3 id="step-1-fetch-and-clean-the-page">Step 1: Fetch and Clean the Page&lt;/h3>
&lt;p>Each conference URL gets fetched with &lt;code>requests&lt;/code>, then cleaned with BeautifulSoup. Scripts, styles, navigation, headers, footers, and sidebars get stripped, leaving just the text content:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">fetch_page_text&lt;/span>(url, timeout&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">15&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> resp &lt;span style="color:#f92672">=&lt;/span> requests&lt;span style="color:#f92672">.&lt;/span>get(url, headers&lt;span style="color:#f92672">=&lt;/span>headers, timeout&lt;span style="color:#f92672">=&lt;/span>timeout)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> soup &lt;span style="color:#f92672">=&lt;/span> BeautifulSoup(resp&lt;span style="color:#f92672">.&lt;/span>text, &lt;span style="color:#e6db74">&amp;#34;html.parser&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">for&lt;/span> tag &lt;span style="color:#f92672">in&lt;/span> soup([&lt;span style="color:#e6db74">&amp;#34;script&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;style&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;nav&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;footer&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;header&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;aside&amp;#34;&lt;/span>]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tag&lt;span style="color:#f92672">.&lt;/span>decompose()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> text &lt;span style="color:#f92672">=&lt;/span> soup&lt;span style="color:#f92672">.&lt;/span>get_text(separator&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>, strip&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lines &lt;span style="color:#f92672">=&lt;/span> [line&lt;span style="color:#f92672">.&lt;/span>strip() &lt;span style="color:#66d9ef">for&lt;/span> line &lt;span style="color:#f92672">in&lt;/span> text&lt;span style="color:#f92672">.&lt;/span>splitlines() &lt;span style="color:#66d9ef">if&lt;/span> line&lt;span style="color:#f92672">.&lt;/span>strip()]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#f92672">.&lt;/span>join(lines)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="step-2-ai-extraction-via-pai-inference">Step 2: AI Extraction via PAI Inference&lt;/h3>
&lt;p>The cleaned text gets sent to Claude (via PAI&amp;rsquo;s Inference tool) with a structured extraction prompt. The prompt is specific about what to extract and what format to use:&lt;/p>
&lt;pre tabindex="0">&lt;code>Given text from a conference web page, extract these fields as JSON:
{
&amp;#34;date&amp;#34;: &amp;#34;human-readable date like &amp;#39;May 5-6, 2026&amp;#39;&amp;#34;,
&amp;#34;endDate&amp;#34;: &amp;#34;ISO end date like &amp;#39;2026-05-06&amp;#39;&amp;#34;,
&amp;#34;location&amp;#34;: &amp;#34;City, State/Country&amp;#34;,
&amp;#34;venue&amp;#34;: &amp;#34;venue name&amp;#34;,
&amp;#34;price&amp;#34;: &amp;#34;ticket price or &amp;#39;Free&amp;#39;&amp;#34;,
&amp;#34;organizer&amp;#34;: &amp;#34;organizing body&amp;#34;,
&amp;#34;tags&amp;#34;: &amp;#34;comma-separated topic tags (max 4)&amp;#34;
}
&lt;/code>&lt;/pre>&lt;p>One critical addition: if the page is a &lt;strong>list of conferences&lt;/strong> (like &amp;ldquo;Top 10 AI Conferences of 2026&amp;rdquo;), the AI returns &lt;code>{&amp;quot;is_list_page&amp;quot;: true}&lt;/code> and the script skips it. This was essential: the enrichment run flagged 12 of our 87 URLs (roughly one in seven) as aggregator pages rather than individual conference pages.&lt;/p>
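&lt;p>The call to PAI&amp;rsquo;s Inference tool is not shown here, so this sketch covers only the two ends around it: attaching the cleaned page text to the prompt, and pulling a JSON object out of the reply. The 12,000-character cap is an assumed budget, and slicing from the first &lt;code>{&lt;/code> to the last &lt;code>}&lt;/code> is a common way to tolerate models that wrap JSON in prose or a code fence.&lt;/p>

```python
import json

MAX_PAGE_CHARS = 12000  # assumed budget to keep prompts small and fast


def build_prompt(instructions, page_text):
    """Append the (truncated) page text to the extraction instructions."""
    return instructions + "\n\nPage text:\n" + page_text[:MAX_PAGE_CHARS]


def parse_extraction(reply):
    """Pull the first JSON object out of a model reply; {} if none parses."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or start > end:
        return {}
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}
```

&lt;p>A reply of &lt;code>{&amp;quot;is_list_page&amp;quot;: true}&lt;/code> parses to a dict that &lt;code>build_patch_fields&lt;/code> rejects, and an unparseable reply becomes &lt;code>{}&lt;/code>, which that function also turns into &lt;code>None&lt;/code>, so neither case writes anything.&lt;/p>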
&lt;h3 id="step-3-write-back-to-airtable">Step 3: Write Back to Airtable&lt;/h3>
&lt;p>Non-empty extracted fields get PATCHed back to Airtable. The script only writes fields that actually exist in the table schema — a lesson learned the hard way when &lt;code>venue&lt;/code> and &lt;code>imageUrl&lt;/code> threw 422 errors because those columns hadn&amp;rsquo;t been created yet.&lt;/p>
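&lt;p>For a single record, that write is a PATCH to the record&amp;rsquo;s URL with a &lt;code>{&amp;quot;fields&amp;quot;: {...}}&lt;/code> body. This sketch only assembles the request, so it runs without credentials; the function name is hypothetical, and sending it (for example with &lt;code>requests.request&lt;/code>) is left to the caller.&lt;/p>

```python
import json

AIRTABLE_API = "https://api.airtable.com/v0"


def build_patch_request(base_id, table_id, record_id, fields, pat):
    """Describe a single-record Airtable update as (method, url, headers, body)."""
    url = f"{AIRTABLE_API}/{base_id}/{table_id}/{record_id}"
    headers = {
        "Authorization": f"Bearer {pat}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"fields": fields})
    return "PATCH", url, headers, body
```

&lt;p>PATCH rather than PUT matters here: Airtable&amp;rsquo;s PATCH updates only the fields you send, while PUT clears any field left out of the request, which would wipe the tracker&amp;rsquo;s discovery data.&lt;/p>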
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">build_patch_fields&lt;/span>(extracted, allowed_fields):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> extracted&lt;span style="color:#f92672">.&lt;/span>get(&lt;span style="color:#e6db74">&amp;#34;is_list_page&amp;#34;&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> patch &lt;span style="color:#f92672">=&lt;/span> {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">for&lt;/span> key &lt;span style="color:#f92672">in&lt;/span> [&lt;span style="color:#e6db74">&amp;#34;date&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;endDate&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;location&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;venue&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;price&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;organizer&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;tags&amp;#34;&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> key &lt;span style="color:#f92672">not&lt;/span> &lt;span style="color:#f92672">in&lt;/span> allowed_fields:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">continue&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> val &lt;span style="color:#f92672">=&lt;/span> extracted&lt;span style="color:#f92672">.&lt;/span>get(key, &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> isinstance(val, str) &lt;span style="color:#f92672">and&lt;/span> val&lt;span style="color:#f92672">.&lt;/span>strip():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> patch[key] &lt;span style="color:#f92672">=&lt;/span> val&lt;span style="color:#f92672">.&lt;/span>strip()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> patch &lt;span style="color:#66d9ef">if&lt;/span> patch &lt;span style="color:#66d9ef">else&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="the-results">The Results&lt;/h3>
&lt;p>Running the enrichment script across all 87 records:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Outcome&lt;/th>
&lt;th>Count&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Records enriched&lt;/td>
&lt;td>48&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>List/aggregator pages (correctly skipped)&lt;/td>
&lt;td>12&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>No extractable fields (social media, OpenReview, etc.)&lt;/td>
&lt;td>11&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Errors (timeouts, HTTP 403s)&lt;/td>
&lt;td>16&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>After enrichment:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Field&lt;/th>
&lt;th>Records populated&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Date&lt;/td>
&lt;td>42&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Location&lt;/td>
&lt;td>41&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Tags&lt;/td>
&lt;td>47&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Organizer&lt;/td>
&lt;td>27&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Price&lt;/td>
&lt;td>4&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>From zero structured data to a directory where nearly half the events have dates and locations and more than half have topic tags, all without opening a single conference website manually.&lt;/p>
&lt;p>Some highlights from the extraction:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>NeurIPS 2026:&lt;/strong> December 6-12, Sydney, Australia — Deep Learning, Research, Algorithms, LLMs&lt;/li>
&lt;li>&lt;strong>CVPR 2026:&lt;/strong> June 3-7, Denver, CO — Computer Vision, Deep Learning, Research&lt;/li>
&lt;li>&lt;strong>ICML 2026:&lt;/strong> July 6-11, Seoul, South Korea — LLMs, Computer Vision, NLP, Robotics&lt;/li>
&lt;li>&lt;strong>AI Council 2026:&lt;/strong> May 12-14, San Francisco, CA — Generative AI, ML Ops, AI Safety&lt;/li>
&lt;li>&lt;strong>MIDL 2026:&lt;/strong> July 8-10, Taipei — Deep Learning, Healthcare AI, Computer Vision&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="the-pipeline-today">The Pipeline Today&lt;/h2>
&lt;p>Here&amp;rsquo;s what the full system looks like now:&lt;/p>
&lt;pre tabindex="0">&lt;code>SearXNG (self-hosted search)
→ conference_tracker.py (Python — discovers conferences)
→ Airtable (source of truth — 87 records)
→ enrich_conferences.py (Python — AI-powered field extraction)
→ Airtable (now with dates, locations, tags)
→ fetch-events.mjs (Node — build-time data fetch)
→ data.ts (bundled into the site)
→ React + Vite app on Netlify
&lt;/code>&lt;/pre>&lt;p>The tracker discovers. The enricher structures. The fetcher delivers. The site displays. Each piece runs independently and can be re-run at any time.&lt;/p>
&lt;p>The enrichment script is safe to re-run: it only processes records where the &lt;code>date&lt;/code> field is empty, so running it again only touches new records and ones that did not get a date on an earlier pass.&lt;/p>
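&lt;p>A minimal sketch of that selection, assuming records shaped like the Airtable API&amp;rsquo;s &lt;code>{&amp;quot;id&amp;quot;: ..., &amp;quot;fields&amp;quot;: {...}}&lt;/code> objects. Airtable leaves empty fields out of the &lt;code>fields&lt;/code> object entirely, so a missing key has to count as empty too. The same filter could instead be pushed to the server with the list endpoint&amp;rsquo;s &lt;code>filterByFormula&lt;/code> parameter; the function names here are hypothetical.&lt;/p>

```python
def needs_enrichment(record):
    """True when a record has no usable date yet.

    Airtable omits empty fields from a record's "fields" object, so a
    missing key and a blank string both count as empty here.
    """
    date = record.get("fields", {}).get("date", "")
    return not (isinstance(date, str) and date.strip())


def pending_records(records):
    """Filter a full record list down to the ones the enricher should visit."""
    return [r for r in records if needs_enrichment(r)]
```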
&lt;hr>
&lt;h2 id="what-id-do-differently-and-whats-next">What I&amp;rsquo;d Do Differently (And What&amp;rsquo;s Next)&lt;/h2>
&lt;h3 id="the-timeout-problem">The Timeout Problem&lt;/h3>
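&lt;p>The 16 errors in the results table were a mix of HTTP 403s and timeouts, including inference calls that hit the 25-second limit. The fast tier (Haiku) is quick but occasionally chokes on pages with dense, complex content, so retrying the timed-out records on the standard tier (Sonnet) should recover many of them. The 403s need a different fix, since those pages never reach the model.&lt;/p>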
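&lt;p>One way to sketch that retry, assuming the inference call can be wrapped in a function that raises &lt;code>TimeoutError&lt;/code> when it runs too long: try each tier in order and move up only on a timeout. The tier labels and the &lt;code>call_model&lt;/code> signature are placeholders, not PAI&amp;rsquo;s actual interface.&lt;/p>

```python
def extract_with_escalation(page_text, call_model, tiers=("fast", "standard")):
    """Try each model tier in order, escalating only when a call times out.

    `call_model(tier, text)` is a hypothetical wrapper around the real
    inference call; it should raise TimeoutError on a slow response.
    Returns (tier_used, result).
    """
    last_error = None
    for tier in tiers:
        try:
            return tier, call_model(tier, page_text)
        except TimeoutError as exc:
            last_error = exc  # fall through to the next, slower tier
    raise last_error or TimeoutError("no model tiers configured")
```

&lt;p>Any other exception propagates immediately, which fits the 403 case: a bigger model cannot help with a page that was never fetched.&lt;/p>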
&lt;h3 id="missing-table-columns">Missing Table Columns&lt;/h3>
&lt;p>The &lt;code>venue&lt;/code> and &lt;code>imageUrl&lt;/code> fields don&amp;rsquo;t exist in the Airtable table yet. The enrichment script extracts venue names beautifully (The Venetian for Ai4, COEX Convention Center for ICML, Dongguk University for AAAI Summer), but the data gets dropped because the columns aren&amp;rsquo;t there. A quick table schema update in the Airtable UI fixes this.&lt;/p>
&lt;h3 id="scheduled-runs">Scheduled Runs&lt;/h3>
&lt;p>Right now, both the tracker and enricher are manual. The natural next step is scheduling — run the tracker daily to discover new conferences, the enricher on new records, and trigger a Netlify deploy afterward. The Netlify build hook is already configured; it just needs a cron job or GitHub Action to call it.&lt;/p>
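&lt;p>A scheduled job could be as small as this sketch: run the two scripts in order, then POST to the build hook. The script paths are assumptions, as is reading the hook URL from an environment variable such as &lt;code>NETLIFY_BUILD_HOOK_URL&lt;/code>; a cron entry or a scheduled GitHub Action would simply call &lt;code>run_pipeline&lt;/code>.&lt;/p>

```python
import subprocess
import sys
import urllib.request

# Assumed script locations, relative to the job's working directory.
PIPELINE_STEPS = ("conference_tracker.py", "enrich_conferences.py")


def build_hook_request(hook_url):
    """Netlify starts a deploy when its build hook URL receives a POST."""
    return urllib.request.Request(hook_url, data=b"", method="POST")


def run_pipeline(hook_url, steps=PIPELINE_STEPS):
    """Discover, enrich, then ask Netlify to rebuild with the fresh data."""
    for script in steps:
        # check=True raises CalledProcessError on failure, skipping the deploy.
        subprocess.run([sys.executable, script], check=True)
    with urllib.request.urlopen(build_hook_request(hook_url), timeout=30) as resp:
        return resp.status
```

&lt;p>Usage would be &lt;code>run_pipeline(os.environ[&amp;quot;NETLIFY_BUILD_HOOK_URL&amp;quot;])&lt;/code>. Because a failed step raises before the hook is called, a broken run never triggers a deploy.&lt;/p>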
&lt;h3 id="data-quality">Data Quality&lt;/h3>
&lt;p>Some records are noise — Reddit discussion threads, Amazon Science blog posts, Twitter/X profiles. A quality filter (either rule-based on URL patterns or AI-powered) would clean the dataset before enrichment runs.&lt;/p>
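&lt;p>A rule-based version of that filter might start as a host deny-list. The list below is assumed from the examples just mentioned, and records are taken to carry the tracker&amp;rsquo;s &lt;code>websiteUrl&lt;/code> field in Airtable&amp;rsquo;s &lt;code>fields&lt;/code> object.&lt;/p>

```python
from urllib.parse import urlsplit

# Assumed deny-list based on the noise seen so far; extend as needed.
NOISE_HOSTS = {"reddit.com", "twitter.com", "x.com", "amazon.science"}


def is_noise(url):
    """True if the URL's host is, or is a subdomain of, a deny-listed host."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == bad or host.endswith("." + bad) for bad in NOISE_HOSTS)


def drop_noise(records):
    """Remove records whose websiteUrl points at a known non-event host."""
    return [r for r in records if not is_noise(r["fields"].get("websiteUrl", ""))]
```

&lt;p>Matching on the parsed hostname, rather than searching the raw URL string, keeps a rule like &lt;code>x.com&lt;/code> from catching unrelated hosts that merely end in the same letters.&lt;/p>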
&lt;hr>
&lt;h2 id="lessons-learned">Lessons Learned&lt;/h2>
&lt;h3 id="1-eliminate-middlemen-early">1. Eliminate Middlemen Early&lt;/h3>
&lt;p>Google Sheets added zero value once Airtable was in the picture. But it lingered because it was the &amp;ldquo;original&amp;rdquo; approach. Every extra hop in a pipeline is a thing that can break, a thing that needs syncing, and a thing that slows you down. Cut it.&lt;/p>
&lt;h3 id="2-build-time-data-fetching-is-underrated">2. Build-Time Data Fetching Is Underrated&lt;/h3>
&lt;p>Pulling data at build time instead of runtime means no API keys in the browser, no loading spinners, and no CORS headaches. For data that changes daily (not per-second), this is the right architecture.&lt;/p>
&lt;h3 id="3-ai-extraction-beats-manual-curation">3. AI Extraction Beats Manual Curation&lt;/h3>
&lt;p>Using AI to extract structured data from unstructured web pages isn&amp;rsquo;t perfect — we got 48 out of 87 records enriched, not 87 out of 87. But it took 20 minutes of runtime versus what would have been hours of manual work. And the script is re-runnable. Improvement is incremental.&lt;/p>
&lt;h3 id="4-detect-your-datas-shape-before-writing">4. Detect Your Data&amp;rsquo;s Shape Before Writing&lt;/h3>
&lt;p>The Airtable 422 errors on &lt;code>venue&lt;/code> were entirely preventable. The enrichment script now probes the table schema at startup and only writes to fields that exist. Defensive coding at system boundaries saves debugging time.&lt;/p>
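&lt;p>The probe itself is not shown; one way to implement it is Airtable&amp;rsquo;s metadata API, which lists every table and field in a base (the token needs the &lt;code>schema.bases:read&lt;/code> scope). This sketch only builds the URL and reads the response, and the function names are hypothetical.&lt;/p>

```python
META_API = "https://api.airtable.com/v0/meta/bases"


def schema_url(base_id):
    """Metadata endpoint that lists every table and field in a base."""
    return f"{META_API}/{base_id}/tables"


def table_field_names(schema, table):
    """Collect the field names of one table, matched by table ID or name."""
    for entry in schema.get("tables", []):
        if table in (entry.get("id"), entry.get("name")):
            return {field["name"] for field in entry.get("fields", [])}
    raise KeyError(f"table {table!r} not found in base schema")
```

&lt;p>The returned set is what &lt;code>build_patch_fields&lt;/code> expects as its &lt;code>allowed_fields&lt;/code> argument, so a column like &lt;code>venue&lt;/code> is skipped until it exists instead of triggering a 422.&lt;/p>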
&lt;h3 id="5-list-page-detection-is-essential-for-web-scraping-pipelines">5. List Page Detection Is Essential for Web Scraping Pipelines&lt;/h3>
&lt;p>When you&amp;rsquo;re scraping URLs from search results, a significant percentage will be aggregator pages (&amp;ldquo;Top 10 Best AI Conferences&amp;rdquo;) rather than individual event pages. If you don&amp;rsquo;t detect and skip these, you&amp;rsquo;ll corrupt your dataset with merged data from multiple events. The &lt;code>is_list_page&lt;/code> flag in the AI extraction prompt was one of the highest-value additions to the whole pipeline.&lt;/p>
&lt;hr>
&lt;h2 id="the-bigger-picture">The Bigger Picture&lt;/h2>
&lt;p>This project is a miniature version of a pattern I keep coming back to: &lt;strong>systems that compound.&lt;/strong>&lt;/p>
&lt;p>The tracker runs once and discovers 87 conferences. The enricher runs once and structures 48 of them. The next time the tracker runs, it discovers only &lt;em>new&lt;/em> conferences (deduplication handles the rest). The next time the enricher runs, it only processes records that still have no date.&lt;/p>
&lt;p>Every run makes the dataset better without redoing previous work. That&amp;rsquo;s the whole point of building infrastructure instead of doing things manually — you invest upfront so the system improves over time with minimal additional effort.&lt;/p>
&lt;p>Working with Claude through PAI made each layer come together faster than I expected. The tracker, the Airtable integration, the Google Sheets elimination, the enrichment script — each was a focused session where the AI handled the implementation details while I focused on architecture decisions.&lt;/p>
&lt;p>That&amp;rsquo;s the augmented part of Augmented Resilience. Not replacing the thinking — amplifying it.&lt;/p></content></item><item><title>When Your PDF Workflow Breaks - Building a Markdown Converter with Claude Code</title><link>https://augmentedresilience.com/posts/augmented-resilience-posts/building-a-pdf-to-markdown-converter-with-claude-code/</link><pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate><guid>https://augmentedresilience.com/posts/augmented-resilience-posts/building-a-pdf-to-markdown-converter-with-claude-code/</guid><description>&lt;h2 id="the-problem-pdfs-are-knowledge-prisons">The Problem: PDFs Are Knowledge Prisons&lt;/h2>
&lt;p>You know that feeling when you download a brilliant research paper, only to realize you can&amp;rsquo;t easily feed it into your AI workflow? Or when you want to add documentation to your knowledge base, but it&amp;rsquo;s locked in a format that doesn&amp;rsquo;t play well with version control or LLM tools?&lt;/p>
&lt;p>Yeah, I was there last week.&lt;/p>
&lt;p>I had just downloaded a fascinating 1.3MB research paper on Generative Engine Optimization and wanted to process it with my AI tools. But PDFs are terrible for this. They&amp;rsquo;re designed for &lt;em>printing&lt;/em>, not for &lt;em>processing&lt;/em>. What I needed was Markdown—clean, portable, AI-friendly Markdown.&lt;/p></description><content>&lt;h2 id="the-problem-pdfs-are-knowledge-prisons">The Problem: PDFs Are Knowledge Prisons&lt;/h2>
&lt;p>You know that feeling when you download a brilliant research paper, only to realize you can&amp;rsquo;t easily feed it into your AI workflow? Or when you want to add documentation to your knowledge base, but it&amp;rsquo;s locked in a format that doesn&amp;rsquo;t play well with version control or LLM tools?&lt;/p>
&lt;p>Yeah, I was there last week.&lt;/p>
&lt;p>I had just downloaded a fascinating 1.3MB research paper on Generative Engine Optimization and wanted to process it with my AI tools. But PDFs are terrible for this. They&amp;rsquo;re designed for &lt;em>printing&lt;/em>, not for &lt;em>processing&lt;/em>. What I needed was Markdown—clean, portable, AI-friendly Markdown.&lt;/p>
&lt;p>So I built a converter. And with Claude Code as my copilot through the PAI (Personal AI Infrastructure) system, the whole thing took less than 30 minutes.&lt;/p>
&lt;p>Here&amp;rsquo;s how it went down.&lt;/p>
&lt;hr>
&lt;h2 id="why-markdown-is-better-than-pdf-for-llms">Why Markdown is Better Than PDF for LLMs&lt;/h2>
&lt;p>Before diving into the build, let&amp;rsquo;s answer the obvious question: &lt;em>why bother converting?&lt;/em> Can&amp;rsquo;t LLMs just read PDFs directly?&lt;/p>
&lt;p>Technically, yes. But the results are significantly worse, and the reasons are fundamental to how PDFs work.&lt;/p>
&lt;h3 id="pdfs-are-layout-first-not-structure-first">PDFs Are Layout-First, Not Structure-First&lt;/h3>
&lt;p>PDFs were designed to describe &lt;em>where things appear on a page&lt;/em>, not &lt;em>what they mean&lt;/em>. As Steven Howard explains in &lt;a href="https://untetheredai.substack.com/p/why-pdfs-fail-under-llm-parsing" target="_blank" rel="noopener noreferrer">Why PDFs Fail Under LLM Parsing&lt;/a>:&lt;/p>
&lt;blockquote>
&lt;p>&amp;ldquo;Table cells with wrapped text insert hard line breaks that fragment token continuity and break logical row recognition. Headers and footers simply add noise to the context when used with LLMs. Sentences are split with arbitrary CR/LFs making it very difficult to find paragraph boundaries.&amp;rdquo;&lt;/p>&lt;/blockquote>
&lt;p>This architectural mismatch — a format designed for printing being fed into a system designed for understanding — causes cascading problems downstream.&lt;/p>
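&lt;p>To make the line-break problem concrete, here is a naive paragraph-rejoining heuristic for raw extracted text. It is only a sketch, not what any particular converter does: it treats blank lines as paragraph breaks, glues words hyphenated across a line break back together, and turns the remaining single breaks into spaces. A genuine compound such as &amp;ldquo;well-known&amp;rdquo; that happens to break at its hyphen gets merged into &amp;ldquo;wellknown&amp;rdquo;, which is exactly the kind of ambiguity that makes PDF text hard to recover. The name &lt;code>rejoin_lines&lt;/code> is hypothetical.&lt;/p>

```python
import re


def rejoin_lines(raw: str) -> str:
    """Naively rebuild paragraphs from line-broken PDF text (hypothetical helper)."""
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    rebuilt = []
    for para in paragraphs:
        # "implemen-\ntation" -> "implementation" (only when a lowercase letter follows)
        text = re.sub(r"-\n(?=[a-z])", "", para)
        # Any other line break inside a paragraph becomes a single space
        text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
        rebuilt.append(text)
    return "\n\n".join(rebuilt)


raw = "Sentences are split\nacross lines, and some implemen-\ntation words too.\n\nA second paragraph."
print(rejoin_lines(raw))
```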
&lt;h3 id="the-token-efficiency-problem">The Token Efficiency Problem&lt;/h3>
&lt;p>Every token your LLM processes costs money and consumes context window space. PDF extraction wastes both.&lt;/p>
&lt;p>According to analysis from &lt;a href="https://markdownconverters.com/blog/pdf-vs-markdown-ai-tokens" target="_blank" rel="noopener noreferrer">MarkdownConverters&lt;/a>, &lt;strong>Markdown uses up to 70% fewer tokens than extracted PDF text&lt;/strong> for the same content. The culprit: PDF extraction introduces formatting artifacts, metadata noise, headers/footers, and encoding remnants that all consume tokens without adding semantic value.&lt;/p>
&lt;p>To put that in practical terms: a PDF that would use 10,000 tokens might need only 3,000 tokens when properly converted to Markdown. Across a large document library, those savings add up quickly.&lt;/p>
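&lt;p>The arithmetic is simple enough to sketch. This assumes the best-case 70% reduction quoted above and a made-up flat price of $3.00 per million input tokens; real savings depend on the document, the tokenizer, and your model&amp;rsquo;s pricing. Both function names are hypothetical.&lt;/p>

```python
def token_savings(pdf_tokens: int, reduction: float = 0.70) -> tuple[int, int]:
    """Return (markdown_tokens, tokens_saved) for a fractional reduction."""
    md_tokens = round(pdf_tokens * (1 - reduction))
    return md_tokens, pdf_tokens - md_tokens


def corpus_cost(docs: int, tokens_per_doc: int, usd_per_million: float) -> float:
    """Input-token cost of sending every document once at a flat per-token price."""
    return docs * tokens_per_doc * usd_per_million / 1_000_000


md, saved = token_savings(10_000)  # the 10,000-token example above
print(md, saved)  # 3000 7000

# Hypothetical corpus size and price, purely for illustration
DOCS, PRICE = 5_000, 3.00
print(f"PDF text: ${corpus_cost(DOCS, 10_000, PRICE):,.2f}")  # PDF text: $150.00
print(f"Markdown: ${corpus_cost(DOCS, md, PRICE):,.2f}")      # Markdown: $45.00
```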
&lt;h3 id="the-rag-performance-problem">The RAG Performance Problem&lt;/h3>
&lt;p>If you&amp;rsquo;re building Retrieval Augmented Generation (RAG) systems — using documents as a knowledge base for AI — document format directly impacts answer quality.&lt;/p>
&lt;p>The research here is compelling:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Academic validation&lt;/strong>: A 2024 paper on arXiv (&lt;a href="https://arxiv.org/abs/2401.12599" target="_blank" rel="noopener noreferrer">Revolutionizing RAG with Enhanced PDF Structure Recognition&lt;/a>) found that &amp;ldquo;the low accuracy of PDF parsing significantly impacts the effectiveness of professional knowledge-based QA.&amp;rdquo;&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Industry validation&lt;/strong>: NVIDIA&amp;rsquo;s technical blog documents how their NeMo Retriever pipeline converts extracted content to Markdown specifically because it &amp;ldquo;preserves row/column relationships in an LLM-native format, significantly reducing numeric hallucination&amp;rdquo; — and &lt;strong>reduces incorrect answers by 50%&lt;/strong>. (&lt;a href="https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/" target="_blank" rel="noopener noreferrer">NVIDIA: Approaches to PDF Data Extraction for Information Retrieval&lt;/a>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Chunking quality&lt;/strong>: Analysis from &lt;a href="https://medium.com/data-science/improved-rag-document-processing-with-markdown-426a2e0dd82b" target="_blank" rel="noopener noreferrer">Towards Data Science&lt;/a>
shows that Markdown&amp;rsquo;s heading structure (&lt;code>#&lt;/code>, &lt;code>##&lt;/code>, &lt;code>###&lt;/code>) produces semantically meaningful chunks, while PDF-based chunking relies on arbitrary page breaks and heuristics.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Retrieval failure rates&lt;/strong>: Unstructured.io&amp;rsquo;s &lt;a href="https://unstructured.io/blog/contextual-chunking-in-unstructured-platform-boost-your-rag-retrieval-accuracy" target="_blank" rel="noopener noreferrer">research on contextual chunking&lt;/a>
— tested across 5,563 question-answer pairs — showed an &lt;strong>84% reduction in retrieval failure rates&lt;/strong> when using structure-aware chunking (the kind Markdown enables natively).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Real-world outcomes&lt;/strong>: The 2025 Semrush AI Index, cited by &lt;a href="https://developer.webex.com/blog/boosting-ai-performance-the-power-of-llm-friendly-content-in-markdown" target="_blank" rel="noopener noreferrer">Webex Developers Blog&lt;/a>, found that 72% of top AI-indexed articles used Markdown or Markdown-like structures, achieving &lt;strong>34% higher retrieval accuracy&lt;/strong> across ChatGPT, Perplexity, and Gemini.&lt;/p>
&lt;/li>
&lt;/ul>
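&lt;p>Heading-aware chunking is straightforward once you have Markdown. This minimal version is an illustration, not the method used in any of the studies above: it splits on ATX headings (&lt;code>#&lt;/code> to &lt;code>######&lt;/code>) and records each chunk&amp;rsquo;s heading trail so a retriever knows where the text came from. It does not skip &lt;code>#&lt;/code> lines inside fenced code blocks, and &lt;code>chunk_by_headings&lt;/code> is a hypothetical name.&lt;/p>

```python
import re

# An ATX heading: 1-6 "#" characters, whitespace, then the heading text
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def chunk_by_headings(markdown_text: str) -> list[dict]:
    """Split Markdown into chunks at headings, keeping each chunk's heading trail."""
    chunks: list[dict] = []
    trail: list[str] = []   # headings above the current line, outermost first
    buffer: list[str] = []  # body lines of the current section

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            chunks.append({"headings": list(trail), "text": body})
        buffer.clear()

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()                          # close the previous section
            level = len(match.group(1))
            del trail[level - 1:]            # drop same-level or deeper headings
            trail.append(match.group(2))
        else:
            buffer.append(line)
    flush()
    return chunks
```

&lt;p>Prepending the heading trail to each chunk before embedding is a common way to give the retriever that extra context.&lt;/p>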
&lt;h3 id="the-bottom-line">The Bottom Line&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Metric&lt;/th>
&lt;th>Impact&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Token reduction&lt;/td>
&lt;td>Up to 70% fewer tokens vs PDF extraction&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Incorrect answers in RAG&lt;/td>
&lt;td>50% reduction (NVIDIA NeMo)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Retrieval failure rates&lt;/td>
&lt;td>84% reduction (Unstructured.io)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Retrieval accuracy&lt;/td>
&lt;td>34% higher (Semrush AI Index 2025)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Markdown isn&amp;rsquo;t just more convenient — it&amp;rsquo;s meaningfully better for AI. Converting your document libraries is one of the highest-ROI steps you can take before building any LLM-powered workflow.&lt;/p>
&lt;hr>
&lt;h2 id="the-first-failure-when-bleeding-edge-python-bites-back">The First Failure: When Bleeding-Edge Python Bites Back&lt;/h2>
&lt;p>I&amp;rsquo;m running Python 3.14.2—the latest release, barely a few weeks old. Modern, shiny, cutting-edge. Perfect, right?&lt;/p>
&lt;p>Not quite.&lt;/p>
&lt;p>My first instinct was to use &lt;code>marker-pdf&lt;/code>, a high-performance converter optimized for scientific papers and books. It looked perfect on paper (pun intended). But when I tried to install it:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>Building wheel for Pillow (pyproject.toml): finished with status &amp;#39;error&amp;#39;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Ugh.&lt;/p>
&lt;p>Turns out, &lt;code>marker-pdf&lt;/code> depends on Pillow (the Python imaging library), and pip couldn&amp;rsquo;t find a prebuilt Pillow wheel for Python 3.14 in the version marker-pdf needed, so it fell back to compiling Pillow from source, and that build failed. I could have downgraded Python. I could have fought with source compilation. But why?&lt;/p>
&lt;p>&lt;strong>This is where working with Claude Code really shines.&lt;/strong> Instead of going down a rabbit hole trying to force marker-pdf to work, Claude suggested pivoting to &lt;strong>PyMuPDF4LLM&lt;/strong>—a mature, actively maintained library specifically designed for AI/LLM workflows.&lt;/p>
&lt;p>And it just worked.&lt;/p>
&lt;hr>
&lt;h2 id="the-solution-pymupdf4llm">The Solution: PyMuPDF4LLM&lt;/h2>
&lt;p>PyMuPDF4LLM turned out to be exactly what I needed:&lt;/p>
&lt;ul>
&lt;li>Works flawlessly with Python 3.14 (no compilation errors)&lt;/li>
&lt;li>Fast and accurate conversion&lt;/li>
&lt;li>Built specifically for feeding documents into LLMs&lt;/li>
&lt;li>Clean, simple API&lt;/li>
&lt;li>Actively maintained by the PyMuPDF team&lt;/li>
&lt;/ul>
&lt;p>The installation (plus &lt;code>tqdm&lt;/code>, which the script below uses for its progress bar) was literally:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>pip install pymupdf4llm tqdm
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Five seconds later, I was ready to go.&lt;/p>
&lt;hr>
&lt;h2 id="building-the-tool-first-principles-thinking">Building the Tool: First Principles Thinking&lt;/h2>
&lt;p>As someone new to the CLI world, I&amp;rsquo;ve been learning to think through project structure from first principles. Where should this live? How should it be organized?&lt;/p>
&lt;p>With Claude&amp;rsquo;s guidance, I chose &lt;code>/Users/dsa/projects/pdf-to-markdown/&lt;/code> for a few key reasons:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Separation of Concerns:&lt;/strong> Tool projects should be separate from my main workspace&lt;/li>
&lt;li>&lt;strong>Discoverability:&lt;/strong> Clear, descriptive naming means I&amp;rsquo;ll find it again in 6 months&lt;/li>
&lt;li>&lt;strong>Reusability:&lt;/strong> This structure works both as a CLI tool AND as a library I could import later&lt;/li>
&lt;/ol>
&lt;p>The project structure ended up simple but complete:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>pdf-to-markdown/
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── README.md # Documentation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── venv/ # Isolated Python environment
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── input/ # Test PDFs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── output/ # Generated markdown
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── pdf2md # CLI wrapper script
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>└── requirements.txt # Dependencies
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="the-code-a-simple-but-powerful-cli">The Code: A Simple but Powerful CLI&lt;/h2>
&lt;p>I wanted a tool I could actually use—something with a clean command-line interface that handles the common cases elegantly. Working with Claude through PAI, we created a Python script that does exactly that:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">#!/usr/bin/env python3&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">PDF to Markdown Converter
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">A simple CLI tool to convert PDF files to Markdown using PyMuPDF4LLM
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> sys
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> os
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> pathlib &lt;span style="color:#f92672">import&lt;/span> Path
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> pymupdf4llm
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> pymupdf
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> tqdm &lt;span style="color:#f92672">import&lt;/span> tqdm
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">convert_pdf_to_markdown&lt;/span>(pdf_path: str, output_path: str &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>) &lt;span style="color:#f92672">-&amp;gt;&lt;/span> str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Convert a PDF file to Markdown format.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#f92672">not&lt;/span> os&lt;span style="color:#f92672">.&lt;/span>path&lt;span style="color:#f92672">.&lt;/span>exists(pdf_path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">raise&lt;/span> &lt;span style="color:#a6e22e">FileNotFoundError&lt;/span>(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;PDF file not found: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>pdf_path&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Get page count for progress bar&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> doc &lt;span style="color:#f92672">=&lt;/span> pymupdf&lt;span style="color:#f92672">.&lt;/span>open(pdf_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> page_count &lt;span style="color:#f92672">=&lt;/span> doc&lt;span style="color:#f92672">.&lt;/span>page_count
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> doc&lt;span style="color:#f92672">.&lt;/span>close()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Converting: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>pdf_path&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">with&lt;/span> tqdm(total&lt;span style="color:#f92672">=&lt;/span>page_count, unit&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;page&amp;#34;&lt;/span>, desc&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Processing&amp;#34;&lt;/span>, colour&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;blue&amp;#34;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> bar:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> md_text &lt;span style="color:#f92672">=&lt;/span> pymupdf4llm&lt;span style="color:#f92672">.&lt;/span>to_markdown(pdf_path, page_chunks&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bar&lt;span style="color:#f92672">.&lt;/span>n &lt;span style="color:#f92672">=&lt;/span> page_count
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bar&lt;span style="color:#f92672">.&lt;/span>refresh()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> output_path &lt;span style="color:#f92672">is&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_path &lt;span style="color:#f92672">=&lt;/span> Path(pdf_path)&lt;span style="color:#f92672">.&lt;/span>with_suffix(&lt;span style="color:#e6db74">&amp;#39;.md&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">with&lt;/span> open(output_path, &lt;span style="color:#e6db74">&amp;#39;w&amp;#39;&lt;/span>, encoding&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;utf-8&amp;#39;&lt;/span>) &lt;span style="color:#66d9ef">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> f&lt;span style="color:#f92672">.&lt;/span>write(md_text)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;✓ Done: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>output_path&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74"> (&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>len(md_text)&lt;span style="color:#e6db74">:&lt;/span>&lt;span style="color:#e6db74">,&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74"> characters)&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> str(output_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">batch_convert&lt;/span>(input_dir: str, output_dir: str &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>) &lt;span style="color:#f92672">-&amp;gt;&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Convert all PDFs in a directory to Markdown.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> input_path &lt;span style="color:#f92672">=&lt;/span> Path(input_dir)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#f92672">not&lt;/span> input_path&lt;span style="color:#f92672">.&lt;/span>is_dir():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">raise&lt;/span> &lt;span style="color:#a6e22e">NotADirectoryError&lt;/span>(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Not a directory: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>input_dir&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pdfs &lt;span style="color:#f92672">=&lt;/span> sorted(input_path&lt;span style="color:#f92672">.&lt;/span>glob(&lt;span style="color:#e6db74">&amp;#34;*.pdf&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#f92672">not&lt;/span> pdfs:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;No PDF files found in: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>input_dir&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sys&lt;span style="color:#f92672">.&lt;/span>exit(&lt;span style="color:#ae81ff">0&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> output_dir:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_dir &lt;span style="color:#f92672">=&lt;/span> Path(output_dir)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_dir &lt;span style="color:#f92672">=&lt;/span> input_path&lt;span style="color:#f92672">.&lt;/span>parent &lt;span style="color:#f92672">/&lt;/span> &lt;span style="color:#e6db74">&amp;#34;output&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_dir&lt;span style="color:#f92672">.&lt;/span>mkdir(parents&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">True&lt;/span>, exist_ok&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total &lt;span style="color:#f92672">=&lt;/span> len(pdfs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> succeeded &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> failed &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">Batch mode: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>total&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74"> PDF(s) found in &amp;#39;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>input_dir&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#39;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Output folder: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>output_dir&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">for&lt;/span> i, pdf_path &lt;span style="color:#f92672">in&lt;/span> enumerate(pdfs, start&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;[&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>i&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">/&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>total&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">] &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>pdf_path&lt;span style="color:#f92672">.&lt;/span>name&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_path &lt;span style="color:#f92672">=&lt;/span> output_dir &lt;span style="color:#f92672">/&lt;/span> pdf_path&lt;span style="color:#f92672">.&lt;/span>with_suffix(&lt;span style="color:#e6db74">&amp;#39;.md&amp;#39;&lt;/span>)&lt;span style="color:#f92672">.&lt;/span>name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> convert_pdf_to_markdown(str(pdf_path), str(output_path))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> succeeded &lt;span style="color:#f92672">+=&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">except&lt;/span> &lt;span style="color:#a6e22e">Exception&lt;/span> &lt;span style="color:#66d9ef">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34; ✗ Failed: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>e&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> failed &lt;span style="color:#f92672">+=&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34;─&amp;#34;&lt;/span> &lt;span style="color:#f92672">*&lt;/span> &lt;span style="color:#ae81ff">40&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Batch complete: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>succeeded&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74"> converted, &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>failed&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74"> failed&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Output folder: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>output_dir&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">main&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Main CLI entry point&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args &lt;span style="color:#f92672">=&lt;/span> sys&lt;span style="color:#f92672">.&lt;/span>argv[&lt;span style="color:#ae81ff">1&lt;/span>:]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#f92672">not&lt;/span> args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34;Usage:&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; pdf2md &amp;lt;input.pdf&amp;gt; [output.md] # Convert a single PDF&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; pdf2md --batch &amp;lt;folder/&amp;gt; # Convert all PDFs in a folder&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; pdf2md --batch &amp;lt;folder/&amp;gt; --output &amp;lt;out_folder/&amp;gt; # Batch with custom output dir&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">Examples:&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; pdf2md document.pdf # Creates document.md&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; pdf2md document.pdf custom.md # Creates custom.md&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; pdf2md --batch input/ # Converts all PDFs in input/&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; pdf2md --batch ~/documents/pdfs/ --output ~/knowledge-base/docs/&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sys&lt;span style="color:#f92672">.&lt;/span>exit(&lt;span style="color:#ae81ff">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> args[&lt;span style="color:#ae81ff">0&lt;/span>] &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#e6db74">&amp;#34;--batch&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> input_dir &lt;span style="color:#f92672">=&lt;/span> args[&lt;span style="color:#ae81ff">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_dir &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#e6db74">&amp;#34;--output&amp;#34;&lt;/span> &lt;span style="color:#f92672">in&lt;/span> args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> idx &lt;span style="color:#f92672">=&lt;/span> args&lt;span style="color:#f92672">.&lt;/span>index(&lt;span style="color:#e6db74">&amp;#34;--output&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_dir &lt;span style="color:#f92672">=&lt;/span> args[idx &lt;span style="color:#f92672">+&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> batch_convert(input_dir, output_dir)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pdf_path &lt;span style="color:#f92672">=&lt;/span> args[&lt;span style="color:#ae81ff">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_path &lt;span style="color:#f92672">=&lt;/span> args[&lt;span style="color:#ae81ff">1&lt;/span>] &lt;span style="color:#66d9ef">if&lt;/span> len(args) &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> &lt;span style="color:#66d9ef">else&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> convert_pdf_to_markdown(pdf_path, output_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">if&lt;/span> __name__ &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#e6db74">&amp;#34;__main__&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> main()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>What I love about this code:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Smart defaults:&lt;/strong> If you don&amp;rsquo;t specify an output path, it just replaces &lt;code>.pdf&lt;/code> with &lt;code>.md&lt;/code>&lt;/li>
&lt;li>&lt;strong>Progress bars:&lt;/strong> &lt;code>tqdm&lt;/code> gives you a blue progress bar with page count&lt;/li>
&lt;li>&lt;strong>Batch mode:&lt;/strong> &lt;code>--batch&lt;/code> processes an entire folder at once, with optional &lt;code>--output&lt;/code> target&lt;/li>
&lt;li>&lt;strong>Helpful errors:&lt;/strong> Clear messages when things go wrong&lt;/li>
&lt;li>&lt;strong>Flexible usage:&lt;/strong> Works with relative paths, absolute paths, custom output names&lt;/li>
&lt;/ul>
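&lt;p>One caveat with parsing &lt;code>sys.argv&lt;/code> by hand: if &lt;code>--output&lt;/code> is the last argument, &lt;code>args[idx + 1]&lt;/code> raises an &lt;code>IndexError&lt;/code> instead of printing a usage message. The standard library&amp;rsquo;s &lt;code>argparse&lt;/code> can handle that validation for you. Below is a minimal sketch of the same interface (single-file mode, &lt;code>--batch&lt;/code>, &lt;code>--output&lt;/code>, and the &lt;code>.pdf&lt;/code> → &lt;code>.md&lt;/code> default). The helper names are hypothetical, not taken from the actual script, and the sketch only parses arguments. Conversion would still go through &lt;code>convert_pdf_to_markdown&lt;/code> and &lt;code>batch_convert&lt;/code>.&lt;/p>

```python
import argparse
from pathlib import Path


def default_output_path(pdf_path: str) -> str:
    """Mirror the 'smart default': swap the .pdf extension for .md."""
    return str(Path(pdf_path).with_suffix(".md"))


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argparse version of the pdf2md command-line interface.
    parser = argparse.ArgumentParser(prog="pdf2md", description="Convert PDFs to Markdown.")
    parser.add_argument("--batch", metavar="INPUT_DIR", help="convert every PDF in INPUT_DIR")
    parser.add_argument("--output", metavar="OUTPUT_DIR", help="target directory for --batch")
    parser.add_argument("pdf_path", nargs="?", help="a single PDF to convert")
    parser.add_argument("output_path", nargs="?", help="optional Markdown output path")
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.batch is None and args.pdf_path is None:
        # parser.error() prints the usage text and exits with status 2.
        parser.error("give a PDF path or --batch INPUT_DIR")
    if args.batch is None and args.output_path is None:
        args.output_path = default_output_path(args.pdf_path)
    return args


if __name__ == "__main__":
    print(vars(parse_cli(["research.pdf"])))
    print(vars(parse_cli(["--batch", "pdfs/", "--output", "kb/"])))
```

&lt;p>With this parser, &lt;code>pdf2md --batch pdfs/ --output&lt;/code> exits with a usage error rather than a traceback.&lt;/p>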
&lt;p>Make it executable:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>chmod +x pdf2md
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>And now it&amp;rsquo;s a proper command-line tool.&lt;/p>
&lt;hr>
&lt;h2 id="the-moment-of-truth-testing-with-real-data">The Moment of Truth: Testing with Real Data&lt;/h2>
&lt;p>Theory is great. But does it actually work?&lt;/p>
&lt;p>I grabbed that 1.3MB research paper on Generative Engine Optimization and ran:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>python pdf2md input/test.pdf output/test.md
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The output:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-text" data-lang="text">&lt;span style="display:flex;">&lt;span>Converting input/test.pdf to Markdown...
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Processing: 100%|████████████████| 12/12 [00:02&amp;lt;00:00, 5.8 pages/s]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>✓ Done: output/test.md (73,463 characters)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>1.3MB PDF → 74KB of clean Markdown in seconds.&lt;/strong>&lt;/p>
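&lt;p>A side note on those two numbers: the tool reports &lt;em>characters&lt;/em>, while a file size counts &lt;em>bytes&lt;/em>. In UTF-8, non-ASCII symbols such as the &lt;code>∗&lt;/code> author marker in the output below take more than one byte each, so a Markdown file can be slightly larger on disk than its character count. Here is a small standard-library sketch for checking both (&lt;code>size_report&lt;/code> is an illustrative name):&lt;/p>

```python
import tempfile
from pathlib import Path


def size_report(md_path: Path) -> tuple[int, int]:
    """Return (characters, UTF-8 bytes) for a Markdown file."""
    text = md_path.read_text(encoding="utf-8")
    return len(text), len(text.encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.md"
        # The author marker "∗" (U+2217) is one character but three bytes in UTF-8.
        sample.write_text("Pranjal Aggarwal [∗]\n", encoding="utf-8")
        chars, size = size_report(sample)
        print(f"{chars:,} characters, {size:,} bytes")  # prints: 21 characters, 23 bytes
```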
&lt;p>I opened the output file, and there it was, cleanly structured Markdown:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-markdown" data-lang="markdown">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">## **GEO: Generative Engine Optimization**
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Pranjal Aggarwal [∗]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Indian Institute of Technology Delhi
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>New Delhi, India
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>pranjal2041@gmail.com
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Ashwin Kalyan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Independent
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Seattle, USA
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>asaavashwin@gmail.com
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Headers, formatting, structure—all preserved. No manual cleanup needed.&lt;/p>
&lt;p>Success.&lt;/p>
&lt;hr>
&lt;h2 id="what-this-unlocks">What This Unlocks&lt;/h2>
&lt;p>Now that I have PDFs converting to Markdown reliably, a whole world of possibilities opens up:&lt;/p>
&lt;h3 id="ai-workflows">AI Workflows&lt;/h3>
&lt;ul>
&lt;li>Feed research papers and documentation directly into Claude or other LLMs&lt;/li>
&lt;li>Build RAG (Retrieval-Augmented Generation) pipelines backed by your document library&lt;/li>
&lt;li>Process technical documentation at scale without losing structure&lt;/li>
&lt;/ul>
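&lt;p>Part of what makes Markdown useful for RAG is that its headings give natural chunk boundaries. As an illustration (not part of &lt;code>pdf2md&lt;/code>; the function name and the choice to split only on &lt;code>#&lt;/code> and &lt;code>##&lt;/code> headings are assumptions), converted output could be cut into heading-scoped chunks before embedding:&lt;/p>

```python
import re

# Matches ATX headings such as "## Results"; group 1 is the run of # characters.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str, max_level: int = 2) -> list[dict]:
    """Split Markdown into chunks that begin at headings of level 1..max_level.

    Deeper headings (### and below) stay inside their parent chunk.
    """
    chunks = []
    heading, body = None, []

    def flush() -> None:
        # Skip an empty preamble, e.g. when the document starts with a heading.
        if heading is not None or any(ln.strip() for ln in body):
            chunks.append({"heading": heading, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and max_level >= len(match.group(1)):
            flush()
            # Headings may be wrapped in bold, as in the sample output above.
            heading, body = match.group(2).strip().strip("*").strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

&lt;p>Each chunk keeps its heading, which can be stored as metadata next to the embedding.&lt;/p>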
&lt;h3 id="knowledge-management">Knowledge Management&lt;/h3>
&lt;ul>
&lt;li>Import PDFs into your Obsidian vault automatically&lt;/li>
&lt;li>Version control document content (because it&amp;rsquo;s now plain text in git)&lt;/li>
&lt;li>Full-text search across your entire converted document library&lt;/li>
&lt;/ul>
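&lt;p>Since the converted files are plain text, a basic search across the library needs only the standard library. A minimal sketch, assuming all Markdown output sits under one root folder (&lt;code>search_library&lt;/code> is a hypothetical name):&lt;/p>

```python
from pathlib import Path


def search_library(root: str, term: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for each case-insensitive match of term."""
    needle = term.lower()
    hits = []
    # rglob walks subfolders too, so nested collections are included.
    for md_file in sorted(Path(root).rglob("*.md")):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(md_file), lineno, line.strip()))
    return hits
```

&lt;p>For example, &lt;code>search_library("output", "eligibility")&lt;/code> would list every line mentioning &amp;ldquo;eligibility&amp;rdquo; across the converted files.&lt;/p>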
&lt;h3 id="automation-ideas">Automation Ideas&lt;/h3>
&lt;ul>
&lt;li>Watch folder that auto-converts any dropped PDFs&lt;/li>
&lt;li>Batch process entire directories of reports, papers, or manuals&lt;/li>
&lt;li>Feed converted markdown directly into a vector database&lt;/li>
&lt;li>API wrapper to convert PDFs via HTTP requests&lt;/li>
&lt;/ul>
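&lt;p>A watch folder can start as simple polling before reaching for a file-watching library. The sketch below assumes a PDF counts as done once a Markdown file with the same name exists in the output folder, and it takes the conversion function as a parameter (for example, the script&amp;rsquo;s &lt;code>convert_pdf_to_markdown&lt;/code>). &lt;code>pending_pdfs&lt;/code> and &lt;code>watch&lt;/code> are hypothetical names.&lt;/p>

```python
import time
from pathlib import Path
from typing import Callable, Optional


def pending_pdfs(inbox: Path, outbox: Path) -> list[Path]:
    """PDFs in inbox that do not yet have a same-named .md file in outbox."""
    return [
        pdf
        for pdf in sorted(inbox.glob("*.pdf"))
        if not (outbox / pdf.with_suffix(".md").name).exists()
    ]


def watch(
    inbox: Path,
    outbox: Path,
    convert: Callable[[str, str], None],
    interval: float = 5.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Poll inbox and convert new PDFs; runs forever unless max_cycles is set."""
    outbox.mkdir(parents=True, exist_ok=True)
    cycles = 0
    while max_cycles is None or max_cycles > cycles:
        for pdf in pending_pdfs(inbox, outbox):
            convert(str(pdf), str(outbox / pdf.with_suffix(".md").name))
        cycles += 1
        time.sleep(interval)
```

&lt;p>Something like &lt;code>watch(Path("inbox"), Path("output"), convert_pdf_to_markdown)&lt;/code> would then run until interrupted. A sturdier version would also skip PDFs that are still being copied into the folder.&lt;/p>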
&lt;hr>
&lt;h2 id="lessons-learned-especially-for-cli-beginners">Lessons Learned (Especially for CLI Beginners)&lt;/h2>
&lt;h3 id="1-virtual-environments-are-non-negotiable">1. Virtual Environments Are Non-Negotiable&lt;/h3>
&lt;p>Every Python project should live in its own virtual environment. Always:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>python3 -m venv venv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>source venv/bin/activate
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>pip install --upgrade pip
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This keeps dependencies isolated and projects reproducible.&lt;/p>
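&lt;p>If you are ever unsure whether a script is running under the environment&amp;rsquo;s interpreter, Python can tell you: inside a &lt;code>venv&lt;/code>, &lt;code>sys.prefix&lt;/code> points at the environment while &lt;code>sys.base_prefix&lt;/code> still points at the base installation. A small check you could drop at the top of a script (the function name is illustrative):&lt;/p>

```python
import sys


def in_virtualenv() -> bool:
    """True when this interpreter belongs to a venv (activated or invoked directly)."""
    # venv sets sys.prefix to the environment; sys.base_prefix stays on the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("No venv detected; run: source venv/bin/activate")
```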
&lt;h3 id="2-bleeding-edge-isnt-always-better">2. Bleeding-Edge Isn&amp;rsquo;t Always Better&lt;/h3>
&lt;p>Python 3.14 is awesome, but sometimes mature tooling (like PyMuPDF) that &amp;ldquo;just works&amp;rdquo; beats bleeding-edge alternatives. Don&amp;rsquo;t be afraid to pivot when something doesn&amp;rsquo;t work.&lt;/p>
&lt;h3 id="3-test-with-real-data">3. Test With Real Data&lt;/h3>
&lt;p>I didn&amp;rsquo;t test with &amp;ldquo;hello.pdf&amp;rdquo; containing two sentences. I tested with a 1.3MB research paper. Real data reveals real issues (or in this case, confirms it works beautifully).&lt;/p>
&lt;h3 id="4-document-as-you-build">4. Document As You Build&lt;/h3>
&lt;p>Writing the README alongside the code made the project immediately understandable. Future-me will thank present-me.&lt;/p>
&lt;h3 id="5-claude-code--pai--superpowers">5. Claude Code + PAI = Superpowers&lt;/h3>
&lt;p>Working with Claude through PAI was like having a senior developer help me think through:&lt;/p>
&lt;ul>
&lt;li>Project structure (first principles)&lt;/li>
&lt;li>Library selection (when to pivot)&lt;/li>
&lt;li>Code organization (clean, maintainable)&lt;/li>
&lt;li>Real-world usage patterns&lt;/li>
&lt;/ul>
&lt;p>This wasn&amp;rsquo;t just coding faster—it was learning better patterns while building.&lt;/p>
&lt;hr>
&lt;h2 id="usage-examples">Usage Examples&lt;/h2>
&lt;h3 id="basic-conversion">Basic Conversion&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Activate environment first (always!)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>source venv/bin/activate
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Convert a PDF&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>python pdf2md document.pdf
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Custom output name&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>python pdf2md research.pdf my-notes.md
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Full paths&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>python pdf2md ~/Downloads/paper.pdf ~/Documents/notes.md
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="batch-processing">Batch Processing&lt;/h3>
&lt;p>Convert an entire folder of PDFs:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>source venv/bin/activate
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Convert all PDFs in a folder (output goes to output/ by default)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>python pdf2md --batch ~/documents/pdfs/
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Convert to a specific knowledge base directory&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>python pdf2md --batch ~/documents/pdfs/ --output ~/knowledge-base/docs/
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="add-to-path-optional">Add to PATH (Optional)&lt;/h3>
&lt;p>To use &lt;code>pdf2md&lt;/code> from anywhere:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Add to ~/.zshrc&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>export PATH&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;/Users/dsa/projects/pdf-to-markdown:&lt;/span>$PATH&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Then run from anywhere&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>pdf2md ~/Downloads/paper.pdf ~/Documents/paper.md
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="whats-next">What&amp;rsquo;s Next?&lt;/h2>
&lt;p>This tool works great as-is, but there are some exciting enhancements on the roadmap:&lt;/p>
&lt;h3 id="immediate-improvements">Immediate Improvements&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Better layout analysis:&lt;/strong> Install &lt;code>pymupdf_layout&lt;/code> for improved structure detection on complex documents&lt;/li>
&lt;li>&lt;strong>Recursive batch mode:&lt;/strong> Process nested folder structures, not just flat directories&lt;/li>
&lt;/ul>
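&lt;p>For the recursive batch mode, &lt;code>pathlib&lt;/code>&amp;rsquo;s &lt;code>rglob&lt;/code> can find PDFs at any depth, and &lt;code>relative_to&lt;/code> makes it easy to mirror the input folder structure in the output folder. A sketch of the planning step only, assuming conversion is handled elsewhere (&lt;code>plan_recursive_batch&lt;/code> is a hypothetical name):&lt;/p>

```python
from pathlib import Path


def plan_recursive_batch(input_dir: str, output_dir: str) -> list[tuple[Path, Path]]:
    """Map every PDF under input_dir to a .md path mirroring its subfolder in output_dir."""
    src_root, dst_root = Path(input_dir), Path(output_dir)
    jobs = []
    for pdf in sorted(src_root.rglob("*.pdf")):
        relative = pdf.relative_to(src_root)             # e.g. reports/2025/q1.pdf
        target = dst_root / relative.with_suffix(".md")  # e.g. out/reports/2025/q1.md
        jobs.append((pdf, target))
    return jobs
```

&lt;p>Each &lt;code>(pdf, target)&lt;/code> pair could then be passed to the existing converter after creating &lt;code>target.parent&lt;/code> with &lt;code>mkdir(parents=True, exist_ok=True)&lt;/code>.&lt;/p>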
&lt;h3 id="future-integrations">Future Integrations&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>RAG pipeline:&lt;/strong> Auto-feed converted markdown into a vector database&lt;/li>
&lt;li>&lt;strong>Obsidian plugin:&lt;/strong> Detect PDFs in vault and convert automatically&lt;/li>
&lt;li>&lt;strong>FastAPI wrapper:&lt;/strong> Create an HTTP API for web apps to use&lt;/li>
&lt;li>&lt;strong>Electron/Tauri app:&lt;/strong> Build a desktop GUI for non-technical users&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="the-bigger-picture-why-this-matters">The Bigger Picture: Why This Matters&lt;/h2>
&lt;p>This project is tiny—roughly 100 lines of Python, 30 minutes of work. But it represents something bigger:&lt;/p>
&lt;p>&lt;strong>The ability to build tools that solve your actual problems.&lt;/strong>&lt;/p>
&lt;p>I had a workflow friction (PDFs don&amp;rsquo;t work well with AI tools). I built a solution. Now that friction is gone, and I can focus on higher-level work.&lt;/p>
&lt;p>And the sources cited below point the same way: converting your document library to Markdown isn&amp;rsquo;t a nice-to-have. It&amp;rsquo;s a multiplier on every AI workflow that follows. They report up to 70% fewer tokens consumed, 84% fewer retrieval failures, and 50% fewer incorrect answers. These aren&amp;rsquo;t marginal improvements; they&amp;rsquo;re transformational.&lt;/p>
&lt;p>Working with Claude Code through PAI accelerated all of this. It&amp;rsquo;s like having a patient senior developer sitting next to you, suggesting better approaches, catching errors before they happen, and explaining &lt;em>why&lt;/em> certain patterns work.&lt;/p>
&lt;hr>
&lt;h2 id="resources">Resources&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>PyMuPDF4LLM Docs:&lt;/strong> &lt;a href="https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/" target="_blank" rel="noopener noreferrer">https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/&lt;/a>
&lt;/li>
&lt;li>&lt;strong>PyMuPDF GitHub:&lt;/strong> &lt;a href="https://github.com/pymupdf/PyMuPDF" target="_blank" rel="noopener noreferrer">https://github.com/pymupdf/PyMuPDF&lt;/a>
&lt;/li>
&lt;/ul>
&lt;h3 id="citations-markdown-vs-pdf-for-llms">Citations: Markdown vs PDF for LLMs&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Why PDFs Fail Under LLM Parsing&lt;/strong> — Steven Howard, Untethered AI: &lt;a href="https://untetheredai.substack.com/p/why-pdfs-fail-under-llm-parsing" target="_blank" rel="noopener noreferrer">https://untetheredai.substack.com/p/why-pdfs-fail-under-llm-parsing&lt;/a>
&lt;/li>
&lt;li>&lt;strong>PDF vs Markdown for AI: Token Efficiency&lt;/strong> — MarkdownConverters: &lt;a href="https://markdownconverters.com/blog/pdf-vs-markdown-ai-tokens" target="_blank" rel="noopener noreferrer">https://markdownconverters.com/blog/pdf-vs-markdown-ai-tokens&lt;/a>
&lt;/li>
&lt;li>&lt;strong>Revolutionizing RAG with Enhanced PDF Structure Recognition&lt;/strong> — arXiv:2401.12599 (2024): &lt;a href="https://arxiv.org/abs/2401.12599" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2401.12599&lt;/a>
&lt;/li>
&lt;li>&lt;strong>Approaches to PDF Data Extraction for Information Retrieval&lt;/strong> — NVIDIA Technical Blog: &lt;a href="https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/" target="_blank" rel="noopener noreferrer">https://developer.nvidia.com/blog/approaches-to-pdf-data-extraction-for-information-retrieval/&lt;/a>
&lt;/li>
&lt;li>&lt;strong>Improved RAG Document Processing With Markdown&lt;/strong> — Dr. Leon Eversberg, Towards Data Science: &lt;a href="https://medium.com/data-science/improved-rag-document-processing-with-markdown-426a2e0dd82b" target="_blank" rel="noopener noreferrer">https://medium.com/data-science/improved-rag-document-processing-with-markdown-426a2e0dd82b&lt;/a>
&lt;/li>
&lt;li>&lt;strong>Contextual Chunking: Boost Your RAG Retrieval Accuracy&lt;/strong> — Unstructured.io: &lt;a href="https://unstructured.io/blog/contextual-chunking-in-unstructured-platform-boost-your-rag-retrieval-accuracy" target="_blank" rel="noopener noreferrer">https://unstructured.io/blog/contextual-chunking-in-unstructured-platform-boost-your-rag-retrieval-accuracy&lt;/a>
&lt;/li>
&lt;li>&lt;strong>Boosting AI Performance: The Power of LLM-Friendly Content in Markdown&lt;/strong> — Webex Developers Blog: &lt;a href="https://developer.webex.com/blog/boosting-ai-performance-the-power-of-llm-friendly-content-in-markdown" target="_blank" rel="noopener noreferrer">https://developer.webex.com/blog/boosting-ai-performance-the-power-of-llm-friendly-content-in-markdown&lt;/a>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;strong>Happy converting!&lt;/strong>&lt;/p></content></item><item><title>Deploying a Hugo Site to Namecheap with PAI</title><link>https://augmentedresilience.com/posts/augmented-resilience-posts/deploying-a-hugo-site-to-namecheap-with-pai/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://augmentedresilience.com/posts/augmented-resilience-posts/deploying-a-hugo-site-to-namecheap-with-pai/</guid><description>&lt;p>I recently deployed my Hugo blog to Namecheap shared hosting, using Obsidian as my content editor and Claude Code with PAI (Personal AI) as my copilot. Here&amp;rsquo;s a walkthrough of every step, from fixing build errors to setting up a fully automated pipeline that goes from Obsidian to live site in a single command.&lt;/p>
&lt;h2 id="the-starting-point">The Starting Point&lt;/h2>
&lt;p>I created a Hugo blog project called &lt;strong>Augmented Resilience&lt;/strong> and used the &lt;a href="https://github.com/mirus-ua/hugo-theme-re-terminal" target="_blank" rel="noopener noreferrer">re-terminal&lt;/a>
theme, a Namecheap shared hosting account, and a GitHub repository. I used Claude Code in the VS Code editor along with Daniel Miessler&amp;rsquo;s Personal AI Infrastructure (PAI). The goal: get the site live at &lt;a href="https://augmentedresilience.com" target="_blank" rel="noopener noreferrer">augmentedresilience.com&lt;/a>
with a push-to-deploy workflow.&lt;/p></description><content>&lt;p>I recently deployed my Hugo blog to Namecheap shared hosting, using Obsidian as my content editor and Claude Code with PAI (Personal AI) as my copilot. Here&amp;rsquo;s a walkthrough of every step, from fixing build errors to setting up a fully automated pipeline that goes from Obsidian to live site in a single command.&lt;/p>
&lt;h2 id="the-starting-point">The Starting Point&lt;/h2>
&lt;p>I created a Hugo blog project called &lt;strong>Augmented Resilience&lt;/strong> and used the &lt;a href="https://github.com/mirus-ua/hugo-theme-re-terminal" target="_blank" rel="noopener noreferrer">re-terminal&lt;/a>
theme, a Namecheap shared hosting account, and a GitHub repository. I used Claude Code in the VS Code editor along with Daniel Miessler&amp;rsquo;s Personal AI Infrastructure (PAI). The goal: get the site live at &lt;a href="https://augmentedresilience.com" target="_blank" rel="noopener noreferrer">augmentedresilience.com&lt;/a>
with a push-to-deploy workflow.&lt;/p>
&lt;p>For context, the Personal AI Infrastructure System (PAI) from Daniel Miessler (see resources below) is an open-source framework that wraps around Claude Code and turns it into a structured problem-solving system. Instead of just chatting with an AI, PAI runs every request through a 7-phase algorithm — observe, think, plan, build, execute, verify, learn — so nothing gets skipped. It maintains persistent memory across sessions (so it remembers my project structure, preferences, and past decisions), automatically selects specialized agents for different tasks (security review, architecture, engineering), and enforces verification criteria before declaring anything &amp;ldquo;done.&amp;rdquo; For this project, PAI handled everything from debugging Hugo build errors to writing the deploy script to catching sensitive data I accidentally left in this blog post before it went live. It wasn&amp;rsquo;t just an AI assistant — it was the entire workflow engine. I found it easier to use it within VS Code (&lt;em>still getting used to using the command line interface&lt;/em>).&lt;/p>
&lt;h2 id="step-1-fixing-the-hugo-build">Step 1: Fixing the Hugo Build&lt;/h2>
&lt;p>The first issue was a build error:&lt;/p>
&lt;pre tabindex="0">&lt;code>module &amp;#34;hugo-theme-re-terminal&amp;#34; not found
&lt;/code>&lt;/pre>&lt;p>The problem was a mismatch between the theme name in &lt;code>hugo.toml&lt;/code> and the actual directory name. The theme was installed as a git submodule at &lt;code>themes/re-terminal/&lt;/code>, but the config referenced &lt;code>hugo-theme-re-terminal&lt;/code>.&lt;/p>
&lt;p>&lt;strong>Fix:&lt;/strong> Change the theme name in &lt;code>hugo.toml&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-toml" data-lang="toml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a6e22e">theme&lt;/span> = &lt;span style="color:#e6db74">&amp;#34;re-terminal&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>After that, &lt;code>hugo&lt;/code> built the site successfully, generating the &lt;code>public/&lt;/code> folder with all the static files.&lt;/p>
&lt;h2 id="step-2-setting-up-the-github-repository">Step 2: Setting Up the GitHub Repository&lt;/h2>
&lt;p>I initialized the repo and connected it to GitHub:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>git init
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>git remote add origin git@github.com:dsacosta/Augmented-Resilience.git
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>git add .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>git commit -m &lt;span style="color:#e6db74">&amp;#34;my first commit&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>git push origin main
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>One gotcha: I initially typed &lt;code>orgin&lt;/code> instead of &lt;code>origin&lt;/code> in the remote add command. Typos happen, so double-check your remote names with &lt;code>git remote -v&lt;/code>; a misspelled remote can be fixed with &lt;code>git remote rename orgin origin&lt;/code>.&lt;/p>
&lt;h2 id="step-3-connecting-namecheap-to-github-via-ssh">Step 3: Connecting Namecheap to GitHub via SSH&lt;/h2>
&lt;p>This was the trickiest part. Namecheap shared hosting needs an SSH key to clone from a private GitHub repo. Here&amp;rsquo;s what worked:&lt;/p>
&lt;h3 id="generate-an-ssh-key-on-namecheap">Generate an SSH Key on Namecheap&lt;/h3>
&lt;ol>
&lt;li>Log into &lt;strong>cPanel&lt;/strong> on Namecheap&lt;/li>
&lt;li>Go to &lt;strong>SSH Access&lt;/strong> → &lt;strong>Manage SSH Keys&lt;/strong> → &lt;strong>Generate a New Key&lt;/strong>&lt;/li>
&lt;li>Generate an RSA key (I used the default settings)&lt;/li>
&lt;/ol>
&lt;h3 id="remove-the-passphrase">Remove the Passphrase&lt;/h3>
&lt;p>This is critical. cPanel&amp;rsquo;s Git Version Control runs non-interactively, so it can&amp;rsquo;t prompt for a passphrase. I opened &lt;strong>cPanel Terminal&lt;/strong> and ran:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>ssh-keygen -p -f ~/.ssh/id_rsa
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Enter the old passphrase, then press Enter twice for no new passphrase.&lt;/p>
&lt;h3 id="add-the-public-key-to-github">Add the Public Key to GitHub&lt;/h3>
&lt;ol>
&lt;li>On Namecheap&amp;rsquo;s cPanel Terminal, run: &lt;code>cat ~/.ssh/id_rsa.pub&lt;/code>&lt;/li>
&lt;li>Copy the output&lt;/li>
&lt;li>Go to your GitHub repo → &lt;strong>Settings&lt;/strong> → &lt;strong>Deploy Keys&lt;/strong> → &lt;strong>Add deploy key&lt;/strong>&lt;/li>
&lt;li>Paste the public key and save&lt;/li>
&lt;/ol>
&lt;h3 id="verify-the-connection">Verify the Connection&lt;/h3>
&lt;p>From cPanel Terminal:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>ssh -T git@github.com
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You should see: &lt;code>Hi dsacosta/Augmented-Resilience! You've successfully authenticated...&lt;/code>&lt;/p>
&lt;h2 id="step-4-clone-the-repo-on-namecheap">Step 4: Clone the Repo on Namecheap&lt;/h2>
&lt;ol>
&lt;li>In cPanel, go to &lt;strong>Git Version Control&lt;/strong> → &lt;strong>Create&lt;/strong>&lt;/li>
&lt;li>Toggle &lt;strong>Clone a Repository&lt;/strong> on&lt;/li>
&lt;li>Enter the clone URL: &lt;code>git@github.com:dsacosta/Augmented-Resilience.git&lt;/code>&lt;/li>
&lt;li>Set the repository path (for example, &lt;code>/home/yourusername/your-repo&lt;/code>)&lt;/li>
&lt;li>Click &lt;strong>Create&lt;/strong>&lt;/li>
&lt;/ol>
&lt;p>Important: Don&amp;rsquo;t clone directly into &lt;code>public_html&lt;/code> or your domain folder — it likely already has files and will error out. Clone to a separate directory and use deployment to copy files over.&lt;/p>
&lt;h2 id="step-5-auto-deployment-with-cpanelyml">Step 5: Auto-Deployment with .cpanel.yml&lt;/h2>
&lt;p>cPanel supports automatic deployment tasks via a &lt;code>.cpanel.yml&lt;/code> file in the repo root. This file tells cPanel what to do after each pull:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>---
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">deployment&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">tasks&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#ae81ff">export DEPLOYPATH=/home/yourusername/yourdomain.com/&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#ae81ff">/bin/cp -R public/* $DEPLOYPATH&lt;/span>
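&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This copies the contents of the &lt;code>public/&lt;/code> folder (Hugo&amp;rsquo;s build output) into the live site directory. Because these tasks never run &lt;code>hugo&lt;/code> themselves, the built &lt;code>public/&lt;/code> folder has to be committed to the repository.&lt;/p>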
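&lt;p>Two properties of that &lt;code>cp&lt;/code> task are worth knowing. The shell glob &lt;code>public/*&lt;/code> does not match hidden files, so something like a &lt;code>.htaccess&lt;/code> in &lt;code>public/&lt;/code> would not be copied, and nothing removes files you have deleted from the site. A rough Python sketch of the difference, handy for checking a build locally. It is not what cPanel runs, and the function names are hypothetical.&lt;/p>

```python
import glob
import os
import shutil


def top_level_glob(public_dir: str) -> list[str]:
    """Names that `public/*` matches: like the shell, glob.glob skips dotfiles."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(public_dir, "*")))


def deploy_copy(public_dir: str, deploy_path: str) -> None:
    """Copy everything in public_dir, hidden files included, into deploy_path.

    Like `cp -R`, existing files are overwritten and stale files are left alone.
    """
    shutil.copytree(public_dir, deploy_path, dirs_exist_ok=True)  # Python 3.8+
```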
&lt;p>After pushing this file to GitHub:&lt;/p>
&lt;ol>
&lt;li>Go to &lt;strong>Git Version Control&lt;/strong> → &lt;strong>Manage&lt;/strong> your repo&lt;/li>
&lt;li>Click the &lt;strong>Pull or Deploy&lt;/strong> tab&lt;/li>
&lt;li>Click &lt;strong>Update from Remote&lt;/strong> to pull the latest&lt;/li>
&lt;li>Click &lt;strong>Deploy HEAD Commit&lt;/strong> to trigger the &lt;code>.cpanel.yml&lt;/code> tasks&lt;/li>
&lt;/ol>
&lt;p>Your site should now be live.&lt;/p>
&lt;h2 id="step-6-fully-automated-deploys-with-github-actions">Step 6: Fully Automated Deploys with GitHub Actions&lt;/h2>
&lt;p>To eliminate the manual &amp;ldquo;pull and deploy&amp;rdquo; step in cPanel, I set up a GitHub Actions workflow that SSHs into Namecheap and triggers the pull automatically on every push.&lt;/p>
&lt;h3 id="generate-a-deploy-key">Generate a Deploy Key&lt;/h3>
&lt;p>On your local machine, generate a key pair with no passphrase:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>ssh-keygen -t ed25519 -C &lt;span style="color:#e6db74">&amp;#34;github-actions-deploy&amp;#34;&lt;/span> -f ~/.ssh/deploy_key -N &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Add the &lt;strong>public key&lt;/strong> to Namecheap:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># In cPanel Terminal on Namecheap:&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>echo &lt;span style="color:#e6db74">&amp;#34;ssh-ed25519 AAAA...your-key-here github-actions-deploy&amp;#34;&lt;/span> &amp;gt;&amp;gt; ~/.ssh/authorized_keys
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="add-secrets-to-github">Add Secrets to GitHub&lt;/h3>
&lt;p>Go to your repo → &lt;strong>Settings&lt;/strong> → &lt;strong>Secrets and variables&lt;/strong> → &lt;strong>Actions&lt;/strong> and add:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Secret&lt;/th>
&lt;th>Value&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;code>NC_HOST&lt;/code>&lt;/td>
&lt;td>&lt;code>augmentedresilience.com&lt;/code>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>NC_USER&lt;/code>&lt;/td>
&lt;td>Your cPanel username&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>NC_PORT&lt;/code>&lt;/td>
&lt;td>Your SSH port (check cPanel)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;code>NC_SSH_KEY&lt;/code>&lt;/td>
&lt;td>The full private key (including BEGIN/END lines)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="create-the-workflow">Create the Workflow&lt;/h3>
&lt;p>Add &lt;code>.github/workflows/deploy.yml&lt;/code> to your repo:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">Deploy to Namecheap&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">on&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">push&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">branches&lt;/span>: [&lt;span style="color:#ae81ff">main]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">jobs&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">deploy&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">runs-on&lt;/span>: &lt;span style="color:#ae81ff">ubuntu-latest&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">steps&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#ae81ff">Deploy via SSH&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">uses&lt;/span>: &lt;span style="color:#ae81ff">appleboy/ssh-action@v1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">with&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">host&lt;/span>: &lt;span style="color:#ae81ff">${{ secrets.NC_HOST }}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">username&lt;/span>: &lt;span style="color:#ae81ff">${{ secrets.NC_USER }}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">key&lt;/span>: &lt;span style="color:#ae81ff">${{ secrets.NC_SSH_KEY }}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">port&lt;/span>: &lt;span style="color:#ae81ff">${{ secrets.NC_PORT }}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">script&lt;/span>: |&lt;span style="color:#e6db74">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74"> cd ~/your-repo &amp;amp;&amp;amp; git pull origin main &amp;amp;&amp;amp; /bin/cp -R public/* ~/yourdomain.com/&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now every push to &lt;code>main&lt;/code> automatically deploys to your live site.&lt;/p>
&lt;h2 id="step-7-one-command-deploy-script">Step 7: One-Command Deploy Script&lt;/h2>
&lt;p>Five manual commands every time you publish? That&amp;rsquo;s not a workflow — that&amp;rsquo;s a chore. I had Claude write a Python script that handles everything in one shot:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">#!/usr/bin/env python3&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;One-command deploy: Obsidian → Hugo → GitHub → Live site.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> subprocess
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">import&lt;/span> sys
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> datetime &lt;span style="color:#f92672">import&lt;/span> datetime
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>PROJECT_DIR &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;~/Documents/Augmented-Resilience&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>OBSIDIAN_POSTS &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34;~/projects/obsidian-vault/30-projects/augmented-resilience-posts&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>HUGO_POSTS &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>PROJECT_DIR&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">/content/posts&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">run&lt;/span>(cmd, description, cwd&lt;span style="color:#f92672">=&lt;/span>PROJECT_DIR):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Run a command and print status.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>&lt;span style="color:#e6db74">&amp;#39;=&amp;#39;&lt;/span>&lt;span style="color:#f92672">*&lt;/span>&lt;span style="color:#ae81ff">50&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34; &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>description&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>&lt;span style="color:#e6db74">&amp;#39;=&amp;#39;&lt;/span>&lt;span style="color:#f92672">*&lt;/span>&lt;span style="color:#ae81ff">50&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result &lt;span style="color:#f92672">=&lt;/span> subprocess&lt;span style="color:#f92672">.&lt;/span>run(cmd, shell&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">True&lt;/span>, cwd&lt;span style="color:#f92672">=&lt;/span>cwd)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> result&lt;span style="color:#f92672">.&lt;/span>returncode &lt;span style="color:#f92672">!=&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74"> FAILED: &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>description&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sys&lt;span style="color:#f92672">.&lt;/span>exit(&lt;span style="color:#ae81ff">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> result
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">main&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> msg &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#e6db74">&amp;#34; &amp;#34;&lt;/span>&lt;span style="color:#f92672">.&lt;/span>join(sys&lt;span style="color:#f92672">.&lt;/span>argv[&lt;span style="color:#ae81ff">1&lt;/span>:]) &lt;span style="color:#66d9ef">if&lt;/span> len(sys&lt;span style="color:#f92672">.&lt;/span>argv) &lt;span style="color:#f92672">&amp;gt;&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> &lt;span style="color:#66d9ef">else&lt;/span> &lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;Site update &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>datetime&lt;span style="color:#f92672">.&lt;/span>now()&lt;span style="color:#f92672">.&lt;/span>strftime(&lt;span style="color:#e6db74">&amp;#39;%Y-%m-&lt;/span>&lt;span style="color:#e6db74">%d&lt;/span>&lt;span style="color:#e6db74"> %H:%M&amp;#39;&lt;/span>)&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#39;rsync -av --delete &amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>OBSIDIAN_POSTS&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34; &amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>HUGO_POSTS&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Syncing posts from Obsidian&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;python3 &lt;/span>&lt;span style="color:#e6db74">{&lt;/span>PROJECT_DIR&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">/images.py&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Processing images&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run(&lt;span style="color:#e6db74">&amp;#34;hugo&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Building site with Hugo&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run(&lt;span style="color:#e6db74">&amp;#34;git add .&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Staging changes&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result &lt;span style="color:#f92672">=&lt;/span> subprocess&lt;span style="color:#f92672">.&lt;/span>run(&lt;span style="color:#e6db74">&amp;#34;git diff --cached --quiet&amp;#34;&lt;/span>, shell&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">True&lt;/span>, cwd&lt;span style="color:#f92672">=&lt;/span>PROJECT_DIR)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> result&lt;span style="color:#f92672">.&lt;/span>returncode &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74"> No changes to commit. Site is up to date.&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#39;git commit -m &amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>msg&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&amp;#39;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Committing&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> run(&lt;span style="color:#e6db74">&amp;#34;git push origin main&amp;#34;&lt;/span>, &lt;span style="color:#e6db74">&amp;#34;Pushing to GitHub&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>&lt;span style="color:#e6db74">&amp;#39;=&amp;#39;&lt;/span>&lt;span style="color:#f92672">*&lt;/span>&lt;span style="color:#ae81ff">50&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">&amp;#34; DEPLOYED! Your site will be live shortly.&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#e6db74">f&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">{&lt;/span>&lt;span style="color:#e6db74">&amp;#39;=&amp;#39;&lt;/span>&lt;span style="color:#f92672">*&lt;/span>&lt;span style="color:#ae81ff">50&lt;/span>&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#ae81ff">\n&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">if&lt;/span> __name__ &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#e6db74">&amp;#34;__main__&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> main()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Save this as &lt;code>deploy.py&lt;/code> in your project root. Now the entire workflow is:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Default timestamped commit message&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>python3 deploy.py
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Or with a custom message&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>python3 deploy.py &lt;span style="color:#e6db74">&amp;#34;Add new blog post about deployment&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The script runs every step in sequence: it syncs posts from Obsidian, converts image links, builds with Hugo, stages, commits, and pushes. If any step fails, it stops immediately so you don&amp;rsquo;t push a broken build. If staging turns up no changes, it exits without committing. Combined with the GitHub Actions workflow from Step 6, the push triggers the auto-deploy to Namecheap. One command, fully live.&lt;/p>
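&lt;p>The stop-on-failure behavior comes from checking each step&amp;rsquo;s return code. The sketch below is one variant, offered as an assumption rather than what &lt;code>deploy.py&lt;/code> does. It passes argument lists instead of shell strings, so paths and commit messages never need shell quoting. It also lets &lt;code>check=True&lt;/code> raise an exception on a non-zero exit. &lt;code>run_steps&lt;/code> is a hypothetical name.&lt;/p>

```python
import subprocess


def run_steps(steps, cwd=None):
    """Run (description, argv) pairs in order and stop at the first failure.

    Returns None when every step succeeds, otherwise the description of
    the step that failed. Steps after a failure are never started.
    """
    for description, argv in steps:
        print(f"==> {description}", flush=True)
        try:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, cwd=cwd, check=True)
        except (subprocess.CalledProcessError, FileNotFoundError):
            # FileNotFoundError means the program itself was not found.
            return description
    return None
```

&lt;p>A call such as &lt;code>run_steps([("Building site with Hugo", ["hugo"]), ("Pushing to GitHub", ["git", "push", "origin", "main"])], cwd=PROJECT_DIR)&lt;/code> returns the name of the first failing step, or &lt;code>None&lt;/code>. Keep the &lt;code>git diff --cached --quiet&lt;/code> check before the commit step, because &lt;code>git commit&lt;/code> exits non-zero when there is nothing to commit.&lt;/p>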
&lt;h2 id="lessons-learned">Lessons Learned&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>SSH passphrases break cPanel automation.&lt;/strong> Always remove the passphrase from keys used by cPanel&amp;rsquo;s Git Version Control.&lt;/li>
&lt;li>&lt;strong>Theme names must match directory names.&lt;/strong> Hugo looks for the theme in &lt;code>themes/&amp;lt;theme-name&amp;gt;/&lt;/code>, so the &lt;code>theme&lt;/code> value in your config must match exactly.&lt;/li>
&lt;li>&lt;strong>Don&amp;rsquo;t clone into the live site directory.&lt;/strong> Clone to a separate folder and use &lt;code>.cpanel.yml&lt;/code> to copy the built files over.&lt;/li>
&lt;li>&lt;strong>GitHub Actions + SSH is the cleanest auto-deploy for shared hosting.&lt;/strong> No webhooks, no cron jobs — just a simple SSH action that runs on every push.&lt;/li>
&lt;li>&lt;strong>Claude Code with PAI made this possible in a single session.&lt;/strong> From debugging build errors to SSH key troubleshooting to writing GitHub Actions workflows, having an AI pair programmer turned what could have been hours of Stack Overflow rabbit holes into a smooth, guided process.&lt;/li>
&lt;/ul>
&lt;h2 id="tools-used">Tools Used&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://gohugo.io/" target="_blank" rel="noopener noreferrer">Hugo&lt;/a>
— Static site generator&lt;/li>
&lt;li>&lt;a href="https://github.com/mirus-ua/hugo-theme-re-terminal" target="_blank" rel="noopener noreferrer">re-terminal theme&lt;/a>
— Hugo theme&lt;/li>
&lt;li>&lt;a href="https://www.namecheap.com/" target="_blank" rel="noopener noreferrer">Namecheap&lt;/a>
— Shared hosting with cPanel&lt;/li>
&lt;li>&lt;a href="https://github.com/features/actions" target="_blank" rel="noopener noreferrer">GitHub Actions&lt;/a>
— CI/CD automation&lt;/li>
&lt;li>&lt;a href="https://claude.ai/" target="_blank" rel="noopener noreferrer">Claude Code&lt;/a>
— AI pair programmer&lt;/li>
&lt;li>&lt;a href="https://danielmiessler.com/blog/personal-ai-infrastructure" target="_blank" rel="noopener noreferrer">Personal AI Infrastructure (PAI)&lt;/a>
— Workflow engine&lt;/li>
&lt;li>&lt;a href="https://obsidian.md/" target="_blank" rel="noopener noreferrer">Obsidian&lt;/a>
— Content authoring&lt;/li>
&lt;li>&lt;a href="https://code.visualstudio.com/" target="_blank" rel="noopener noreferrer">VS Code&lt;/a>
- Development environment; used as the workspace for Claude Code sessions&lt;/li>
&lt;/ul>
&lt;h2 id="resources">Resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://youtu.be/dnE7c0ELEH8?si=YEDoaoiekMYJIe3U" target="_blank" rel="noopener noreferrer">I started a blog&amp;hellip;..in 2024 (why you should too)&lt;/a>
— The YouTube video that inspired me to start this blog&lt;/li>
&lt;li>&lt;a href="https://danielmiessler.com/blog/personal-ai-infrastructure" target="_blank" rel="noopener noreferrer">Building a Personal AI Infrastructure (PAI)&lt;/a>
— Daniel Miessler&amp;rsquo;s guide to building your own Personal AI system&lt;/li>
&lt;/ul></content></item></channel></rss>