
# From tutorial docs to agent-readable docs

The previous post said the rules don’t matter. The information does. Then I left the rest as a hand-wave: “I tore down my IaC repo and rebuilt it as agent-readable docs.”

That hand-wave covered weeks of work. This post is the work.

## What “tutorial-style docs” actually were

The old IaC repo had a long install guide at the top. A representative slice (cleaned up):

## Step 1: Install Debian
Run through the Debian installer, selecting the Standard and SSH Package.
Set up a user — remember that name as `$user` for later.

## Step 2: Install and configure basics
Change to root:

    su root

Update packages and install sudo:

    apt-get update && apt-get upgrade
    apt-get install sudo
    /sbin/adduser $user sudo

After your system has rebooted, you should be able to SSH in:

    ssh $user@your-machine-name

If you want, set up passwordless sudo by editing /etc/sudoers …

Useful for a human onboarding to homelab admin from zero. Useless for an AI agent. The agent already knows how to install Debian. What it doesn’t know is what this specific server is for, what’s already running on it, what depends on it, what’s been tried before. None of that is in the install guide.

It was also useless for me, six months later, coming back to a server I’d forgotten the details of. I’d skim the tutorial, find no answers to “what’s actually on this box”, and end up SSH-ing in to grep config files like a stranger.

## What “agent-readable docs” actually look like

Each server now has a per-server readme. Same shape every time:

# Server: homelab-ai
environment: AI workloads — local LLMs, scheduled inference jobs
hostname: homelab-ai
ip: 192.168.10.42

### Services available
| Service | Description | Port |
|---|---|---|
| Proxmox | Management UI | 8006 |
| Netdata | Monitoring agent (streams to homelab-monitor) | 19999 |

### Hardware
- cpu: Intel Core Ultra 7, 20 cores / 28 threads
- ram: 128GB DDR5
- gpu: 2x consumer GPUs, 16GB VRAM each
- disk: 1.8TB NVMe (VM storage)
- network: 2.5GbE
- ipmi: 192.168.10.99

### Storage
- data-fast: 800GB LVM-thin (VG: pve, nvme1n1p3)
- data-bulk: 1.8TB LVM-thin (VG: data-bulk, nvme0n1)

### NFS mounts
- nas-iso: 192.168.10.13:/volume1/iso
- nas-backup: 192.168.10.13:/volume2/backup

Plus an issues section with timestamped entries:

## Issues & lessons

### 2026-01-15: Fan controller requires Windows VM (resolved)
**Problem:** USB fan controller needs Windows software to enable PWM mode.
**Tried:** Linux USB control tools (don't support this device ID), IPMI raw commands (BMC-bypassed by the controller).
**Current:** 24/7 Windows VM, 25% CPU quota, USB passthrough. Works. Cosmetic only — no display, just the background service.

No “Step 1, Step 2, click Save”. No introductory paragraphs explaining what Proxmox is. The reader is assumed technical — human or agent — and the doc cuts straight to the part only this server can tell them.
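The issues convention is easy to keep by hand, but a tiny helper removes the excuse to skip the timestamp. A minimal sketch, assuming the readme lives at `servers/homelab-ai/README.md` (the path and placeholder title are illustrative, not from this repo):

```shell
#!/bin/sh
# Sketch: append a skeleton issue entry, stamped with today's date,
# to a per-server readme. Path is illustrative.
README="servers/homelab-ai/README.md"
mkdir -p "$(dirname "$README")"
{
  printf '\n### %s: <short title>\n' "$(date +%F)"
  printf '**Problem:** \n**Tried:** \n**Current:** \n'
} >> "$README"
```

The `date +%F` output (`YYYY-MM-DD`) matches the heading format above, which matters later: fixed-width dates sort correctly with no extra tooling.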

## Two principles, in real markdown

The principles I mentioned in post 2 fall out of this shape concretely:

**Don’t store what can be generated.** No `apt list --installed` output. No directory trees. No package version dumps. If SSH-ing in and running a command will tell you the truth, don’t write the truth down; keep the doc small enough to stay accurate.
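The first principle in practice: everything below is one command away on the live system, so none of it belongs in the doc. A sketch with portable commands (the package query is Debian-specific, so it is left commented):

```shell
#!/bin/sh
# Live state: regenerate on demand instead of writing it down.
uname -r    # kernel version, always current
df -h /     # root-disk usage, always current
# Debian package state (Debian-only, so commented here):
# dpkg-query -W -f='${Package} ${Version}\n'
```

The doc then keeps only what these commands cannot tell you: what the box is for, what depends on it, what has been tried.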

**Write for an engineer.** No “we leverage modern container orchestration to deliver scalable workloads”. No “Docker is a tool that allows you to…”. If the reader doesn’t know what Docker is, they’re not the audience. The doc gives the reader only what they couldn’t have inferred: what’s running here, why, and with what dependencies.

## What this changes when an agent picks up a task

Before, the agent hit Step 1 and started executing. That worked when the situation matched the tutorial; it failed silently when the situation diverged.

Now, the agent picks up the per-server readme first. Reads what the server is, what’s running, what’s broken, what’s been tried. Then — and only then — does it form a plan for the task it was actually given.

The mental model: the agent walks into a new room, and the room introduces itself.

A short prompt + a good per-server readme is the entire context the agent needs for most operational tasks. No rules. No long system prompt. Just the room introducing itself.
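Concretely, “the entire context” is just concatenation. A sketch; the path and the task text are illustrative, and no agent CLI is named in this post:

```shell
#!/bin/sh
# Sketch: the whole agent context is readme + task, nothing else.
README="servers/homelab-ai/README.md"   # illustrative path
TASK='Restart Netdata and confirm it streams to homelab-monitor.'
{
  [ -f "$README" ] && cat "$README"     # the room introducing itself
  printf '\n## Task\n%s\n' "$TASK"      # the short prompt
} > context.md
# context.md is everything the agent gets: no rules, no long system prompt.
```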

## Three habits I had to relearn

Looking back, the rewrite changed how I write, not just what I write:

  1. **Timestamps everywhere.** `### 2026-01-15: …` on issue entries, lessons, decisions. Agents can rank “what’s recent” without me telling them. Stale entries can be archived without being lost.
  2. **Lessons inline with the system they apply to.** Not in a separate “best practices” doc. The lesson and the system live together: the lesson is about this thing, so it’s next to this thing.
  3. **Delete more.** If a section can be regenerated from the running system, it goes. Smaller docs that stay true beat bigger docs that drift.
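Habit 1 pays off mechanically: fixed-width `### YYYY-MM-DD:` headings mean a plain reverse sort is already a recency ranking. A sketch, assuming readmes sit under a `demo-servers/` directory (the directory name and the two seeded entries are demo data, not from the repo):

```shell
#!/bin/sh
# Sketch: rank timestamped entries across per-server readmes, newest first.
mkdir -p demo-servers
# Seed two demo entries so the pipeline below has input:
printf '### 2026-01-15: Fan controller requires Windows VM (resolved)\n' \
  > demo-servers/homelab-ai.md
printf '### 2025-11-02: NFS mount flapping\n' >> demo-servers/homelab-ai.md

# Dates sort lexically == chronologically, so `sort -r` ranks by recency:
grep -rh '^### [0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}:' demo-servers/ | sort -r | head
```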

The rebuild took weeks. The first agent test on the new docs took seconds — and the agent did what I expected, which was novel enough to feel suspicious. I checked the work twice. It was right.

The next post: how the rebuild proved itself when the next disaster hit — about a week later, before I’d finished setting up real backups.


Part of *The 2026 Rebuild*.