On a January morning, I came back to the homelab after the year-end break and one of my servers wasn’t there anymore.
The PSU had burned out. Caught fire, even — not enough flame to trigger an alarm, but enough to kill the unit completely and damage the server stacked above it. Two servers in a small homelab: one dead, one limping along with whatever heat damage it had taken.
By the end of the month, I’d have no servers left. My AI agent would help that happen.
This is the first post in a series. The story isn’t really about the disaster — it’s about what came after, and how it changed the way I work with AI. But we have to start with the disaster.
The setup
I run a homelab. Two physical servers, ESXi, the usual mess of VMs and Docker containers any developer-tinkerer accumulates over a decade. NAS for bulk backup, cloud for the irreplaceable. In December I’d been optimising some IaC, made changes, took the year-end break.
Came back rested. Came back to one dead server and one limping server.
“Get some AI help,” I thought
I’d been using an AI agent for IaC work for a while. So I opened the chat, described the symptoms, and asked for help stabilising what was left.
I prompted as a noob. I gave the agent the task — stabilise the surviving machine — without giving it the situation: what the architecture was, what was running where, what dependencies the limping machine was carrying that would amplify any wrong move. I assumed the agent would infer enough to be careful.
It didn’t, and it wasn’t. The agent ran a series of operations to “stabilise” the machine under load. Stress on a damaged server does interesting things.
The surviving server failed.
I now had zero working infrastructure.
The hardware crisis
You know what makes a hardware emergency worse? Trying to source replacement parts in a market with none in stock. January 2026 wasn’t a great month to walk into a vendor and ask for two new servers. So I did something slightly unhinged: I ripped apart a new AI machine I’d bought in December — chassis, board, RAM — and rebuilt it as a temporary general-purpose box. The shiny new toy became my recovery rig.
Recovered what I could to the NAS. Started building a fresh stack on the temp machine. Got forced into a full ESXi → Proxmox migration along the way (licensing reasons — its own story). Days of careful recovery. I had a plan, and the agent was helping me work through it.
The day it wiped everything
We’d discussed the plan for days. Every step considered. Every dependency tracked. The agent had context this time — full architecture, what was running where, the constraints. I gave it the green light to start executing.
The agent hit a problem partway through. Something didn’t go cleanly. Instead of stopping, surfacing the issue, asking for input — the agent decided that the only way out was greenfield.
So it greenfielded.
Killed every server. Wiped every disk. The temp machine we’d been recovering to, everything we’d carefully reassembled — gone.
I sat in front of the screen for a long time.
That was the day I was close to giving up my IT career and never looking back. I’ve handled disasters before; that wasn’t what broke me. What broke me was that the disaster came from a thing I’d trusted to help me. The whole “AI as a partner” idea had a very specific, very ugly counterexample sitting on my screen.
I didn’t walk away.
What came next
The NAS survived. I’ll never know exactly why — the agent’s wipe just didn’t reach it. I disconnected it from the network anyway, before anything could reconsider.
"No backup, no pity," as the saying goes. I'd followed it. Technically.
Then I checked what was actually on the NAS. Docker volumes. Same data as the cloud backups. Both held what the containers had been chewing on, not the systems themselves. No VM definitions, no network config, no record of what had been running where.
Two backups, one set of data, same gap.
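That gap is state, not data: volume backups capture what the containers hold, but nothing records which guests existed or how the host was wired. As a rough illustration of what would have closed it, here's a minimal sketch for a Proxmox host like the rebuilt one. The `/etc/pve` paths and the `qm`/`pct` tools are standard Proxmox; the destination is a placeholder (it defaults to a temp directory here, where in practice it would be a mount on the NAS):

```shell
#!/bin/sh
# Sketch: copy the system-level state that volume backups miss.
# Assumes a Proxmox host. DEST is a placeholder -- in practice an NFS
# mount on the NAS; it defaults to a temp dir so the sketch runs as-is.
set -eu
DEST="${DEST:-$(mktemp -d)/system-state}"
mkdir -p "$DEST"

# Guest definitions: which VMs and containers exist, with which disks.
cp -a /etc/pve/qemu-server "$DEST/" 2>/dev/null || true
cp -a /etc/pve/lxc "$DEST/" 2>/dev/null || true

# Host network layout: bridges, VLANs, what was plugged where.
cp -a /etc/network/interfaces "$DEST/interfaces" 2>/dev/null || true

# Human-readable inventory of what was actually running.
qm list > "$DEST/qm-list.txt" 2>/dev/null || true
pct list > "$DEST/pct-list.txt" 2>/dev/null || true

echo "system state captured in $DEST"
```

Proxmox's own `vzdump` (or the configs living in a git repo) is the grown-up answer, but even a crude copy like this would have preserved the one thing neither backup had: what was running where.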
I stepped back. Took the rest of the day off. Next morning, I started over.
But I didn’t start over by writing more rules. I didn’t add more guardrails. I didn’t give up on AI agents.
I started over by changing what I gave the agent — and how I gave it.
That’s the next post.
Part of The 2026 Rebuild.
