
I gave my AI agent root. It got safer.

After the wipe, the obvious instinct is to clamp down. Fewer permissions. More approval gates. Make the agent ask before doing anything that could matter.

That instinct is wrong. I learned it by accident.

The experiment

I ran two projects in parallel. Same period, same general workflow, similar scope.

Project A — gated. Permission gates everywhere. Every shell command needed me to approve it. Every file write went through a review step. I was the bottleneck on every action. The agent could propose; I had to allow. By the book.

Project B — yolo. Full root. The agent had whatever access it needed, no human gate per command. I read the diffs at the end of work sessions. Corrections happened after the fact, not before.

I expected Project A to be safer and Project B to be a trail of disasters.

That isn’t what happened.

Early on

Project A was fine and frustrating. Every task took twice as long because I was a checkpoint on every action. The agent felt sluggish. I felt sluggish — constantly context-switching from my own work to read and approve commands. The work got done. Slowly. Defensively. With me as a permanent overhead.

Project B was rocky. The agent broke things. Some things were noisy (a config in the wrong place); some were embarrassing (a wrong directory wiped); none were catastrophic. I watched, swore, fixed. I fed corrections back into the documentation. The agent picked up the corrections.

Score after a couple of weeks: A was tediously fine; B was a mess.

Over time

Project A stayed fine. Stayed slow. The same pattern of agent-asks, human-approves, agent-does. No improvement in the pattern. Why would there be? Permission gates don’t teach. They just block.

Project B got quiet. The agent wasn’t breaking things anymore. The corrections from the early weeks had moved into the documentation. The system described itself accurately enough that the agent wasn’t running into the same surprises twice. The work was getting done at a speed I hadn’t seen in Project A.

By a month or so in, the gap was uncomfortable to look at. Project B was the productive one. Project A was the cautious one that never got out of first gear.

Why this happens (I think)

Permission gates are a moat between the agent and reality. The agent never gets to be wrong; the human is wrong on its behalf, by approving things. There’s no feedback loop from “did this work” back into “what does the system actually look like”.

Yolo is brutal early. The agent runs into the actual edges of the system. The mistakes are real. But each mistake is a data point that goes into the documentation, the prompt, the shape of the next task. The system gets more accurate over time because reality keeps correcting it.

Friction-gated AI is worse than full-trust AI plus correction loops. Not because gates are bad in principle. Because gates substitute the human’s judgement for the system’s reality, and the system’s reality is what makes the agent reliable.

Yolo, but in a cage

A clarification that matters: “full root” doesn’t mean root on my workstation. It means root on the agent’s machine.

There’s an AI I work with — typing in my terminal, in my editor, on my laptop. That one doesn’t have the keys to my house, because it doesn’t need them. We’re collaborating in real time and I’m reading every change.

There’s another kind of AI I let run on its own — chat assistants I brain-dump to, agents that pick up tickets in the background, schedulers that fire while I sleep. Each gets its own machine. Their own home. A firewall guards their home from the outside, and guards the outside from them. Daily backups on the host. Anything they touch in code ends up in git, where mistakes are reversible.

That’s the setup that makes yolo sustainable. It’s not give the agent root and pray. It’s build the cage carefully, then let the agent be free inside it. If something goes badly wrong on an agent host, I roll the machine back one day. Annoying. Not fatal.

The shape of safety, generalised: containment + reversibility, not approval + supervision. Isolate the blast radius. Make recovery boring. Then stop trying to control every action.
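To make the reversibility half concrete: "roll the machine back one day" can be as boring as a daily snapshot plus a single revert command. Here is a minimal sketch, assuming the agent host is a libvirt-managed VM and something like cron runs the snapshot step once a day; the domain name agent-box and the two-snapshot retention are placeholders, not details lifted from any particular setup.

```python
#!/usr/bin/env python3
"""Daily snapshot / one-day rollback for an agent VM (sketch).

Assumes a libvirt-managed VM named "agent-box" and a daily cron job
calling snapshot_today(). Both names and the retention policy are
illustrative placeholders.
"""
import subprocess
from datetime import date

DOMAIN = "agent-box"  # hypothetical libvirt domain for one agent host
KEEP = 2              # keep today + yesterday, so "roll back one day" always works


def virsh(*args: str) -> str:
    """Run a virsh subcommand and return its stdout."""
    return subprocess.run(
        ["virsh", *args], check=True, capture_output=True, text=True
    ).stdout


def daily_snapshots() -> list[str]:
    """List this VM's daily snapshots, oldest first (ISO dates sort lexically)."""
    names = virsh("snapshot-list", DOMAIN, "--name").split()
    return sorted(n for n in names if n.startswith("daily-"))


def snapshot_today() -> None:
    """Create today's snapshot, then prune anything beyond the retention window."""
    virsh("snapshot-create-as", DOMAIN, f"daily-{date.today()}")
    for old in daily_snapshots()[:-KEEP]:
        virsh("snapshot-delete", DOMAIN, old)


def roll_back_one_day() -> None:
    """Revert the VM to yesterday's snapshot (assumes today's already exists)."""
    virsh("snapshot-revert", DOMAIN, daily_snapshots()[-2])


if __name__ == "__main__":
    snapshot_today()
```

The tooling doesn’t matter; any host that can snapshot and revert in one command gives you the same property: recovery is one boring step, so mistakes stay cheap.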

What this means in practice

I don’t run permission gates anymore. I run reality gates.

  • The agent has the access it needs.
  • The system documentation describes what’s true accurately enough that what the agent thinks is true matches what is true.
  • Mistakes are caught by the work itself failing, not by me reading every command.
  • Corrections feed back into the documentation, not into a rules-list that erodes.

This works because of what I wrote about in the previous post — the agent’s reliability comes from the information shape, not from how many guards I bolt on. With well-shaped information, full permissions are safe. With shallow information, no amount of permission gating will save you. The wipe disaster proved that direction; the yolo experiment proved the other.

The uncomfortable conclusion

If I’d run the yolo experiment before the wipe disaster, I’d have called the result luck. After the disaster, I knew exactly why it worked. The thing the gated project was missing wasn’t approval steps. It was the part of the loop where the agent collides with reality and the system updates itself.

You don’t get a reliable AI partner by holding its hand. You get one by giving it room to be wrong cheaply, then making sure being wrong updates the world the agent reads.

What I had now was an agent I could trust with the work — given the right context and the right freedom.

Next: scaling that into actual infrastructure I could rebuild.


Part of The 2026 Rebuild.