Buildly Blog

Rollback and Escape Hatches: How We Keep Agents Safe in Production Codebases

2025-09-18 · Buildly Engineering

Every engineering team that evaluates autonomous coding agents asks the same question within the first 20 minutes: what happens when it does something wrong? It's the right question, and it deserves a more complete answer than "the agent is pretty accurate." Because the honest answer isn't about accuracy — it's about what the safety model looks like when accuracy isn't enough.

Here's ours.

The Foundational Constraint: PR-Only Writes

The most important safety property in Buildly's design is the simplest: a Buildly agent never commits directly to a branch. Every action that produces code produces a pull request. No exceptions, no configuration knobs that change this behavior, no fast-path for "low-risk" changes.

This isn't primarily about rollback. It's about maintaining the human-in-the-loop requirement as a structural property rather than a policy. If the agent can commit directly under the right circumstances, you've created a category of changes that bypass review. That category will expand over time as teams get comfortable, and eventually you'll have a production incident caused by agent code that no human saw before it shipped.

PR-only writes mean that the worst-case scenario is a PR that contains bad code. Not merged code. Not deployed code. A PR that a human can close, annotate, and use as feedback. The rollback, in most cases, is "don't merge the PR." That's an extremely fast and cheap operation.
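One way to picture "structural rather than policy" is a write surface that simply has no commit method. The sketch below is an illustration under assumptions, not Buildly's actual API: `AgentWriter`, `PullRequest`, and the `agent/<task_id>` branch naming are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    base_branch: str
    head_branch: str
    files: dict  # path -> proposed content
    draft: bool = False


class AgentWriter:
    """Hypothetical write surface handed to the agent.

    There is deliberately no commit() or push() method and no flag that
    adds one, so a direct write to a branch cannot be expressed at all.
    """

    def __init__(self) -> None:
        self.opened: list[PullRequest] = []

    def open_pull_request(self, task_id: str, base_branch: str,
                          files: dict, draft: bool = False) -> PullRequest:
        # Changes always land on a fresh agent branch, never on the base branch.
        pr = PullRequest(base_branch, f"agent/{task_id}", dict(files), draft)
        self.opened.append(pr)
        return pr


writer = AgentWriter()
pr = writer.open_pull_request("T-101", "main", {"src/user.py": "..."})
print(pr.head_branch)
```

The point of the design is that review cannot be bypassed by configuration, because the bypass does not exist in the interface.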

Scope Boundaries and What Happens When They're Hit

Beyond the PR-only constraint, Buildly agents operate within declared scope boundaries. When a task is created from a ticket, the scope is initialized from the ticket's referenced modules, services, or file paths. The agent can read anything it needs for context, but it can only write to files within that scope.

When an agent's analysis determines that the ideal implementation requires modifying something outside its declared scope, it has two options: proceed with a change limited to the files it's allowed to touch, or stop and escalate. Which path it takes depends on whether the out-of-scope modification is necessary for correctness or merely preferable.

If the agent can implement the ticket correctly within its scope, it does so and notes the out-of-scope impact in the PR description. "This change modifies the user service. The notification service also calls the affected method — recommend reviewing [file path] before merging." The engineer reviewing the PR makes the call.

If correctness genuinely requires the out-of-scope change — a shared type definition needs updating, or a shared library method needs a new parameter — the agent stops generating code and creates a task clarification comment instead. It describes what it found and why it can't proceed within scope, and waits for the ticket to be updated or scope to be expanded. This is a more conservative default than most teams initially expect, but we've found it's the right one. Agents that push through scope limits "because the change is obviously correct" are agents that occasionally make cascading changes no one expected.
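The decision described above can be sketched as a small function. This is a minimal illustration, assuming scope is expressed as glob patterns and that the agent's analysis has already tagged each planned write with whether it is required for correctness; `PlannedWrite`, `Decision`, and `decide` are hypothetical names, not Buildly's API.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    PROCEED = "proceed"                      # every planned write is in scope
    PROCEED_WITH_NOTE = "proceed_with_note"  # stay in scope, flag the rest in the PR description
    ESCALATE = "escalate"                    # stop and post a clarification comment


@dataclass(frozen=True)
class PlannedWrite:
    path: str
    required_for_correctness: bool  # assumed to come from the agent's analysis


def in_scope(path: str, scope_patterns: list[str]) -> bool:
    # fnmatchcase's "*" also matches "/", so "services/user/*" covers nested files.
    return any(fnmatchcase(path, pattern) for pattern in scope_patterns)


def decide(writes: list[PlannedWrite],
           scope_patterns: list[str]) -> tuple[Decision, list[str]]:
    out_of_scope = [w for w in writes if not in_scope(w.path, scope_patterns)]
    if not out_of_scope:
        return Decision.PROCEED, []
    paths = [w.path for w in out_of_scope]
    if any(w.required_for_correctness for w in out_of_scope):
        return Decision.ESCALATE, paths
    return Decision.PROCEED_WITH_NOTE, paths


writes = [
    PlannedWrite("services/user/api.py", True),
    PlannedWrite("shared/types.py", True),  # shared type definition, outside scope
]
print(decide(writes, ["services/user/*"]))
```

Reads are never checked here; only writes are gated, which matches the rule that the agent can read anything for context.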

Confidence Thresholds and the Graceful Stop

Every code segment the agent generates has an associated confidence score, derived from how well the target pattern is represented in the Style Graph and how much ambiguity exists in the ticket's requirements. When confidence drops below a configurable threshold, the agent doesn't silently generate low-quality code — it stops and surfaces the uncertainty.

The stop behavior looks different depending on where in the generation process it occurs. If the agent hits a low-confidence zone early — during context loading or initial planning — it produces a clarifying question and waits rather than generating a partial PR. If it hits low confidence after partial generation, it opens a draft PR with the high-confidence sections and adds inline comments flagging the uncertain sections: "Implementation uncertain here — the existing pattern suggests X approach, but the ticket implies Y behavior. Recommend human decision before implementing this section."
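A rough sketch of that branching, assuming confidence is a float between 0 and 1 and that planning and generation are scored separately. The 0.7 default, the `Segment` type, and the `Outcome` names are illustrative assumptions; the post does not specify how scores are computed or what the threshold is.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    CLARIFYING_QUESTION = "clarifying_question"  # stop before any code is generated
    DRAFT_PR = "draft_pr"                        # partial work, uncertain sections flagged
    REGULAR_PR = "regular_pr"                    # everything cleared the threshold


@dataclass(frozen=True)
class Segment:
    name: str
    confidence: float


def plan_output(planning_confidence: float, segments: list[Segment],
                threshold: float = 0.7) -> tuple[Outcome, list[str]]:
    """Return the output type and the names of segments needing inline flags."""
    # Low confidence during context loading or planning: ask, don't generate.
    if planning_confidence < threshold:
        return Outcome.CLARIFYING_QUESTION, []
    # Low confidence after partial generation: draft PR with flagged sections.
    flagged = [s.name for s in segments if s.confidence < threshold]
    if flagged:
        return Outcome.DRAFT_PR, flagged
    return Outcome.REGULAR_PR, []


segments = [Segment("validate_input", 0.92), Segment("retry_policy", 0.41)]
print(plan_output(0.85, segments))
```

In this sketch the flagged names would become the inline comments on the draft PR; how the comment text is produced is left out.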

Draft PRs with uncertainty flags are a first-class output type in Buildly. We've found that teams respond well to them. An agent that honestly says "I got this far and here's what I'm unsure about" is more useful than one that generates complete but wrong code. Engineers can often resolve the uncertain section in 5 minutes with context the agent doesn't have. What they can't easily do is spot the subtle incorrectness in code that looks complete and confident.

Conflict Detection Before Opening

Buildly checks for merge conflicts and logical conflicts before opening a PR. This is distinct from the PR-only write constraint — it's about the quality of what gets opened.

Merge conflicts are straightforward: if the base branch has changed since the agent started working, we check for conflicts before opening the PR and abort if any exist. The task goes back into the queue rather than producing a conflicted PR that needs immediate attention.

Logical conflicts are harder. If another agent task is in flight that modifies an overlapping set of files, we queue the later task rather than opening two PRs that will conflict when merged in sequence. This requires tracking file-level locks across all concurrent agent tasks. It's more infrastructure than it sounds, and it's the reason we don't recommend running unlimited parallel agent tasks on the same repository: the coordination overhead puts a practical ceiling on concurrency.
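The queueing rule for overlapping tasks can be sketched with an in-memory lock table. This is a simplified, single-process illustration; a real system would need shared, durable state, and `FileLockRegistry` and its methods are hypothetical names rather than anything from Buildly.

```python
from collections import deque


class FileLockRegistry:
    """Tracks which in-flight task holds each file path."""

    def __init__(self) -> None:
        self._owner_by_path = {}   # path -> task id holding the lock
        self._waiting = deque()    # (task id, paths) in arrival order

    def _conflicts(self, task_id, paths) -> bool:
        return any(self._owner_by_path.get(p, task_id) != task_id for p in paths)

    def _claim(self, task_id, paths) -> None:
        for p in paths:
            self._owner_by_path[p] = task_id

    def try_start(self, task_id: str, paths: set) -> bool:
        """Start the task if its files are free; otherwise queue it."""
        wanted = frozenset(paths)
        if self._conflicts(task_id, wanted):
            self._waiting.append((task_id, wanted))
            return False
        self._claim(task_id, wanted)
        return True

    def finish(self, task_id: str) -> list:
        """Release a task's locks and start any queued tasks that are now free."""
        self._owner_by_path = {
            p: owner for p, owner in self._owner_by_path.items() if owner != task_id
        }
        started, still_waiting = [], deque()
        while self._waiting:
            queued_id, queued_paths = self._waiting.popleft()
            if self._conflicts(queued_id, queued_paths):
                still_waiting.append((queued_id, queued_paths))
            else:
                self._claim(queued_id, queued_paths)
                started.append(queued_id)
        self._waiting = still_waiting
        return started


registry = FileLockRegistry()
registry.try_start("task-a", {"billing/invoice.py", "billing/tax.py"})
print(registry.try_start("task-b", {"billing/tax.py"}))  # overlaps task-a, so it is queued
print(registry.finish("task-a"))                          # task-b can start now
```

Every new task is checked against every held path, which is one concrete way to see why the overhead grows as more tasks run against the same repository.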

After a Bad PR: Feedback Loops

When a human reviewer closes a Buildly PR with a comment explaining why the code was wrong, that feedback doesn't disappear. We parse reviewer comments (with permission, always opt-in) and use them to update the Style Graph's pattern weights for the relevant code areas.

This isn't a case of a model learning in production: we're not fine-tuning any model weights. We're updating the heuristic constraints that inform how the agent selects and applies patterns from the Style Graph. A reviewer comment that says "we don't use this error handling pattern here, see [other file] for how we handle this type of error" adds a new edge to the Style Graph connecting the affected module to the correct pattern exemplar.
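To make the "new edge" idea concrete, here is a toy weighted graph. It assumes the reviewer comment has already been parsed into a module and an exemplar file; the `StyleGraph` class below, its method names, and the additive weighting are illustrative assumptions and say nothing about how the real Style Graph is stored or scored.

```python
from collections import defaultdict
from typing import Optional


class StyleGraph:
    """Toy graph: module -> {pattern exemplar path: weight}."""

    def __init__(self) -> None:
        self._edges = defaultdict(dict)

    def add_feedback(self, module: str, exemplar: str, weight: float = 1.0) -> None:
        # Creates the edge on first feedback, strengthens it on repeats.
        current = self._edges[module].get(exemplar, 0.0)
        self._edges[module][exemplar] = current + weight

    def preferred_exemplar(self, module: str) -> Optional[str]:
        candidates = self._edges.get(module)
        if not candidates:
            return None  # no reviewer signal yet for this module
        return max(candidates, key=candidates.get)


graph = StyleGraph()
# Hypothetical parsed feedback: "see payments/errors.py for how we handle this".
graph.add_feedback("billing", "payments/errors.py")
print(graph.preferred_exemplar("billing"))
```

No model weights are touched anywhere in this sketch; the only state that changes is the graph the agent consults when choosing a pattern.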

Over time, the feedback loop means the agent gets fewer corrections in areas where it's received prior feedback. Teams that review consistently and comment specifically see measurable improvement in agent output quality within a few sprint cycles. Teams that close bad PRs silently get slower improvement — the agent has no signal about what was wrong.

What We're Not Claiming

None of this makes Buildly infallible. The safety model described above is designed to ensure that when the agent generates bad code, the damage is contained to a PR that a human can review and close. It does not guarantee that bad PRs won't be approved by engineers who review too quickly or who don't understand the subtle incorrectness.

We're not saying that PR-only writes eliminate all risk. We're saying they reduce the blast radius of any individual failure to something recoverable, and they maintain the human decision point that should exist between autonomous code generation and production deployment. The safety model keeps the engineer in the loop structurally, not just by policy. That structural guarantee is what makes autonomous code generation viable in production codebases rather than a demo-only technology.