Buildly Blog

Running Agents in Monorepos: Isolation, Context Windows, and Blast Radius

Buildly Engineering

When we started testing Buildly against monorepos, the first thing we noticed wasn't correctness — it was context explosion. A task to add a field to a user profile endpoint was pulling in 40,000 tokens of unrelated service code before any generation happened. The agent was technically aware of the entire codebase, but that awareness was mostly noise.

Monorepos are the dominant repository structure for growing engineering teams. A single repo containing the API, frontend, mobile clients, shared libraries, and internal tooling makes dependency management tractable and code sharing explicit. But for autonomous agents, that structure creates a specific problem: how do you scope the agent's context to the task at hand without losing the cross-service awareness that the monorepo was designed to enable?

The Naive Approach Fails Fast

The obvious answer is to feed the agent the whole repo. The reasoning goes that modern context windows are large enough, and the agent can figure out what's relevant. In practice, this creates two problems that compound each other.

First, quality degrades as context grows. Not linearly — it's more like a cliff. Once a model is navigating more than roughly 60,000 tokens of code, the probability of hallucinated imports, wrong function signatures, and stale interface assumptions climbs noticeably. The model starts pattern-matching on surface similarities rather than reasoning about actual data flow.

Second, and more dangerous: an agent with full codebase context will sometimes make cross-service changes when a single-service change would have been sufficient. We saw this early on with a task involving a shared utility function. The agent correctly identified that the utility was used by three services, and then "helpfully" modified the calling convention across all three — when the ticket only asked to update one service's behavior. Technically coherent. Operationally catastrophic if merged without review.

How Buildly Builds Task Scope

Our approach to monorepo isolation has two components: dependency-aware context selection, and hard blast-radius limits.

When an agent picks up a ticket, the first step isn't code generation; it's scope construction. The Style Graph maps service boundaries, shared libraries, and import relationships. Given a ticket mentioning a specific endpoint or module, we can traverse the dependency graph in both directions: what does this module import (upstream dependencies the agent needs to understand), and what imports this module (downstream consumers the agent must not break).
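The two-direction traversal can be sketched in a few lines. This is a minimal illustration, not Buildly's implementation: the import graph is assumed to be a plain dictionary from module to the modules it imports, and the module paths and the function names `build_reverse_graph`, `reachable`, and `build_scope` are all hypothetical.

```python
from collections import defaultdict, deque


def build_reverse_graph(imports):
    """Invert a module -> imported-modules map into module -> importers."""
    reverse = defaultdict(set)
    for module, deps in imports.items():
        for dep in deps:
            reverse[dep].add(module)
    return reverse


def reachable(graph, start):
    """Return every node reachable from `start` (excluding it), using BFS."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def build_scope(imports, target):
    """Collect what `target` depends on and what depends on `target`."""
    return {
        "target": target,
        # Upstream: code the agent needs to read to understand the target.
        "upstream": reachable(imports, target),
        # Downstream: consumers the agent must not break.
        "downstream": reachable(build_reverse_graph(imports), target),
    }


# Hypothetical import graph for a small monorepo.
IMPORTS = {
    "services/profile/api": {"libs/user_types", "libs/auth"},
    "services/payments/api": {"libs/user_types"},
    "libs/auth": {"libs/user_types"},
    "libs/user_types": set(),
}

scope = build_scope(IMPORTS, "libs/auth")
```

In this toy graph, a ticket touching `libs/auth` pulls in `libs/user_types` as upstream context and marks `services/profile/api` as a downstream consumer. The payments service stays out of scope entirely.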

The result is a scoped context: typically 8,000–25,000 tokens depending on how deeply connected the affected module is. For an isolated internal service, that's often just the service's own code plus a handful of shared types. For a module that touches authentication or a core data model, the context expands — but only to the parts that are genuinely relevant to the change.

The blast-radius limit is a separate constraint. Regardless of what the dependency graph says, a Buildly agent will not modify files outside a pre-declared scope set without an explicit override. If the ticket says "update the payment service," the agent can propose changes to payment service files and shared types it imports. It cannot touch the notification service, even if it detects that a shared type change would technically require it. That cross-service impact gets flagged in the PR description instead, with a clear note that it requires a human decision.
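As a rough sketch of how such a limit could be enforced, the check below splits proposed changes into permitted writes and flagged paths. It assumes the scope set is expressed as path prefixes, which is our simplification for illustration. The names `partition_changes` and `pr_note`, the file paths, and the note wording are hypothetical.

```python
def partition_changes(proposed_files, allowed_prefixes, override=False):
    """Split proposed file changes into permitted writes and flagged paths.

    `allowed_prefixes` is the pre-declared scope set, expressed here as
    path prefixes (an assumption made for this sketch).
    """
    prefixes = tuple(allowed_prefixes)
    allowed, flagged = [], []
    for path in proposed_files:
        if override or path.startswith(prefixes):
            allowed.append(path)
        else:
            flagged.append(path)
    return allowed, flagged


def pr_note(flagged):
    """Render flagged paths as a note for the PR description."""
    if not flagged:
        return ""
    lines = ["Cross-service impact detected; requires a human decision:"]
    lines += [f"- {path}" for path in flagged]
    return "\n".join(lines)


# Hypothetical ticket: "update the payment service".
proposed = [
    "services/payments/api/charge.py",
    "libs/types/money.py",
    "services/notifications/email.py",
]
allowed, flagged = partition_changes(
    proposed, ["services/payments/", "libs/types/"]
)
note = pr_note(flagged)
```

The key property is that an out-of-scope path never reaches the write step: it is routed to the PR description instead of being silently applied or silently dropped.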

The Context Window as a Confidence Signal

One thing we've found useful: context size correlates with task risk. When our scoping algorithm constructs a context window larger than expected for a seemingly simple ticket, that's a signal worth surfacing. It usually means the affected module is more entangled than the ticket author realized.

We track expected versus actual context size as one of several confidence signals in our PR output. A ticket that says "rename this field in the user model" but results in a 35,000-token context means that field is referenced in more places than expected. The PR description will note this explicitly: "Scope expanded beyond initial estimate — field referenced by [N] modules. Review downstream consumers before merging."
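A minimal sketch of that signal follows. How the expected size is estimated is not described here, so the sketch simply takes it as an input. The 1.5x threshold, the function name `scope_expansion_note`, and the module names are assumptions for illustration.

```python
def scope_expansion_note(expected_tokens, actual_tokens, referencing_modules,
                         threshold=1.5):
    """Return a PR note when the scoped context is much larger than expected.

    `threshold` is a hypothetical ratio; a real system would tune it.
    Returns None when the context size is within the expected range.
    """
    if actual_tokens <= expected_tokens * threshold:
        return None
    return (
        "Scope expanded beyond initial estimate: field referenced by "
        f"{len(referencing_modules)} modules. "
        "Review downstream consumers before merging."
    )


# Hypothetical numbers for a "rename this field" ticket.
note = scope_expansion_note(
    expected_tokens=12_000,
    actual_tokens=35_000,
    referencing_modules=["services/profile", "services/billing", "libs/auth"],
)
```

Treating the note as optional keeps routine PRs quiet, so the warning stands out when it does appear.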

This matters more in monorepos than in single-service repos because the blast radius of a poorly scoped change is larger. A change that looks isolated at the service level can propagate through shared libraries to touch things the original engineer never considered.

Handling Shared Libraries: The Hard Part

The trickiest monorepo scenario is a change to a shared library that multiple services depend on. In a human engineering workflow, this typically requires a PR that touches the library plus coordinated PRs in each consuming service — or a careful deprecation path that allows services to migrate independently.

Agents don't naturally have a sense of release coordination. The model knows what's true now; it doesn't know what teams are mid-migration, which services are in a freeze, or which downstream dependency is being actively refactored by another engineer.

Our current approach is conservative: when a task requires a shared library modification, the agent opens the library PR and stops. It lists the downstream consumers in the PR description with a recommendation for how each should be updated, but it does not open those downstream PRs automatically. That coordination decision belongs to the engineer reviewing the library change.

We're not saying this is the permanent answer. Automated coordination across services in a monorepo is tractable; it just requires more state than we currently maintain about concurrent work in flight. It's on the roadmap for a reason.

Parallel Tasks in the Same Monorepo

When multiple agents are working concurrently on different services in the same monorepo, we need conflict detection before any PR is opened. Two agents writing to the same shared types file simultaneously will produce conflicting changes that can't be auto-resolved.

Buildly tracks in-flight agent tasks at the file level. Before an agent begins writing code, it declares the files it expects to modify. If another task has already declared a conflicting file, the later task is queued rather than allowed to proceed in parallel. This is a conservative approach: there will be false conflicts, where a task is queued unnecessarily because two changes happen to touch the same file but don't actually conflict logically.
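The declare-then-queue behavior can be sketched as a small in-memory registry. This is an illustration under simplifying assumptions (single process, no persistence, no locking), and the class name `FileClaimRegistry`, its methods, and the file paths are hypothetical.

```python
from collections import deque


class FileClaimRegistry:
    """Tracks which in-flight task has declared which files."""

    def __init__(self):
        self._owners = {}      # file path -> id of the task holding it
        self._claims = {}      # running task id -> set of declared files
        self._queue = deque()  # (task_id, files) waiting on a conflict

    def _conflicts(self, files):
        return any(path in self._owners for path in files)

    def _start(self, task_id, files):
        self._claims[task_id] = files
        for path in files:
            self._owners[path] = task_id

    def declare(self, task_id, files):
        """Declare files up front. Returns True if the task may proceed now."""
        files = set(files)
        if self._conflicts(files):
            self._queue.append((task_id, files))
            return False
        self._start(task_id, files)
        return True

    def release(self, task_id):
        """Release a finished task's files and start any unblocked tasks."""
        for path in self._claims.pop(task_id, set()):
            del self._owners[path]
        started, still_waiting = [], deque()
        while self._queue:
            waiting_id, files = self._queue.popleft()
            if self._conflicts(files):
                still_waiting.append((waiting_id, files))
            else:
                self._start(waiting_id, files)
                started.append(waiting_id)
        self._queue = still_waiting
        return started


registry = FileClaimRegistry()
a_runs = registry.declare("task-a", ["libs/types.py", "services/pay/api.py"])
b_runs = registry.declare("task-b", ["libs/types.py"])           # conflicts
c_runs = registry.declare("task-c", ["services/notify/api.py"])  # disjoint
unblocked = registry.release("task-a")
```

Here `task-b` waits because it shares `libs/types.py` with `task-a`, even if the two edits would not conflict logically, which is exactly the false-conflict case described above. A queued task that no longer conflicts can start ahead of an earlier queued task that still does. That is a design choice in this sketch, not something the approach requires.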

We've found this acceptable in practice. The alternative, proceeding optimistically and resolving conflicts after the fact, produces messier PR histories and higher cognitive load for engineers doing reviews. Engineers reviewing agent PRs benefit from clean, predictable changes. Queuing a non-conflicting task for a few extra minutes is a reasonable tradeoff.

What "Isolation" Actually Means in Practice

When we talk about agent isolation in monorepos, we mean three things simultaneously: context isolation (the agent only sees what it needs to see), write isolation (the agent can only modify files within its declared scope), and temporal isolation (the agent's task executes as if it's the only in-flight change, with conflicts surfaced rather than silently resolved).

These three properties together define what we think of as safe monorepo agent operation. None of them is sufficient on its own. Context isolation without write isolation means a well-informed agent can still make changes outside its scope. Write isolation without temporal isolation means two agents can each produce individually correct changes that conflict in practice. Context isolation plus write isolation without temporal isolation produces flaky behavior that depends on task execution order.

The engineering teams we've connected to large monorepos consistently report that the blast-radius limit is the feature they trust most. Not the context quality, not the code generation accuracy — the guarantee that an agent task won't silently touch something it was never supposed to touch. That constraint alone changes the risk calculation for autonomous code generation in a production codebase.
