Buildly Blog

The Style Graph: How Buildly Reads Your Codebase Before Writing a Line

2025-03-24 · Camille Fontaine


Every codebase has opinions. Not written anywhere, not declared in a style guide — just accumulated through hundreds of decisions made by the team that built it. Which layer handles validation. Whether you use interfaces or abstract classes. Whether errors get wrapped in domain types or bubble up raw. How tests are organized relative to source files. The naming convention for internal events.

A new engineer joining a team spends the first few weeks learning these opinions. Not because anyone teaches them, but because they read existing code, get feedback on PRs, and gradually internalize the patterns. A senior engineer navigating an unfamiliar codebase does the same thing, faster. This process of reading the codebase and internalizing its patterns is what makes it possible to write new code that fits.

An agent that skips this process produces code that looks like it was written by someone who read the documentation but never worked in the codebase. The output is syntactically correct. It's structurally wrong.

Why generic context isn't enough

The first version of Buildly's context layer was naive. We gave the agent the README, the directory structure, and the files closest to the target module. It was better than nothing. It was not good enough.

The problem was that the context we were providing was declarative — here's what exists — not behavioral — here's how this team writes code. A README tells you what the service does. It doesn't tell you that this team consistently uses the repository pattern with a specific naming convention for query methods, or that they always wrap external API calls in a retry decorator rather than handling retries inline, or that test fixtures go in a conftest.py at the module level rather than the project root.

Those behavioral patterns are what determine whether generated code passes review in 20 minutes or gets sent back with "this doesn't match how we write this." We needed a way to extract and represent them systematically.

What the Style Graph captures

The Style Graph is a semantic representation of how a specific codebase is written. Not what it does — how it's done. We build it by traversing the repository and extracting several categories of pattern:

Structural patterns

Module organization, import conventions, layering. If a codebase has a clear service / repository / model layering, that's captured. If handlers always live in a specific directory relative to the domain they belong to, that's captured. These patterns tell the agent where to put code and what to import.

Naming conventions

Naming is more specific than most tools acknowledge. It's not just "snake_case" or "camelCase"; it's whether query methods are prefixed find_ or get_, whether boolean fields are prefixed is_ or has_, whether internal event types use a PascalCase form like PaymentProcessed or a dotted lowercase form like payment.processed. These micro-conventions are invisible until violated, at which point every reviewer notices.
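As a rough illustration of how the find_ versus get_ question could be answered mechanically, the sketch below counts method-name prefixes using Python's standard `ast` module. The sample source, the `QUERY_PREFIXES` list, and the function name are assumptions made for this example, not a description of Buildly's extractor.

```python
import ast
from collections import Counter

# Hypothetical sample file; a real run would read source from the repository.
SAMPLE_SOURCE = '''
class UserRepository:
    def find_by_email(self, email):
        pass

    def find_active(self):
        pass

    def get_by_id(self, user_id):
        pass

    def save(self, user):
        pass
'''

# Assumed set of candidate prefixes for query-style methods.
QUERY_PREFIXES = ("find_", "get_", "fetch_", "load_")


def count_query_prefixes(source):
    """Count how often each candidate prefix starts a function name."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for prefix in QUERY_PREFIXES:
                if node.name.startswith(prefix):
                    counts[prefix] += 1
                    break
    return counts


prefix_counts = count_query_prefixes(SAMPLE_SOURCE)
dominant_prefix, dominant_count = prefix_counts.most_common(1)[0]
```

On this sample, `find_` appears twice and `get_` once, so `find_` is the dominant prefix. Across a whole repository the same tally gives a usable signal about which prefix new query methods should use.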

Abstraction preferences

How does the team handle cross-cutting concerns? Is pagination a mixin, a helper, a middleware, or repeated inline? Is error handling centralized or distributed? Does the team use dependency injection or module-level singletons? These preferences aren't right or wrong in isolation — they're just the choices this team made, and new code that violates them creates inconsistency that compounds over time.

Test patterns

Test structure is as opinionated as production code, and generated tests that don't match the existing test style are rejected immediately. The Style Graph captures which fixture approach is used, how test classes (or functions) are organized, which assertion library and style are preferred, and how mocking is handled.
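A simplified sketch of what extracting test-style signals could look like, assuming a pytest-based suite. The `summarize_test_style` function and the sample test file are hypothetical; the sketch only looks at top-level definitions and requires Python 3.9 or later for `ast.unparse`.

```python
import ast

# Hypothetical test module used as input.
SAMPLE_TEST = '''
import pytest

@pytest.fixture
def client():
    return object()

def test_list_users(client):
    assert client is not None

def test_create_user(client):
    assert client is not None
'''


def summarize_test_style(source):
    """Tally a few style signals from a single test file."""
    tree = ast.parse(source)
    summary = {"test_functions": 0, "test_classes": 0, "fixtures": 0}
    for node in tree.body:  # top-level definitions only
        if isinstance(node, ast.ClassDef) and node.name.startswith("Test"):
            summary["test_classes"] += 1
        elif isinstance(node, ast.FunctionDef):
            decorators = [ast.unparse(d) for d in node.decorator_list]
            if any(d.startswith("pytest.fixture") for d in decorators):
                summary["fixtures"] += 1
            elif node.name.startswith("test_"):
                summary["test_functions"] += 1
    # Bare `assert` statements suggest pytest-style assertions
    # rather than unittest's self.assertEqual family.
    summary["bare_asserts"] = sum(
        isinstance(n, ast.Assert) for n in ast.walk(tree)
    )
    return summary


style = summarize_test_style(SAMPLE_TEST)
```

Here the summary shows function-based tests, one fixture, and bare assertions, which would steer generated tests away from `unittest.TestCase` classes for this module.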

Building the graph: what it takes

When a new team connects a repository to Buildly, the Style Graph build takes roughly 24–48 hours for a mid-size codebase (40k–200k lines). Build time for larger repos scales roughly linearly with size. This isn't a one-time scan; the build is ongoing. As the team merges PRs, the graph updates, so the agent's understanding of the codebase evolves with the codebase itself.

The build process combines static analysis with pattern extraction. We parse the AST of each file, extract structural and naming signals, and build a graph where nodes are code entities and edges encode relationships — both explicit (imports, inheritance) and inferred (patterns that consistently co-occur). We then run a pattern-distillation pass that identifies which patterns are strongly consistent across the codebase vs. which are one-off exceptions.
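The explicit half of that graph can be sketched in a few lines. The example below is an illustration under assumptions, not Buildly's actual builder: it handles Python sources only, uses the standard `ast` module, and records just import and inheritance edges. The file contents and the `build_edges` name are invented for the example, and inferred co-occurrence edges are out of scope here.

```python
import ast
from collections import defaultdict

# Hypothetical two-file repository, keyed by path.
SAMPLE_FILES = {
    "models/user.py": (
        "class BaseModel:\n    pass\n\n"
        "class User(BaseModel):\n    pass\n"
    ),
    "repositories/user_repository.py": (
        "from models.user import User\n\n"
        "class UserRepository:\n    pass\n"
    ),
}


def build_edges(files):
    """Return {(source_node, relation): set(targets)} for explicit edges."""
    edges = defaultdict(set)
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    edges[(path, "imports")].add(alias.name)
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges[(path, "imports")].add(node.module)
            elif isinstance(node, ast.ClassDef):
                for base in node.bases:
                    if isinstance(base, ast.Name):  # skip dotted/generic bases
                        edges[(f"{path}::{node.name}", "inherits")].add(base.id)
    return dict(edges)


graph = build_edges(SAMPLE_FILES)
```

Nodes here are files and classes, identified by path and `path::ClassName`. A fuller version would resolve imports to the files they point at and attach naming and structural signals to each node.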

That last part matters. Most codebases have historical debt — modules written by engineers who have since left, or before the team settled on a pattern. We don't want the agent to learn from those. The distillation pass weights patterns by recency and consistency, so the graph reflects current practice, not historical practice.
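One way to picture recency-and-consistency weighting is an exponential decay over the age of each observation. The formula, the 180-day half-life, and the pattern names below are all assumptions chosen for illustration; the actual weighting scheme is not specified here.

```python
from collections import defaultdict


def score_patterns(observations, half_life_days=180.0):
    """Return each pattern's share of the total recency-weighted evidence.

    observations: iterable of (pattern_name, age_in_days).
    Each observation contributes 0.5 ** (age / half_life), so code written
    one half-life ago counts half as much as code written today.
    """
    weights = defaultdict(float)
    for pattern, age_days in observations:
        weights[pattern] += 0.5 ** (age_days / half_life_days)
    total = sum(weights.values())
    return {pattern: weight / total for pattern, weight in weights.items()}


# Hypothetical history: three old inline-retry call sites,
# three recent call sites using a retry decorator.
observations = [
    ("inline_retry", 720), ("inline_retry", 720), ("inline_retry", 540),
    ("retry_decorator", 0), ("retry_decorator", 0), ("retry_decorator", 180),
]
shares = score_patterns(observations)
```

Both patterns appear three times, so a raw count would call it a tie. With decay applied, the decorator pattern carries 2.5 of the 2.75 total weight (about 91%), which matches the intuition that the team has moved on from inline retries.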

How the agent uses it during task execution

When an agent picks up a ticket, the first step is identifying which parts of the Style Graph are relevant to the task. A ticket to add pagination to an API endpoint needs the Style Graph's pagination patterns, API handler patterns, and test patterns for the relevant module — not the entire graph.

We built a relevance-retrieval layer that does this scoping automatically. Given a ticket description and the target files identified via backlog integration, it retrieves the most relevant Style Graph subgraph — the patterns the agent needs to write code that fits, without overwhelming the context window with patterns that aren't applicable.
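The shape of that retrieval step can be shown with a deliberately simple stand-in: keyword overlap between the ticket, the target file paths, and tags attached to each pattern. A production retrieval layer would likely use something richer than this; the scoring method, the `PATTERNS` catalog, and the function names are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_patterns(ticket, target_files, patterns, top_k=3):
    """Rank patterns by tag overlap with the ticket text and target paths."""
    query = tokenize(ticket) | tokenize(" ".join(target_files))
    scored = []
    for name, tags in patterns.items():
        overlap = len(query & set(tags))
        if overlap:  # drop patterns with no connection to the task
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical pattern catalog: pattern name -> descriptive tags.
PATTERNS = {
    "pagination_mixin": ["pagination", "page", "limit", "offset"],
    "api_handler_layout": ["api", "handler", "endpoint"],
    "retry_decorator": ["retry", "external", "client"],
    "orders_test_fixtures": ["tests", "orders", "fixture"],
}

selected = retrieve_patterns(
    "Add pagination to the orders API endpoint",
    ["api/orders/handlers.py", "tests/orders/test_handlers.py"],
    PATTERNS,
)
```

For the pagination ticket, the retry pattern is dropped entirely and the three remaining patterns are returned. The point of the sketch is the filtering behavior, not the scoring function.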

This scoping step is where most of the quality difference comes from. Agents with the full graph but poor retrieval write code that's 70% right and 30% confused. Agents with a well-scoped subgraph write code that reads like it was written by someone familiar with that specific module.

What the Style Graph doesn't capture

We're deliberate about what the Style Graph is not. It's not a specification of what your code should do. It's not a lint rule set. It's not a way to enforce architectural standards you haven't already implemented. It learns from what exists — which means it learns your patterns, including your inconsistencies.

If your codebase has a module that was written differently from everything else — maybe it was scaffolded from a template, or migrated from another service — the Style Graph will represent that inconsistency, not resolve it. The agent will write code that fits the surrounding context as best it can, but we're not in the business of deciding which inconsistency is the "right" one to prefer. That's an architectural decision that belongs to your team.

We're also not claiming the Style Graph captures everything a senior engineer knows about a codebase. Domain knowledge — what the business logic actually means, what invariants can't be violated, what the product is supposed to do — is not in the graph. The agent needs a human-readable ticket that contains that context. The Style Graph handles the "how" dimension of the code. The ticket handles the "what" dimension. Both are required.

Early observations

Across the teams we've worked with, the most consistent feedback is that the first PR the agent opens after Style Graph build feels noticeably different from generic code generation output. Not perfect — there are still review comments, still requests for changes, still the occasional structural decision that a senior engineer would have made differently. But the feedback changes from "this is wrong" to "this is mostly right, adjust X." That distinction matters for whether engineers trust and review agent output or reject it on sight.

The Style Graph is the piece of Buildly we've spent the most time on, and it will be the piece we continue to invest in most heavily. Everything else in the pipeline — ticket parsing, code generation, PR formatting — is important. But if the output doesn't fit your codebase, none of the rest matters.