Buildly Blog

Backlog to Branch: The Anatomy of One Agent Task

2025-04-28 · Buildly Engineering

[Figure: Pipeline showing a ticket transforming into a code branch]

People ask us what "autonomous coding agent" means in practice. The phrase is easy to reach for, hard to make concrete. So here's something concrete: a walk-through of exactly what happens from the moment a Buildly agent picks up a ticket to the moment the PR opens. Every step. No abstractions.

The example ticket: PLAT-312 — "Add cursor-based pagination to GET /api/v2/invoices. Limit default 50, max 200. Return next_cursor in response envelope."

Step 1: Ticket ingestion and intent parsing

The agent receives the ticket via the backlog integration — in this case, a Jira webhook that fires when the ticket is moved to "In Progress." The raw ticket contains the title, description, any linked tickets, the reporter, the assignee (buildly-agent), and any attached comments.

The first pass is intent parsing: what type of task is this? Buildly classifies tasks along two axes — scope (single endpoint, single module, cross-cutting) and type (new feature, modification, migration, test coverage). This classification feeds the retrieval step. A "modify existing endpoint" task needs different context than a "new resource type" task.

PLAT-312 is classified as: scope = single endpoint, type = modification with new response contract. The parser also extracts the concrete parameters: the endpoint path, the pagination style (cursor-based, not offset), the default and max limits, and the new response field name.
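To make the output of this step concrete, here is a minimal sketch of what a parsed intent for PLAT-312 could look like. The class and field names (ParsedIntent, Scope, TaskType) are hypothetical, chosen for illustration; only the values come from the ticket and the classification above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    SINGLE_ENDPOINT = "single endpoint"
    SINGLE_MODULE = "single module"
    CROSS_CUTTING = "cross-cutting"


class TaskType(Enum):
    NEW_FEATURE = "new feature"
    MODIFICATION = "modification"
    MIGRATION = "migration"
    TEST_COVERAGE = "test coverage"


@dataclass(frozen=True)
class ParsedIntent:
    """Hypothetical container for what the parser extracts from a ticket."""
    ticket_id: str
    scope: Scope
    task_type: TaskType
    endpoint: str
    pagination_style: Optional[str]  # None would mean the ticket did not say
    default_limit: Optional[int]
    max_limit: Optional[int]
    response_field: Optional[str]


# Values taken from the PLAT-312 ticket text.
plat_312 = ParsedIntent(
    ticket_id="PLAT-312",
    scope=Scope.SINGLE_ENDPOINT,
    task_type=TaskType.MODIFICATION,
    endpoint="/api/v2/invoices",
    pagination_style="cursor",
    default_limit=50,
    max_limit=200,
    response_field="next_cursor",
)
```

Keeping the extracted parameters optional lets a later check distinguish "the ticket said 50" from "the ticket said nothing".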

Any ambiguity at this step gets flagged immediately. If the ticket said "add pagination" without specifying cursor vs. offset, the agent would open a draft PR or add a comment to the ticket asking for clarification before proceeding. Resolving ambiguous requirements silently produces wrong implementations, so we'd rather surface the question early.
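The ambiguity check can be pictured as a list of open questions that must be empty before work continues. This is a sketch under assumptions: the function name and the set of required fields are hypothetical, and a real parser would need task-specific rules.

```python
from typing import Dict, List, Optional

# Assumption: the details a pagination ticket must state before work starts.
REQUIRED_FOR_PAGINATION = ("pagination_style", "default_limit", "max_limit")


def open_questions(parsed: Dict[str, Optional[object]]) -> List[str]:
    """Return one question per required detail the ticket left unspecified."""
    return [
        f"Ticket does not specify {name.replace('_', ' ')}"
        for name in REQUIRED_FOR_PAGINATION
        if parsed.get(name) is None
    ]


# A ticket that only says "add pagination" leaves the style open.
vague = {"pagination_style": None, "default_limit": 50, "max_limit": 200}
# PLAT-312 states all three details.
clear = {"pagination_style": "cursor", "default_limit": 50, "max_limit": 200}
```

A non-empty result is the signal to comment on the ticket or open a draft PR instead of proceeding.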

Step 2: Codebase location and Style Graph retrieval

With the intent parsed, the agent identifies which parts of the codebase it needs to understand. The endpoint path /api/v2/invoices is used to locate the relevant handler — in this codebase, that's src/api/v2/handlers/invoices.py and its associated route registration.

The agent then queries the Style Graph for patterns relevant to this task type in this module's context. Specifically:

How is pagination currently implemented on other v2 endpoints? (The Style Graph has two prior examples: /api/v2/transactions and /api/v2/ledger-entries)
What's the response envelope structure? (Style Graph: all v2 responses return a {"data": [...], "meta": {...}} wrapper)
How are query parameters validated? (Style Graph: Pydantic models in a params/ subdirectory, not inline validation)
What's the test pattern for handler tests? (Style Graph: pytest fixtures in tests/api/v2/conftest.py, one test file per handler)

The Style Graph retrieval returns a scoped subgraph — the relevant patterns without the full graph. This is the context the code generation step works from.
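As a rough mental model only, a scoped retrieval can be thought of as filtering stored patterns by module and topic. The Style Graph's real structure is not described here, so the flat list, the Pattern class, and the topic names below are all assumptions made for illustration; the summaries restate the four answers above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Pattern:
    """Hypothetical record of one convention observed in the codebase."""
    topic: str
    module: str
    summary: str


# Toy stand-in for the full graph.
STYLE_GRAPH = [
    Pattern("pagination", "api/v2",
            "prior examples: /api/v2/transactions and /api/v2/ledger-entries"),
    Pattern("response_envelope", "api/v2",
            'all v2 responses return {"data": [...], "meta": {...}}'),
    Pattern("param_validation", "api/v2",
            "Pydantic models in a params/ subdirectory, not inline validation"),
    Pattern("handler_tests", "api/v2",
            "pytest fixtures in tests/api/v2/conftest.py, one test file per handler"),
    Pattern("branch_naming", "repo", "agent branches use the buildly/ prefix"),
]


def scoped_subgraph(graph: Iterable[Pattern], module: str,
                    topics: Set[str]) -> List[Pattern]:
    """Keep only the patterns relevant to this module and task type."""
    return [p for p in graph if p.module == module and p.topic in topics]


subgraph = scoped_subgraph(
    STYLE_GRAPH,
    module="api/v2",
    topics={"pagination", "response_envelope", "param_validation", "handler_tests"},
)
```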

Step 3: Implementation planning

Before writing code, the agent produces an internal implementation plan. This isn't exposed to the user by default, but it drives the generation step. For PLAT-312, the plan looks roughly like this:

Create params/invoices_params.py with an InvoiceListParams Pydantic model: cursor: Optional[str], plus limit: int = 50 with an le=200 constraint
Modify the list_invoices handler to accept params: InvoiceListParams and pass cursor to the repository layer
Add find_invoices_after_cursor method to InvoiceRepository following the pattern of find_transactions_after_cursor
Update the response envelope to include meta.next_cursor — None when no more results
Add test coverage in tests/api/v2/test_invoices.py for: default limit, max limit enforcement, valid cursor, exhausted cursor returning null
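To show the behavior this plan describes, here is a dependency-free sketch of cursor pagination. It is not the generated code: it swaps the Pydantic model for a plain dataclass so it runs with the standard library alone, pages over an in-memory list instead of a real InvoiceRepository, and assumes an opaque base64 cursor over integer ids and a lower bound of 1 on limit, none of which the ticket specifies.

```python
import base64
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

DEFAULT_LIMIT = 50   # from PLAT-312
MAX_LIMIT = 200      # from PLAT-312


@dataclass(frozen=True)
class InvoiceListParams:
    cursor: Optional[str] = None
    limit: int = DEFAULT_LIMIT

    def __post_init__(self) -> None:
        # Assumption: a limit below 1 is rejected as well.
        if not 1 <= self.limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")


def encode_cursor(last_id: int) -> str:
    """Wrap the last returned id in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def find_invoices_after_cursor(invoices: List[Dict[str, Any]],
                               params: InvoiceListParams) -> Dict[str, Any]:
    """Return one page in the v2 envelope; invoices must be sorted by id."""
    if params.cursor is None:
        remaining = invoices
    else:
        after_id = decode_cursor(params.cursor)
        remaining = [inv for inv in invoices if inv["id"] > after_id]
    page = remaining[: params.limit]
    has_more = len(remaining) > params.limit
    # next_cursor is None once the result set is exhausted.
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "meta": {"next_cursor": next_cursor}}
```

The four test cases in the plan map directly onto this sketch: the default limit, rejection above the maximum, resuming from a valid cursor, and a null next_cursor on the last page.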

If the implementation plan requires touching something outside the declared scope — for example, if the codebase had no cursor pagination yet and implementing it required adding a shared utility — the agent flags that in the plan and surfaces it in the PR description rather than making the call silently.
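One simple way to picture that scope check is a comparison between the files the plan touches and the paths the task is expected to stay within. The function, the scope prefixes, and the shared-utility path below are hypothetical; only the handler, params, and test paths appear in this walkthrough.

```python
from typing import Iterable, List


def out_of_scope(planned_paths: Iterable[str],
                 declared_prefixes: Iterable[str]) -> List[str]:
    """Return planned files that fall outside every declared scope prefix."""
    prefixes = tuple(declared_prefixes)
    return [path for path in planned_paths if not path.startswith(prefixes)]


# Assumed scope for a single-endpoint task on the v2 invoices handler.
DECLARED_SCOPE = ["src/api/v2/", "params/", "tests/api/v2/"]

planned = [
    "src/api/v2/handlers/invoices.py",
    "params/invoices_params.py",
    "tests/api/v2/test_invoices.py",
    "src/common/pagination.py",  # hypothetical shared utility
]

flagged = out_of_scope(planned, DECLARED_SCOPE)
```

Anything in flagged would be called out in the plan and the PR description rather than changed quietly.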

Step 4: Code generation

Code generation runs against the implementation plan and the Style Graph subgraph. The output for PLAT-312 spans five files: the new params file, the modified handler, the modified repository, the modified response serializer, and the test file.

The generation is sequential, not parallel. The params file comes first and the repository change second, because the handler depends on both. The test file comes last because it depends on all of the above. This sequencing matters for coherence: generating tests before the implementation can produce tests that test the wrong thing.
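The ordering above is a topological sort over file dependencies. The sketch below uses Python's standard graphlib module (3.9+) to derive an order; the dependency map is an assumption reconstructed from the description, and the serializer's place in it is a guess. Note that a topological sort only guarantees that dependencies come first; the relative order of independent files is not fixed.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each entry maps a file to the files it depends on.
# Assumption: the handler also depends on the response serializer.
DEPENDS_ON: Dict[str, Set[str]] = {
    "params file": set(),
    "repository": set(),
    "response serializer": set(),
    "handler": {"params file", "repository", "response serializer"},
    "test file": {"params file", "repository", "response serializer", "handler"},
}


def generation_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return the files so that every file follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = generation_order(DEPENDS_ON)
```

With this map, the handler is always generated after its three dependencies and the test file is always last.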

Throughout generation, the agent applies the Style Graph patterns explicitly. The Pydantic model uses the same field naming and validator style as TransactionListParams. The repository method name follows the find_{resource}_after_cursor convention. The tests reuse the fixtures in tests/api/v2/conftest.py rather than creating new ones inline.

Step 5: Self-review pass

After generation, the agent runs a self-review pass against the output before opening the PR. This isn't a full correctness check; it's a consistency check. Does the generated code match the implementation plan? Do all the method signatures that the test file calls actually exist in the generated implementation? Do imports of modules the agent didn't modify point at paths that really exist?

This pass catches the class of errors where generation went slightly off-track: an import that was written from memory rather than from the actual module path, or a test that calls a method with the wrong argument signature. It does not catch logical errors in the implementation; that's what the human reviewer is for.
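One of these consistency checks, "does every method the tests call exist in the implementation?", can be sketched with Python's ast module. This illustrates the idea and is not the agent's actual review logic: it only compares names (not argument signatures), and the receiver name repo and both source snippets are made up for the example.

```python
import ast
from typing import Set


def defined_functions(source: str) -> Set[str]:
    """Names of all functions and methods defined in the source."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def methods_called_on(source: str, receiver: str) -> Set[str]:
    """Names of methods called as `receiver.method(...)` in the source."""
    return {
        node.func.attr
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == receiver
    }


def missing_methods(impl_source: str, test_source: str, receiver: str) -> Set[str]:
    """Methods the tests call on the receiver that the implementation lacks."""
    return methods_called_on(test_source, receiver) - defined_functions(impl_source)


IMPL = '''
class InvoiceRepository:
    def find_invoices_after_cursor(self, cursor, limit):
        return []
'''

TESTS = '''
def test_default_limit(repo):
    assert repo.find_invoices_after_cursor(None, 50) == []

def test_exhausted_cursor(repo):
    repo.find_invoice_after_cursor("abc", 50)  # misspelled on purpose
'''

problems = missing_methods(IMPL, TESTS, receiver="repo")
```

Here problems contains the misspelled find_invoice_after_cursor, exactly the kind of slip this pass is meant to catch before a reviewer sees it.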

Step 6: Branch creation and PR opening

The agent creates a branch named buildly/PLAT-312 (following the branch naming convention extracted from the Style Graph — in this codebase, agent branches use the buildly/ prefix), commits the changes with a message that follows the conventional commits format the team uses, and opens a pull request.
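A small sketch of the two naming conventions involved. The buildly/ prefix comes from this codebase; the commit type, scope, and Refs footer are assumptions about how a team might apply the Conventional Commits format, and both helper names are hypothetical.

```python
def branch_name(ticket_id: str, prefix: str = "buildly/") -> str:
    """Agent branches are the team's prefix followed by the ticket id."""
    return f"{prefix}{ticket_id}"


def commit_message(commit_type: str, scope: str, summary: str, ticket_id: str) -> str:
    """Conventional Commits header, a blank line, then a footer naming the ticket."""
    return f"{commit_type}({scope}): {summary}\n\nRefs: {ticket_id}"


branch = branch_name("PLAT-312")
message = commit_message(
    "feat", "invoices", "add cursor-based pagination to GET /api/v2/invoices", "PLAT-312"
)
```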

The PR description includes:

A summary of what was implemented
A link back to PLAT-312
Notes on any judgment calls made (in this case: "I followed the cursor implementation in /transactions — if you prefer the alternative approach used in the older /ledger-entries cursor, let me know")
Testing notes: what the new tests cover

The PR is opened ready for review (not draft), because the task was well-defined and the implementation plan was clean. CI runs automatically. The assignee gets notified via Slack.
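Put together, the PR step amounts to assembling a description and deciding between draft and ready for review. The sketch below only builds the request payload; it does not call any API. The keys title, head, base, body, and draft match GitHub's REST endpoint for creating a pull request, while the function name, the base branch main, the PR title, and the section layout are assumptions.

```python
from typing import Any, Dict, List


def build_pr_payload(ticket_id: str, branch: str, title: str, summary: str,
                     judgment_calls: List[str], testing_notes: List[str],
                     open_questions: List[str]) -> Dict[str, Any]:
    """Assemble the PR description and pick draft vs. ready for review."""
    lines = ["Summary", summary, "", "Ticket", ticket_id, ""]
    if judgment_calls:
        lines.append("Judgment calls")
        lines.extend(f"- {note}" for note in judgment_calls)
        lines.append("")
    lines.append("Testing notes")
    lines.extend(f"- {note}" for note in testing_notes)
    return {
        "title": title,
        "head": branch,
        "base": "main",  # assumption: the default branch name
        "body": "\n".join(lines),
        # Open as a draft only when something still needs clarification.
        "draft": bool(open_questions),
    }


payload = build_pr_payload(
    ticket_id="PLAT-312",
    branch="buildly/PLAT-312",
    title="Add cursor-based pagination to GET /api/v2/invoices",
    summary="Adds cursor, limit, and meta.next_cursor to the invoice list endpoint.",
    judgment_calls=["Followed the cursor implementation in /transactions."],
    testing_notes=["default limit", "max limit enforcement", "valid cursor",
                   "exhausted cursor returning null"],
    open_questions=[],
)
```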

What an engineer sees

From the engineer's perspective: the ticket moved to "In Progress" and roughly 15 minutes later (for a task this size), a PR appears in their GitHub notifications. The PR is 140 lines of diff across five files. The description explains what was done and why. CI is green. The review takes 20 minutes: they check the cursor logic, verify the limit validator is correct, read through the tests, and merge.

Total human time for a ticket that would have taken 3–4 hours to write: about 30 minutes, including the review and merge. The agent's 15 minutes are wall-clock time, spent in the background while the engineer works on something else.

That's one task. The interesting question is what the sprint looks like when 8–12 tasks go through this pipeline in parallel, and the engineers' attention is freed for the work that actually requires their judgment.