People ask us what "autonomous coding agent" means in practice. The phrase is easy to reach for, hard to make concrete. So here's something concrete: a walk-through of exactly what happens from the moment a Buildly agent picks up a ticket to the moment the PR opens. Every step. No abstractions.
The example ticket: PLAT-312 — "Add cursor-based pagination to GET /api/v2/invoices. Limit default 50, max 200. Return next_cursor in response envelope."
Step 1: Ticket ingestion and intent parsing
The agent receives the ticket via the backlog integration — in this case, a Jira webhook that fires when the ticket is moved to "In Progress." The raw ticket contains the title, description, any linked tickets, the reporter, the assignee (buildly-agent), and any attached comments.
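To make the trigger concrete, here is a minimal sketch of the filter a webhook receiver might apply. It is not Buildly's actual code. The payload shape is simplified, and the field names (`changelog.items`, `toString`, `fields.assignee.name`) are assumptions modeled on Jira's issue-updated events, so check them against your own Jira setup.

```python
from typing import Any

AGENT_ASSIGNEE = "buildly-agent"  # assignee name from the example ticket
TRIGGER_STATUS = "In Progress"    # status that hands the ticket to the agent


def should_pick_up(payload: dict[str, Any]) -> bool:
    """Return True when the event moves an agent-assigned ticket into the trigger status."""
    issue = payload.get("issue", {})
    # The assignee may be missing or null on unassigned tickets.
    assignee = (issue.get("fields", {}).get("assignee") or {}).get("name")
    if assignee != AGENT_ASSIGNEE:
        return False
    # Look for a status transition into the trigger status.
    for item in payload.get("changelog", {}).get("items", []):
        if item.get("field") == "status" and item.get("toString") == TRIGGER_STATUS:
            return True
    return False


# Simplified, hypothetical event for PLAT-312.
example_event = {
    "issue": {"key": "PLAT-312", "fields": {"assignee": {"name": "buildly-agent"}}},
    "changelog": {
        "items": [{"field": "status", "fromString": "To Do", "toString": "In Progress"}]
    },
}

print(should_pick_up(example_event))
```

Events that change other fields, or that move tickets assigned to someone else, fall through to `False` and are ignored.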
The first pass is intent parsing: what type of task is this? Buildly classifies tasks along two axes — scope (single endpoint, single module, cross-cutting) and type (new feature, modification, migration, test coverage). This classification feeds the retrieval step. A "modify existing endpoint" task needs different context than a "new resource type" task.
PLAT-312 is classified as: scope = single endpoint, type = modification with new response contract. The parser also extracts the concrete parameters: the endpoint path, the pagination style (cursor-based, not offset), the default and max limits, and the new response field name.
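One way to picture the parser's output is as a small typed record. The sketch below is illustrative only: the class names, enum members, and parameter keys are assumptions chosen to mirror the two axes and the extracted values described above, not Buildly's internal schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    SINGLE_ENDPOINT = "single endpoint"
    SINGLE_MODULE = "single module"
    CROSS_CUTTING = "cross-cutting"


class TaskType(Enum):
    NEW_FEATURE = "new feature"
    MODIFICATION = "modification"
    MIGRATION = "migration"
    TEST_COVERAGE = "test coverage"


@dataclass(frozen=True)
class ParsedIntent:
    """Result of the intent-parsing pass for one ticket (hypothetical shape)."""
    ticket_id: str
    scope: Scope
    task_type: TaskType
    parameters: dict = field(default_factory=dict)


# PLAT-312 as described in the ticket text.
plat_312 = ParsedIntent(
    ticket_id="PLAT-312",
    scope=Scope.SINGLE_ENDPOINT,
    task_type=TaskType.MODIFICATION,
    parameters={
        "endpoint": "GET /api/v2/invoices",
        "pagination_style": "cursor",
        "default_limit": 50,
        "max_limit": 200,
        "response_field": "next_cursor",
    },
)

print(plat_312.scope.value, "/", plat_312.task_type.value)
```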
Any ambiguity at this step gets flagged immediately. If the ticket said "add pagination" without specifying cursor vs. offset, the agent would open a draft PR or add a comment to the ticket asking for clarification before proceeding. Ambiguous requirements resolved silently produce wrong implementations. We'd rather surface the question early.
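The ambiguity check can be thought of as a list of required parameters, each paired with the question to ask when it is missing. This is a simplified sketch under that assumption; the dictionary keys and question wording are hypothetical.

```python
# Required parameters for a pagination task, each mapped to the
# clarifying question to ask if the ticket leaves it unspecified.
REQUIRED_FOR_PAGINATION = {
    "pagination_style": "Should pagination be cursor-based or offset-based?",
    "default_limit": "What should the default page size be?",
    "max_limit": "What is the maximum allowed page size?",
}


def clarification_questions(parameters: dict) -> list[str]:
    """Return one question per required parameter the ticket did not specify."""
    return [
        question
        for key, question in REQUIRED_FOR_PAGINATION.items()
        if parameters.get(key) is None
    ]


# A ticket that only says "add pagination".
vague = {"endpoint": "GET /api/v2/invoices"}
# PLAT-312, which specifies everything.
precise = {
    "endpoint": "GET /api/v2/invoices",
    "pagination_style": "cursor",
    "default_limit": 50,
    "max_limit": 200,
}

for question in clarification_questions(vague):
    print("Needs clarification:", question)
```

An empty result means the agent can proceed; a non-empty one becomes the comment or draft PR asking for clarification.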
Step 2: Codebase location and Style Graph retrieval
With the intent parsed, the agent identifies which parts of the codebase it needs to understand. The endpoint path /api/v2/invoices is used to locate the relevant handler — in this codebase, that's src/api/v2/handlers/invoices.py and its associated route registration.
The agent then queries the Style Graph for patterns relevant to this task type in this module's context. Specifically:
- How is pagination currently implemented on other v2 endpoints? (The Style Graph has two prior examples: /api/v2/transactions and /api/v2/ledger-entries)
- What's the response envelope structure? (Style Graph: all v2 responses return a {"data": [...], "meta": {...}} wrapper)
- How are query parameters validated? (Style Graph: Pydantic models in a params/ subdirectory, not inline validation)
- What's the test pattern for handler tests? (Style Graph: pytest fixtures in tests/api/v2/conftest.py, one test file per handler)
The Style Graph retrieval returns a scoped subgraph — the relevant patterns without the full graph. This is the context the code generation step works from.
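As a rough mental model, scoped retrieval is a filter over tagged patterns. The sketch below assumes the Style Graph can be flattened into records with a topic and a module prefix; that representation, the topic names, and the v1 entry are all hypothetical and exist only to show what gets left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    topic: str    # what the pattern is about, e.g. "pagination"
    module: str   # path prefix where the pattern was observed
    summary: str


STYLE_GRAPH = [
    Pattern("pagination", "src/api/v2",
            "cursor pagination, as in /api/v2/transactions and /api/v2/ledger-entries"),
    Pattern("response_envelope", "src/api/v2",
            'responses wrapped as {"data": [...], "meta": {...}}'),
    Pattern("param_validation", "src/api/v2",
            "Pydantic models in a params/ subdirectory"),
    Pattern("handler_tests", "tests/api/v2",
            "pytest fixtures in conftest.py, one test file per handler"),
    # Hypothetical entry that a v2 task should not pull in.
    Pattern("response_envelope", "src/api/v1", "legacy envelope"),
]


def scoped_subgraph(graph, topics, module_prefixes):
    """Keep only patterns on the requested topics within the requested modules."""
    prefixes = tuple(module_prefixes)
    return [p for p in graph if p.topic in topics and p.module.startswith(prefixes)]


subgraph = scoped_subgraph(
    STYLE_GRAPH,
    topics={"pagination", "response_envelope", "param_validation", "handler_tests"},
    module_prefixes=["src/api/v2", "tests/api/v2"],
)

for pattern in subgraph:
    print(f"{pattern.topic}: {pattern.summary}")
```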
Step 3: Implementation planning
Before writing code, the agent produces an internal implementation plan. This isn't exposed to the user by default, but it drives the generation step. For PLAT-312, the plan looks roughly like this:
- Create params/invoices_params.py with InvoiceListParams Pydantic model including cursor: Optional[str], limit: int = 50 with validator le=200
- Modify the list_invoices handler to accept params: InvoiceListParams and pass cursor to the repository layer
- Add find_invoices_after_cursor method to InvoiceRepository following the pattern of find_transactions_after_cursor
- Update the response envelope to include meta.next_cursor (None when no more results)
- Add test coverage in tests/api/v2/test_invoices.py for: default limit, max limit enforcement, valid cursor, exhausted cursor returning null
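To show what the plan adds up to, here is a dependency-free sketch of the same behavior. It makes several assumptions: a dataclass stands in for the Pydantic model (where the limit would typically be declared with `Field(default=50, le=200)`), an in-memory list stands in for the database, the cursor is a base64-encoded last-seen id, and the lower bound of 1 on `limit` is our addition. The real cursor format in this codebase is not specified.

```python
import base64
from dataclasses import dataclass
from typing import Optional

DEFAULT_LIMIT = 50
MAX_LIMIT = 200


@dataclass
class InvoiceListParams:
    cursor: Optional[str] = None
    limit: int = DEFAULT_LIMIT

    def __post_init__(self):
        # Stand-in for the Pydantic constraint: reject out-of-range limits.
        if not 1 <= self.limit <= MAX_LIMIT:
            raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")


def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def find_invoices_after_cursor(invoices, cursor, limit):
    """Return up to `limit` invoices with id greater than the cursor's id.

    `invoices` is assumed to be sorted by ascending id.
    """
    after_id = decode_cursor(cursor) if cursor else None
    rows = [inv for inv in invoices if after_id is None or inv["id"] > after_id]
    return rows[:limit]


def list_invoices(invoices, params: InvoiceListParams) -> dict:
    # Fetch one extra row to learn whether another page exists.
    rows = find_invoices_after_cursor(invoices, params.cursor, params.limit + 1)
    page = rows[: params.limit]
    has_more = len(rows) > params.limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "meta": {"next_cursor": next_cursor}}


invoices = [{"id": i} for i in range(1, 121)]  # 120 fake invoices
first = list_invoices(invoices, InvoiceListParams())
second = list_invoices(invoices, InvoiceListParams(cursor=first["meta"]["next_cursor"]))
third = list_invoices(invoices, InvoiceListParams(cursor=second["meta"]["next_cursor"]))
print(len(first["data"]), len(second["data"]), len(third["data"]), third["meta"]["next_cursor"])
```

The four test cases in the plan map directly onto this sketch: the default page size, rejection of limits above 200, a valid cursor resuming after the last row seen, and an exhausted cursor yielding a null `next_cursor`.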
If the implementation plan requires touching something outside the declared scope — for example, if the codebase had no cursor pagination yet and implementing it required adding a shared utility — the agent flags that in the plan and surfaces it in the PR description rather than making the call silently.
Step 4: Code generation
Code generation runs against the implementation plan and the Style Graph subgraph. The output for PLAT-312 spans five files: the new params file, the modified handler, the modified repository, the modified response serializer, and the test file.
The generation is sequential, not parallel. The params file comes first because the handler depends on it. The repository change comes next because the handler depends on it. The test file comes last because it depends on all of the above. This sequencing matters for coherence — generating tests before the implementation can produce tests that test the wrong thing.
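Dependency-ordered generation is a topological sort. The sketch below uses Python's standard `graphlib` to derive an order for the five files. The dependency edges are our reading of the description above (the serializer edge in particular is an assumption), and the repository and serializer are referred to by role because their file paths are not given.

```python
from graphlib import TopologicalSorter

PARAMS = "params/invoices_params.py"
REPOSITORY = "InvoiceRepository (repository module)"
SERIALIZER = "response serializer"
HANDLER = "src/api/v2/handlers/invoices.py"
TESTS = "tests/api/v2/test_invoices.py"

# Each file maps to the set of files that must be generated before it.
dependencies = {
    PARAMS: set(),
    REPOSITORY: set(),
    SERIALIZER: set(),
    HANDLER: {PARAMS, REPOSITORY, SERIALIZER},
    TESTS: {PARAMS, REPOSITORY, SERIALIZER, HANDLER},
}

order = list(TopologicalSorter(dependencies).static_order())

for position, name in enumerate(order, start=1):
    print(position, name)
```

Any valid order puts the handler after everything it imports and the tests last, which is the property that keeps tests from being written against an implementation that does not exist yet.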
Throughout generation, the agent applies the Style Graph patterns explicitly. The Pydantic model uses the same field naming and validator style as TransactionListParams. The repository method name follows the find_{resource}_after_cursor convention. The test fixtures use tests/api/v2/conftest.py fixtures rather than creating new ones inline.
Step 5: Self-review pass
After generation, the agent runs a self-review pass against the output before opening the PR. This isn't a full correctness check; it's a consistency check. Does the generated code match the implementation plan? Do all the method signatures that the test file calls actually exist in the generated implementation? Do any import paths point at modules that don't exist at that path in the codebase?
This pass catches the class of errors where generation went slightly off-track: an import that was written from memory rather than from the actual module path, or a test that calls a method with the wrong argument signature. It does not catch logical errors in the implementation; that's what the human reviewer is for.
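One of these consistency checks, "does every method the tests call exist in the implementation?", can be approximated with Python's `ast` module. This is a simplified sketch, not the actual self-review pass: it only inspects calls made on a single named receiver, and the `repo` fixture name and the misnamed method in the second test are hypothetical.

```python
import ast


def defined_functions(source: str) -> set[str]:
    """Names of all functions and methods defined in the source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def calls_on(source: str, receiver: str) -> set[str]:
    """Names of methods invoked as `<receiver>.<method>(...)` in the source."""
    tree = ast.parse(source)
    return {
        node.func.attr
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == receiver
    }


generated_impl = '''
class InvoiceRepository:
    def find_invoices_after_cursor(self, cursor, limit):
        return []
'''

generated_test = '''
def test_default_limit(repo):
    assert repo.find_invoices_after_cursor(None, 50) == []

def test_exhausted_cursor(repo):
    # Deliberately wrong method name, to show what the check catches.
    assert repo.find_invoices_by_cursor("abc", 50) == []
'''

missing = calls_on(generated_test, "repo") - defined_functions(generated_impl)
print(missing)
```

A check like this is purely structural. It flags the misnamed method but says nothing about whether `find_invoices_after_cursor` returns the right rows.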
Step 6: Branch creation and PR opening
The agent creates a branch named buildly/PLAT-312 (following the branch naming convention extracted from the Style Graph — in this codebase, agent branches use the buildly/ prefix), commits the changes with a message that follows the conventional commits format the team uses, and opens a pull request.
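For illustration, the naming step might look like the following. The `buildly/` prefix comes from the example above; the commit type, scope, and the `Refs:` footer are assumptions about how this team applies Conventional Commits, not details from the ticket.

```python
def branch_name(ticket_id: str, prefix: str = "buildly/") -> str:
    """Build the agent branch name from the ticket id."""
    return f"{prefix}{ticket_id}"


def commit_message(commit_type: str, scope: str, summary: str, ticket_id: str) -> str:
    """Format a Conventional Commits header plus a footer referencing the ticket."""
    return f"{commit_type}({scope}): {summary}\n\nRefs: {ticket_id}"


print(branch_name("PLAT-312"))
print(commit_message("feat", "invoices", "add cursor-based pagination", "PLAT-312"))
```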
The PR description includes:
- A summary of what was implemented
- A link back to PLAT-312
- Notes on any judgment calls made (in this case: "I followed the cursor implementation in /transactions; if you prefer the alternative approach used in the older /ledger-entries cursor, let me know")
- Testing notes: what the new tests cover
The PR is opened ready for review (not draft), because the task was well-defined and the implementation plan was clean. CI runs automatically. The assignee gets notified via Slack.
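The PR description and the ready-versus-draft decision can be sketched as a small record. The rule used here (open as draft whenever there are unresolved questions) is an assumed simplification of "well-defined task, clean plan", and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    ticket_id: str
    summary: str
    judgment_calls: list[str] = field(default_factory=list)
    testing_notes: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    @property
    def is_draft(self) -> bool:
        # Assumed rule: any unresolved question keeps the PR in draft.
        return bool(self.open_questions)

    def body(self) -> str:
        lines = [self.summary, "", f"Ticket: {self.ticket_id}"]
        if self.judgment_calls:
            lines += ["", "Judgment calls:"] + [f"- {c}" for c in self.judgment_calls]
        if self.testing_notes:
            lines += ["", "Testing notes:"] + [f"- {n}" for n in self.testing_notes]
        return "\n".join(lines)


pr = PullRequest(
    ticket_id="PLAT-312",
    summary="Add cursor-based pagination to GET /api/v2/invoices.",
    judgment_calls=["Followed the cursor implementation in /transactions."],
    testing_notes=[
        "Default limit, max limit enforcement, valid cursor, exhausted cursor."
    ],
)

print("draft" if pr.is_draft else "ready for review")
print(pr.body())
```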
What an engineer sees
From the engineer's perspective: the ticket moved to "In Progress" and roughly 15 minutes later (for a task this size), a PR appears in their GitHub notifications. The PR is 140 lines of diff across five files. The description explains what was done and why. CI is green. The review takes 20 minutes: they check the cursor logic, verify the limit validator is correct, read through the tests, and merge.
Total human time for a ticket that would have taken 3–4 hours to write: about 30 minutes, including the review and merge. The agent's wall-clock time is about 15 minutes, spent in the background while the engineer works on something else.
That's one task. The interesting question is what the sprint looks like when 8–12 tasks go through this pipeline in parallel, and the engineers' attention is freed for the work that actually requires their judgment.