
Guard AI Development at the Work Item, Not the PR


Written by Funs Janssen

Software Consultant

I am Funs Janssen. I build software and write about the decisions that come with it: architecture, development methods, AI tools, and the business impact of technical choices. This blog is a collection of practical notes from real projects: what scales, what goes wrong, and what blog-friendly examples often overlook.

Your team's velocity is up and your confidence is down. AI coding agents now pick up an Azure Boards work item, read it, and produce a pull request with minimal human involvement. That is a real productivity gain, and it is also why something feels off: code is arriving faster than anyone can meaningfully review it, and a surprising amount of it solves the wrong problem cleanly.

Most advice about guarding AI development points you at the pull request: AI code review bots, static analysis, dependency scanning. Those tools matter. But they all share one blind spot. They run after the agent has already built something. By then the wrong thing exists, and your guardrail is really just a cleanup crew.

This post argues the first guardrail belongs upstream, on the work item itself, before any code is generated. I will show why agentic workflows moved the point of failure into Azure Boards, why advisory checklists fail under that pressure, and how to build blocking Definition of Ready and Definition of Done gates with the Checklist extension so that "Done" still means done.

Why AI development moved the point of failure upstream

For a decade, the implicit safety net in software delivery was the developer's judgment. You could hand someone a vague user story titled "improve search" with no acceptance criteria, and a competent developer would stop, frown, and go ask what you actually meant. That pause was a guardrail. It just was not written down anywhere.

An AI agent does not pause. Hand the same vague story to an autonomous coding agent and it does not come back with questions. It interprets, fills the gaps with assumptions, and confidently produces a pull request. Microsoft's own labs show this workflow in action: an agent reads the Azure Boards work item, expands it into acceptance criteria, and starts writing code, as documented in their AI-assisted work item management guidance. The quality of what it builds is now bounded by the quality of what you wrote in the work item. Garbage in is no longer caught by a human in the middle, because there is no human in the middle.

This is not a hypothetical risk. CodeRabbit's research found that AI-assisted code generation produces 1.7x more logic and correctness bugs than traditional development. Many of those are not the model being dumb. They are the model faithfully implementing an under-specified request.

Code-layer guardrails fire too late. Static application security testing (SAST), software composition analysis (SCA), and AI review bots inspect code that already exists. They are excellent at catching a hardcoded secret or a vulnerable dependency. They are useless at catching "this entire feature was built against the wrong assumption because the acceptance criteria were missing." That failure is invisible at the code layer, because the code is often perfectly clean. It just solves a problem nobody asked for. The only place to catch it is before the agent starts, at the work item.

The work item is your real control plane

If the agent acts on the work item, then the work item is where governance has to live. Two gates do most of the work.

  • Definition of Ready gates what an agent is allowed to pick up. Before a story is eligible for an agent (or a developer) to start, it should clear a basic bar: a clear outcome-focused title, a description that states intent rather than a guessed solution, testable acceptance criteria, and required fields like area, iteration, and priority set. This is the bar that used to live in a developer's head. With agents in the loop, it has to be explicit and checkable.
  • Definition of Done gates what is allowed to leave. Before a work item moves to Done, it should prove that the readiness thinking actually happened: acceptance criteria met, tests linked, a PR linked back to the item, review completed. I have written before about a three-part quality gate of clarity, completeness, and traceability applied at intake and at close, and agentic development makes that gate non-optional rather than nice-to-have.
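The mechanical half of a Definition of Ready, whether the required fields are filled in at all, is easy to check in code. Here is a minimal sketch in Python, assuming the work item's fields arrive as a plain dictionary keyed by Azure Boards field reference names. The `readiness_gaps` helper and the sample story are hypothetical. Whether a title is outcome-focused or a criterion is testable still needs a human eye, which is what the checklist is for.

```python
# Azure Boards field reference names for the fields the readiness bar mentions.
TITLE = "System.Title"
DESCRIPTION = "System.Description"
ACCEPTANCE = "Microsoft.VSTS.Common.AcceptanceCriteria"
AREA = "System.AreaPath"
ITERATION = "System.IterationPath"
PRIORITY = "Microsoft.VSTS.Common.Priority"

REQUIRED_FIELDS = (TITLE, DESCRIPTION, ACCEPTANCE, AREA, ITERATION, PRIORITY)


def readiness_gaps(fields: dict) -> list[str]:
    """Return the required fields that are missing or blank (hypothetical helper).

    Rich-text fields hold HTML in Azure Boards; this sketch only treats
    missing values and whitespace-only strings as empty.
    """
    gaps = []
    for name in REQUIRED_FIELDS:
        value = fields.get(name)
        if value is None or not str(value).strip():
            gaps.append(name)
    return gaps


# Illustrative story: a vague title, no description, no acceptance criteria.
story = {
    TITLE: "Improve search",
    AREA: "Shop\\Search",
    ITERATION: "Shop\\Sprint 12",
    PRIORITY: 2,
}

print(readiness_gaps(story))
# ['System.Description', 'Microsoft.VSTS.Common.AcceptanceCriteria']
```

An empty list means the item clears the mechanical bar. It does not mean the story is well specified.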

Azure Boards already gives you primitives for this. You can define custom states, required fields, and rules, and Azure DevOps supports rules that restrict state transitions based on conditions. The question is how to make a Definition of Ready and a Definition of Done concrete, repeatable, and actually enforced, rather than a paragraph in a wiki nobody opens.

Advisory checklists do not survive contact with velocity

Most teams already have a Definition of Done. It lives on a Confluence page, or a pinned wiki article, or in someone's memory from the last retro. Under normal human pace, that mostly works, because the human pace itself is a throttle.

Agentic throughput removes the throttle. When work items are created and acted on faster than anyone reviews the process around them, a checklist that lives anywhere other than the work item gets skipped. Documentation is great for onboarding and deep explanation. It is terrible at getting used in the moment work actually happens. During a busy sprint, people live inside the work item and the board. They do not tab over to a wiki to confirm they met a ten-point bar.

So the enforcement has to be where the work is. The checklist needs to sit on the work item, be applied automatically so nothing slips through ungoverned, and ideally block the state transition so an incomplete item physically cannot move to Done. Advisory becomes enforced. That is the difference between a guardrail and a suggestion.

Building blocking guardrails with the Checklist extension

This is the gap my Checklist extension for Azure DevOps is built to close. It adds reusable checklists directly to work items, and crucially, it can block state transitions until the checklist is complete. Here is how to wire it into an AI-development workflow.

Reusable templates for Definition of Ready (DoR) and Definition of Done (DoD). Instead of retyping the same readiness items every sprint, define them once as templates. A Definition of Ready checklist on user stories might be: outcome-focused title, intent stated in the description, at least three testable acceptance criteria, area and iteration set. A Definition of Done checklist might be: acceptance criteria verified, tests linked, PR linked to this item, human review of AI-generated changes completed. Templates keep the same readiness signals appearing on every item, which is exactly the predictability you lose when each team improvises.

Auto-apply so nothing is created ungoverned. A guardrail you have to remember to add is a guardrail you will forget. Configure the checklist to attach automatically to new work items of the relevant type. Now every story an agent might pick up arrives with its readiness bar already stamped on it, with no human action required to make the governance appear.
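To make the two ideas above concrete, here is a minimal sketch of templates defined once and attached automatically on creation. It is plain Python modelling, not the Checklist extension's API or storage format: `ChecklistTemplate`, `TEMPLATES_BY_TYPE`, `WorkItem`, and `create_work_item` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChecklistTemplate:
    """A reusable checklist definition (hypothetical model)."""
    name: str
    items: tuple[str, ...]


DEFINITION_OF_READY = ChecklistTemplate(
    name="Definition of Ready",
    items=(
        "Title is outcome-focused",
        "Description states intent, not a guessed solution",
        "At least three testable acceptance criteria",
        "Area and iteration are set",
    ),
)

DEFINITION_OF_DONE = ChecklistTemplate(
    name="Definition of Done",
    items=(
        "Acceptance criteria verified",
        "Tests linked",
        "PR linked to this item",
        "Human review of AI-generated changes completed",
    ),
)

# Which templates attach to which work item type (assumed mapping).
TEMPLATES_BY_TYPE = {
    "User Story": (DEFINITION_OF_READY, DEFINITION_OF_DONE),
}


@dataclass
class WorkItem:
    title: str
    work_item_type: str
    # checklist name -> {item text -> checked?}
    checklists: dict[str, dict[str, bool]] = field(default_factory=dict)


def create_work_item(title: str, work_item_type: str) -> WorkItem:
    """Create an item with every matching template attached, all boxes unchecked."""
    item = WorkItem(title=title, work_item_type=work_item_type)
    for template in TEMPLATES_BY_TYPE.get(work_item_type, ()):
        item.checklists[template.name] = {text: False for text in template.items}
    return item


story = create_work_item("Search shows out-of-stock items", "User Story")
print(sorted(story.checklists))
# ['Definition of Done', 'Definition of Ready']
```

The point of the design is that nobody calls an "add checklist" step. Creation and governance are the same operation, so an ungoverned story cannot exist.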

Restrict state transitions until the checklist is complete. This is the part that makes it real. Wire checklist completion to a blocking rule so a work item cannot move into its in-progress state until the Definition of Ready is satisfied, and cannot move to Done until the Definition of Done is. An agent, or a developer, simply cannot advance an item that has not cleared the bar. The gate is not a reminder. It is a wall.

A worked example. Picture a story titled "Search shows out-of-stock items." Without a gate, an agent picks it up immediately and guesses at the fix. With the gate, the story sits in New with an incomplete Definition of Ready checklist: acceptance criteria are blank, so the item cannot move to Active and no agent can start. A human (or an AI work-item assistant) fills in the criteria, the checklist clears, and only now is the item eligible. The agent that eventually builds it is working from a specified request, not a hunch. When the PR comes back, the Definition of Done checklist blocks the move to Done until the PR is linked and a human has reviewed the generated diff. The wrong thing never gets built, and the unreviewed thing never gets closed.
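The worked example can be traced as a tiny state machine. This is a sketch of the gating logic only, under assumed state names (New, Active, Done) and an assumed mapping from target state to checklist. `GATES`, `TransitionBlocked`, and `move` are hypothetical and say nothing about how the extension or Azure Boards rules implement the block internally.

```python
from dataclasses import dataclass, field

# Target state -> checklist that must be complete before entering it (assumed wiring).
GATES = {
    "Active": "Definition of Ready",
    "Done": "Definition of Done",
}


class TransitionBlocked(Exception):
    """Raised when a gated state change is attempted with unchecked items."""


@dataclass
class WorkItem:
    title: str
    state: str = "New"
    # checklist name -> {item text -> checked?}
    checklists: dict[str, dict[str, bool]] = field(default_factory=dict)


def move(item: WorkItem, target_state: str) -> None:
    """Change state only if the checklist gating the target state is fully checked."""
    gate = GATES.get(target_state)
    if gate is not None:
        checklist = item.checklists.get(gate)
        if checklist is None:
            # Fail closed: a gated state with no checklist attached stays blocked.
            raise TransitionBlocked(f"{gate} checklist is missing")
        unchecked = [text for text, done in checklist.items() if not done]
        if unchecked:
            raise TransitionBlocked(f"{gate} incomplete: {unchecked}")
    item.state = target_state


story = WorkItem(
    title="Search shows out-of-stock items",
    checklists={
        "Definition of Ready": {"Acceptance criteria are testable": False},
        "Definition of Done": {
            "PR linked to this item": False,
            "Human reviewed the generated diff": False,
        },
    },
)

try:
    move(story, "Active")  # blocked: acceptance criteria are still blank
except TransitionBlocked as blocked:
    print(blocked)

# A human fills in the criteria and ticks the box; now the item is eligible.
story.checklists["Definition of Ready"]["Acceptance criteria are testable"] = True
move(story, "Active")
print(story.state)
```

Note the fail-closed branch: if the checklist is missing entirely, the transition is refused rather than waved through. That is the behavior you want from a wall.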

Keep the checklist honest, or it becomes theater

A blocking gate is powerful, which is exactly why it can backfire. The failure mode is obvious once you have seen it: a team adds forty checklist items hoping that compliance equals quality, and within two sprints everyone is checking boxes reflexively to make the wall go away. Now you have all the friction of a gate and none of the protection.

  • Short, specific, enforceable. Every item should be something a person can objectively confirm in seconds and would genuinely block release if false. "Acceptance criteria are testable" is a real gate. "Code is high quality" is a vibe. If an item cannot be checked honestly without judgment calls, it does not belong on a blocking checklist.
  • Gate the right thing at the right stage. Intake (Definition of Ready) is about specification: is this item clear and complete enough for anyone, human or agent, to act on without guessing? Close (Definition of Done) is about evidence: tests linked, PR linked, review done. Do not push Done-stage proof into the Ready gate, and do not let specification slip to the end where the agent has already built on top of the ambiguity.
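One way to keep a checklist from drifting into theater is to lint the template itself before it becomes a blocking gate. The sketch below is a rough heuristic, not a rule from any tool: the cap of seven items and the list of subjective phrases are my own assumptions, and `lint_checklist` is a hypothetical helper.

```python
import re

MAX_ITEMS = 7  # assumed cap; tune to whatever "a handful" means for your team
# Phrases that usually signal a judgment call rather than a checkable fact (assumed list).
VAGUE_PHRASES = ("high quality", "clean", "good", "robust", "appropriate")


def lint_checklist(items: list[str]) -> list[str]:
    """Return warnings for checklists that are too long or too subjective."""
    warnings = []
    if len(items) > MAX_ITEMS:
        warnings.append(f"{len(items)} items exceeds the cap of {MAX_ITEMS}")
    for text in items:
        lowered = text.lower()
        for phrase in VAGUE_PHRASES:
            # Whole-phrase match, so "cleanup" does not trip the "clean" rule.
            if re.search(rf"\b{re.escape(phrase)}\b", lowered):
                warnings.append(f"'{text}' uses the subjective phrase '{phrase}'")
    return warnings


print(lint_checklist(["Acceptance criteria are testable", "Code is high quality"]))
# ["'Code is high quality' uses the subjective phrase 'high quality'"]
```

A lint like this only catches the obvious offenders. The real test stays the one above: can a person confirm the item honestly in seconds?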

The goal is a thin, sharp gate at each end of the work item's life. Enough to stop an under-specified story from reaching an agent, and an unreviewed change from reaching Done, without burying the team in ceremony that they will route around.

Put the guardrail where the decision is made

AI agents did not break your delivery process. They removed the human pause that was quietly holding it together. The developer who used to stop and ask "what do you actually mean?" is no longer in the loop, so the judgment that pause represented has to be made explicit and moved upstream, onto the work item itself.

Code-layer guardrails still belong in your pipeline. But they are the second line of defense, not the first. The first guardrail is a governed work item: a Definition of Ready that decides what an agent is allowed to pick up, and a Definition of Done that decides what is allowed to ship, both enforced as blocking gates rather than advisory wiki pages. Put the guardrail where the decision is made, and "Done" keeps meaning done even when you did not type the code.

If you want to put that first guardrail in place, install the Checklist extension from the Visual Studio Marketplace. It is free, it adds reusable Definition of Ready and Definition of Done checklists straight onto your work items, and it can block state transitions until the work is actually done.

Frequently asked questions

Can a checklist really stop an AI agent from starting work on a story?

Not the agent directly, but the work item state it depends on. If you restrict the transition into your active state until a Definition of Ready checklist is complete, the story stays in New and is not eligible to be started by anyone, human or agent. The agent has nothing to pick up until the readiness bar is cleared.
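How an agent finds its work depends on how you have configured it, so treat this as a sketch under one assumption: the agent selects stories by state using a Work Item Query Language (WIQL) query. The Azure DevOps REST API's WIQL endpoint accepts a JSON body with a `query` property, and the snippet below only builds that body without sending it. `ELIGIBLE_STATE` and `eligible_stories_query` are hypothetical names.

```python
import json

# Assumption: the agent is only pointed at stories that have reached this state.
ELIGIBLE_STATE = "Active"


def eligible_stories_query(state: str = ELIGIBLE_STATE) -> str:
    """Build a WIQL query selecting user stories in the eligible state."""
    return (
        "SELECT [System.Id], [System.Title] "
        "FROM WorkItems "
        "WHERE [System.TeamProject] = @project "
        "AND [System.WorkItemType] = 'User Story' "
        f"AND [System.State] = '{state}' "
        "ORDER BY [Microsoft.VSTS.Common.Priority] ASC"
    )


# JSON body for a POST to the WIQL endpoint; nothing is sent here.
payload = json.dumps({"query": eligible_stories_query()})
print(payload)
```

A story stuck in New behind an incomplete Definition of Ready never matches this query, which is the whole mechanism: the gate controls the state, and the state controls what the agent can see.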

Isn't this just slowing down the velocity that AI gave us?

A thin gate costs seconds at intake and close. The thing it prevents, an agent building an entire feature against a wrong assumption, costs days of rework and review. The net effect on real throughput is positive, because you are removing the most expensive failures, not adding ceremony for its own sake.

Doesn't Azure DevOps already enforce Definition of Done with built-in rules?

Azure Boards gives you the primitives: custom states, required fields, and rules that can restrict state transitions. What it does not give you out of the box is a reusable, auto-applied checklist sitting on the work item that those rules can key off. The Checklist extension fills that gap so your DoD lives on the item instead of in a wiki.

Where does this leave my AI code review and scanning tools?

Exactly where they are, in the pipeline. Upstream work-item gates and downstream code-layer scanning are complementary. The work item gate stops the wrong thing from being built; the scanner catches problems in what does get built. You want both, applied at different points.

How many items should a blocking Definition of Done have?

Fewer than you think. Aim for a handful of objectively checkable, genuinely blocking items: acceptance criteria met, tests linked, PR linked, AI-generated changes reviewed. Long checklists train people to check boxes reflexively, which defeats the gate entirely.
