Draft
This article is a draft. Please do not share or link to this URL until I remove this notice.
Feedback Flywheel
Every AI interaction generates signal: prompts that worked, context that was missing, patterns that succeeded, failures worth preventing. Most teams discard this signal. I propose a structured feedback practice that harvests learnings from AI sessions and feeds them back into the team's shared artifacts, turning individual experience into collective improvement.
17 March 2026
This article is part of a series.
Teams have always had mechanisms for collective learning. Retrospectives, post-incident reviews, lunch-and-learns. The best of these share a property: they convert individual experience into shared practice. What one person encountered in a debugging session or a production incident becomes something the whole team knows. The knowledge escapes the individual and enters the team's infrastructure: its wikis, its runbooks, its code review checklists.
With AI coding assistants, most teams reach a plateau. They adopt the tools, develop some fluency, and then stay there. The same prompting habits, the same frustrations, the same results month after month. Not because the tools stop improving, but because the team's practices around the tools stop improving. There is no mechanism for compounding what works. Each developer accumulates individual intuition (useful phrasings, effective workflows, hard-won understanding of what the AI handles well and what it does not) but that intuition remains personal. It does not transfer.
The infrastructure I have described in earlier articles — Knowledge Priming, Design-First Collaboration, Context Anchoring, and Encoding Team Standards — is not a collection of static artifacts but a set of surfaces that can absorb learning. The missing piece is the practice of feeding learnings back in: a feedback loop that turns each interaction into an opportunity to improve the next one.
The Compounding Problem
My impression is that teams adopting AI tools at roughly the same time can arrive at very different places six months later. The difference often lies less in talent or tooling than in whether they have a practice of capturing what worked.
Without a learning system, AI effectiveness flatlines. The team uses the tools. The tools are useful. But the way the team uses them does not evolve. The same gaps in the priming document cause the same corrections. The same ambiguous instructions produce the same mediocre outputs. The same failure patterns recur without anyone connecting the dots. What is missing is not effort — it is a mechanism for the effort to accumulate.
These artifacts create surfaces for learning. But surfaces alone are passive. A priming document does not update itself when the AI defaults to a deprecated API. A review command does not add a new check when a category of bug slips through. They need an active practice of feeding learnings back in.
Consider what a single session can look like when the loop is in place. A developer uses a generation instruction to implement a new service endpoint. A review instruction then runs on the output — and flags a missing authorization check, exactly the kind of oversight the generation instruction did not explicitly require. The developer fixes the issue and, before closing the session, adds one line to the team's learning log: “Authorization checks on new endpoints not enforced by generation instruction.” That file lives in the repository and is already part of the priming context for subsequent sessions. The next developer to implement an endpoint benefits from that observation without knowing the exchange happened; the authorization check is now part of what the AI verifies from the first pass. The generation instruction did not change. The priming context changed. The system learned. That is the flywheel: each rotation of the loop leaves the infrastructure a little better prepared for the next.
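The session-close step above can be sketched in a few lines. This is an illustrative sketch, not a prescribed tool: the file location (`docs/ai/learnings.md`) and the idea of concatenating artifacts into a priming context are assumptions about how a team might wire this up.

```python
# Illustrative sketch: append a session learning to a shared log that is
# already part of the priming context. Paths are hypothetical.
from datetime import date
from pathlib import Path

LEARNING_LOG = Path("docs/ai/learnings.md")  # assumed location, not a standard

def record_learning(observation: str, log_path: Path = LEARNING_LOG) -> str:
    """Append a dated one-line observation to the team's learning log."""
    entry = f"- {date.today().isoformat()}: {observation}\n"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
    return entry

def priming_context(artifact_paths: list[Path]) -> str:
    """Concatenate the shared artifacts (priming document, learning log, ...)
    that are fed to the AI at the start of every session."""
    return "\n\n".join(
        p.read_text(encoding="utf-8") for p in artifact_paths if p.exists()
    )
```

The design point is the second function: the log is not a separate archive that someone must remember to consult, it is already inside the context every session starts from.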
Commands evolve: a review command that misses something is a command waiting to be updated. The same principle applies to every artifact in the team's AI infrastructure: each should evolve based on what the team observes in practice. The question is how to make that evolution systematic rather than accidental.
The update itself can happen in different ways. Sometimes a developer edits the shared artifact directly, especially when the change requires judgment or careful wording. In other cases, an agent can draft or apply the update as part of the workflow, with a developer reviewing it before it becomes part of the team's shared context. I would not make one mechanism mandatory. What matters is that the learning is captured, validated, and fed back into the artifacts the team actually uses.
Four Types of Signal
AI interactions generate signal: information about what the team's artifacts capture well and what they miss. I find it useful to categorize this signal into four types, each mapping to a specific destination in the infrastructure.
Context signal. What the AI needed to know but did not: gaps in the priming document, missing conventions, outdated version numbers. Each correction a developer makes is a signal that the priming document is incomplete. When the AI keeps using the deprecated Prisma 4.x API, that is not a model failure; it is a priming gap. The version note is missing, so the AI defaults to its training data. Every “no, we do it this way” is a line that belongs in the priming document but is not there yet.
Instruction signal. Prompts and phrasings that produced notably good or bad results. When a particular way of framing a request consistently yields better output (a specific constraint that prevents the AI from jumping ahead, a decomposition that produces cleaner architecture) that phrasing belongs in a shared command, not in one developer's head. Instruction signal is the difference between personal fluency and team capability. As long as it stays personal, the team's effectiveness depends on who happens to be prompting.
Workflow signal. Sequences of interaction that succeeded: conversation structures, task decomposition approaches, workflows that reliably produced good outcomes. These are the team's emerging playbooks. A developer who discovers that designing API contracts before implementation consistently produces better results has found a workflow pattern. A developer who finds that asking the AI to critique its own output before proceeding catches issues earlier has found another. These workflow patterns, once identified, are transferable, but only if someone captures them.
Failure signal. Where the AI produced something wrong, and why. The root cause matters more than the symptom. A failure caused by missing context is a priming gap. A failure caused by poor instruction is a command gap. A failure caused by a model limitation is a boundary to document. With root-cause thinking, each failure points to a specific artifact that can be improved. Consider a developer asking the AI to generate a domain model. The output compiles — but on review, the domain objects are nearly anemic: data containers with all behavior pushed into service classes. It is neither a context failure nor a model limitation: the AI knew the project's bounded contexts and is capable of generating rich domain models. It is a command gap: the generation instruction never specified that behavior belongs in the domain objects, not in the classes around them. A single constraint added to the generation instruction is the fix.
The mapping is concrete. Context signal feeds back into priming documents. Instruction signal feeds back into shared commands. Workflow signal feeds back into team playbooks. Failure signal feeds back into guardrails and documented anti-patterns. The feedback loop has specific inputs and specific destinations; it is not an abstract aspiration to “get better at AI.” Not every observation clears the bar: one-off edge cases and personal style preferences stay personal. The signal worth capturing is one that recurred, or that any developer on the team would hit working on the same problem. It is a practice of updating particular artifacts based on those observations.
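The mapping can be made explicit enough to encode. A minimal sketch, assuming a hypothetical repository layout (the artifact paths are inventions for illustration, not part of the practice itself):

```python
# Illustrative sketch of the signal-to-artifact mapping described above.
# The point is that every signal type has exactly one destination, so no
# observation lands in a "general feedback" limbo.
from enum import Enum

class Signal(Enum):
    CONTEXT = "context"          # what the AI needed to know but did not
    INSTRUCTION = "instruction"  # phrasings that worked notably well or badly
    WORKFLOW = "workflow"        # interaction sequences that succeeded
    FAILURE = "failure"          # wrong output, traced to a root cause

# Hypothetical artifact locations, one per signal type.
DESTINATIONS = {
    Signal.CONTEXT: "docs/ai/priming.md",
    Signal.INSTRUCTION: "docs/ai/commands/",
    Signal.WORKFLOW: "docs/ai/playbooks.md",
    Signal.FAILURE: "docs/ai/anti-patterns.md",
}

def route(signal: Signal) -> str:
    """Return the shared artifact a given signal should update."""
    return DESTINATIONS[signal]
```

Whether or not a team ever writes this down as code, the exercise of asking "which artifact does this observation update?" is the routing function.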
The Practice
The feedback loop works at four cadences, each matched to the weight of the update.
After each session: a brief reflection, not a formal process. One question: did anything happen in this session that should change a shared artifact? Often the answer is no. The session went fine, the priming document had what the AI needed, the commands caught what they should catch. When the answer is yes, the update is immediate: a line added to the priming document, a check added to a command, a note in a feature document. The discipline is in the question, not in the overhead. The act of asking takes seconds. The act of updating, when warranted, takes minutes. The easiest way to establish the habit is to anchor it to an existing checkpoint: a field in the PR template, a single line in the standup, or the act of closing the editor at end of day. The trigger matters less than the consistency.
At the stand-up: for teams that already have a daily stand-up, this is a natural place to spread useful learning quickly. A simple question such as “did anyone learn something with the AI yesterday that the rest of us should know?” can turn one person's discovery into shared practice without adding another meeting.
At the retrospective: an agenda item in the existing sprint retrospective: what worked with AI this sprint? What friction did we hit? What will we update? The outputs are concrete: a priming document revision, a command refinement, a new anti-pattern documented. This is where individual observations become team decisions. One developer's discovery that a particular constraint improves code review output becomes the team's updated review command. The tech lead or a designated owner makes the final call on what gets committed to shared artifacts; the retrospective is the forum for surfacing options, not for reaching consensus on every detail.
Periodically: a review of whether the artifacts are actually being used and whether they remain current. Which commands are being run? Which are ignored? Where are the remaining gaps? This is the lightest cadence: quarterly, or whenever the team senses that the artifacts have drifted from practice.
The practice is lightweight by design. The heaviest cadence is a five-minute agenda item in a meeting that already exists. If the practice requires its own meeting, it will be the first thing cut when the team is busy — which is precisely when learning matters most.
Knowing the practice is running is different from knowing it is working.
Measuring What Changes
Most teams that try to measure AI effectiveness measure the wrong things. Speed (lines generated, time to first output) measures volume, not value. A fast output that requires extensive rework is not a productivity gain. It is rework with extra steps.
What actually matters is harder to measure but more informative: first-pass acceptance rate (how often the AI's initial output is usable without major revision), iteration cycles (how many back-and-forth rounds a task requires), post-merge rework (how much fixing happens after code ships), and principle alignment (whether the output follows the team's architectural standards). These are the indicators that the feedback loop is working: the team's artifacts are capturing more of what the AI needs, and the AI's output is converging toward what the team expects.
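For teams that do want to track the first two indicators, the arithmetic is trivial; the hard part, as noted below, is the judgment behind each record. A sketch, assuming a lightweight per-session record that a team would have to define for itself:

```python
# Illustrative sketch: computing first-pass acceptance and iteration cycles
# from hypothetical session records. The record shape is an assumption.
from dataclasses import dataclass

@dataclass
class Session:
    accepted_first_pass: bool  # judgment call: usable without major revision?
    iterations: int            # back-and-forth rounds before acceptance

def first_pass_acceptance(sessions: list[Session]) -> float:
    """Fraction of sessions whose initial output needed no major revision."""
    if not sessions:
        return 0.0
    return sum(s.accepted_first_pass for s in sessions) / len(sessions)

def mean_iterations(sessions: list[Session]) -> float:
    """Average back-and-forth rounds per task. A falling trend suggests the
    artifacts are capturing more of what the AI needs up front."""
    if not sessions:
        return 0.0
    return sum(s.iterations for s in sessions) / len(sessions)
```

The trend matters more than any single number: a rising acceptance rate and a falling iteration count over a quarter is the quantitative shape of the flywheel turning.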
For teams already tracking DORA metrics, these indicators can serve as useful leading signals. Fewer iteration cycles usually mean less rework per change, which in turn helps shorten lead time. Higher principle alignment means architectural drift is caught earlier, before it reaches production, which should reduce the change failure rate. The feedback loop is not a separate initiative so much as a way of improving the outcomes the team already cares about. If DORA metrics are not yet part of the practice, a simpler proxy will do: how often does the team say “the AI knew exactly what to do”? Tracked informally, that frequency gives an early indication that the artifacts are helping before the broader delivery metrics move.
The honest framing: these metrics are difficult to track rigorously. Counting iteration cycles requires a consistent definition of what constitutes a “cycle”; that definition varies by task complexity. First-pass acceptance is a judgment call, not a binary. In practice, the signal is often qualitative. The team notices that AI sessions are smoother, that commands catch more issues, that new team members ramp up faster with the priming documents and playbooks than they did without them. The absence of frustration — the declining frequency of “why did the AI do that?” — is often the most reliable indicator. I would not recommend building a dashboard. I would recommend paying attention.
Calibration
This practice matters most for teams that have already established the foundational infrastructure from the earlier articles and want to move from “we use AI” to “we get better at using AI.” For teams still in initial adoption, the priority is building that infrastructure first. The feedback loop that improves it comes after.
The trade-off is discipline without bureaucracy, a narrow path. Too formal, and the practice becomes overhead that gets abandoned within a quarter. Too informal, and it is indistinguishable from not doing it at all. The after-session question, the retrospective agenda item, the periodic review — these are deliberately minimal. The rhythm matters more than the rigor. A team that asks “what should we update?” every two weeks and acts on the answer will improve faster than a team that designs an elaborate harvesting process and abandons it when deadlines tighten.
There is an urgency to this that is structural. The AI ecosystem (models, tools, capabilities) evolves on a cadence that makes traditional documentation decay look glacial. A priming document written when the team adopted one model version may actively misguide when a newer version handles context windows differently. A command designed around one tool's strengths may miss capabilities introduced in the next release. This is the same dynamic teams already understand with dependency management: a lockfile that is never updated does not stay stable, it becomes a liability. These artifacts deserve the same treatment: reviewed periodically, maintained with the same discipline as test suites, not written once and filed alongside onboarding checklists. The teams that treat them as living infrastructure will compound. The teams that treat them as setup documentation will plateau, not because they started wrong, but because they stopped maintaining.
The feedback loop has nowhere to go without the artifacts it improves — start with those.
Conclusion
What distinguishes a team that merely uses AI from one that gets better with it is not the model. It is whether the team has a way to turn each interaction into a small improvement in its shared artifacts. That is the role of the feedback loop. It takes what would otherwise remain personal intuition (a prompt that worked, a failure that recurred, a missing convention, a review gap) and makes it part of the team's infrastructure.
This is why I see the feedback flywheel not as an extra practice layered on top of the others, but as the maintenance mechanism for all of them. Knowledge Priming drifts unless someone updates it. Design-First Collaboration improves only when teams notice what structure helped. Context Anchoring gets better when teams see what they failed to capture. Encoding Team Standards sharpens when failures expose missing checks. The infrastructure compounds only if practice feeds back into it.
Taken together, these techniques describe a way of working with AI that mirrors how good teams work with each other: share context early, think before coding, make standards explicit, externalize decisions, and learn from each session. The tools will keep changing. The teams that continue learning through shared artifacts and lightweight rituals will be the ones that get more value from them over time.
I would not begin with all of this at once. I would begin with one shared artifact and one habit: at the end of a session, ask what should change for the next one. Then make that change while the lesson is still fresh. That is small enough to sustain, and small steps are what make the flywheel turn.
Significant Revisions
17 March 2026: first published

