The Engineering Leader's Playbook for AI-Assisted Development

Tools & Workflows · 18 min read · by Girish Koliki

AI-assisted development drove a 59% increase in engineering throughput last year. Most teams captured almost none of it. The difference comes down to three layers, and most organisations have only addressed the first one.

CircleCI's 2026 State of Software Delivery report analysed over 28 million CI/CD workflows and found something that should make every engineering leader pause: average throughput jumped 59% year over year, the largest increase in the report's seven-year history.[1]

But that number is misleading on its own.

The question for engineering leaders is not whether to adopt AI-assisted development. That ship sailed. The question is why some teams turn these tools into a real edge while most end up with more code, more noise, and roughly the same output.

The answer, from what the data shows and from what we see working with engineering teams, comes down to three layers. Most organisations have only addressed the first.

97% Throughput increase for the top 5% of engineering teams
4% Throughput increase for the median team
0% Measurable gain for the bottom quartile

§ The 59% Illusion: More Code, Less Software

Here is the number that tells the real story. Feature branch throughput increased 15.2% across the median team. That is where AI accelerates experimentation and iteration most visibly. But throughput on the main branch declined 6.8%.[1]

More code is entering the pipeline. Less is making it to customers.

The constraint has shifted. Writing code is no longer the bottleneck. The hard part is everything that happens after: review, validation, integration, and delivery. Teams that recognise this shift and restructure around it are the ones capturing the gains. Everyone else is creating a bigger pile of work that never ships.

Think of a factory that has doubled the output of its stamping press. The quality inspectors, the assembly line, and the shipping dock are still running at the pace they ran before AI arrived. You do not ship more products. You ship more defects, or you grind the line to a halt.[2]

§ Layer 1: Individual Tooling (More Options Than You Think)

This is the layer most teams have addressed, and it is the least interesting one. Give your engineers good AI tools. Let them use what works. But "what works" has expanded significantly in the last twelve months, and most engineering leaders have not kept up with the full landscape.

Here is what is actually available right now, broken into categories that matter for decision-making.

AI-native IDEs (paid)

These are full editor environments built around AI from the ground up.

  • Cursor ($20/month Pro). The most broadly adopted AI IDE among individual developers. Fast autocomplete, inline chat, and agent mode for multi-file changes. Works well for small-to-medium tasks. Draws criticism on longer refactors and recent pricing changes, but it is the benchmark everything else gets compared to.[3]
  • Windsurf ($15/month Pro). A VS Code fork, now owned by Cognition after an acquisition in mid-2025. Polished UI, plan mode for structured agent workflows. Cheaper than Cursor, but the long-term roadmap is uncertain given the ownership changes.[3]
  • Antigravity (Google). Google hired the original Windsurf team and launched its own agentic IDE. Still early, but backed by serious resources and Google's model ecosystem.[4]
  • Zed. A fast, lightweight editor with built-in agentic workflows. Gaining traction with developers who care about speed and want something leaner than Cursor.[4]

IDE extensions and copilots (paid and free tiers)

These plug into your existing editor rather than replacing it.

  • GitHub Copilot ($10/month Individual, $19/month Business). The most widely deployed AI coding assistant in the world. Lives inside VS Code, JetBrains, and Visual Studio. Autocomplete is solid. Agent mode, launched in 2025, handles multi-step tasks and runs terminal commands autonomously. The pragmatic default for most enterprise teams.[5]
  • VS Code + OpenAI Codex. OpenAI's agent-first coding tool, increasingly discussed alongside Claude Code as a standalone agent you aim at real repositories. More deterministic on multi-step tasks than most competitors.[3]
  • JetBrains Junie. The natural choice for IntelliJ users. Promising direction with true agent-inside-IDE behaviour, but feedback is mixed on speed and flexibility compared to newer tools.[4]
  • Amazon Q Developer (free tier available). AWS's answer to Copilot. Strongest for teams deep in the AWS ecosystem, with good security scanning and infrastructure-aware suggestions.
  • Tabnine (free tier available, Enterprise from $39/month). Privacy-first. Can run entirely on-premises with no code leaving your network. The go-to for teams with strict IP or compliance constraints.

Open source and CLI-based agents (free or bring-your-own-model)

These give you the most control. You bring your own model, choose your provider, and manage your own costs.

  • Claude Code (Anthropic, usage-based). A CLI-based agent that developers consistently describe as the strongest "coding brain" available. The tool people reach for when the problem is genuinely hard: debugging subtle issues, reasoning about unfamiliar codebases, making architectural decisions. Pragmatic Engineer's 2026 survey found it has risen so fast it is nearly as widespread as GitHub Copilot was three years ago.[4]
  • Cline (open source, free). An autonomous coding agent that runs as a VS Code extension. Lets you choose any model, split tasks across planning and coding roles, and control costs directly. Rewards developers who invest time in configuration.[3]
  • Aider (open source, free). A CLI-first agent built for git-native workflows. Diffs, commits, branches. Works well with multiple models. Recommended for structured refactors where correctness matters more than convenience.[3]
  • RooCode (open source, free). Gaining a reputation as the reliable option for large, multi-file changes. Fewer half-finished edits and less agent thrashing than competitors on complex tasks.[3]
  • OpenCode (open source, free). The most popular fully open source coding agent, where you can swap out the model completely and avoid vendor lock-in.[4]
  • Gemini CLI (Google, free tier). Google's command-line agent. Fast and simple for iterative debugging and smaller tasks. Less reliable on complex refactors than Claude-backed agents.[4]

How to think about this as an engineering leader

Do not mandate a single tool across the team. Engineers have strong preferences and they are usually right about what makes them productive. Set a budget, make the approved options clear, and get out of the way.

The real question is not Cursor versus Copilot versus Claude Code. That conversation is valid for about a week. The question that matters for the next two years is what you do with the extra code these tools produce. That is Layer 2.

~4 hrs/week Average time saved per developer with AI coding tools
~10% Individual productivity improvement, plateaued since mid-2025

§ Layer 2: Team Workflows (Where Most Teams Get Stuck)

This is the layer where the gains are made or lost, and it is the one most organisations have barely touched.

AI generates code fast. But code still has to move through people before it goes anywhere. It has to be reviewed, discussed, tested, and approved. And those human processes have not adapted to the new speed of code generation.

Code review is now the single biggest bottleneck in most engineering pipelines. Bottom-quartile teams take over 35 hours to merge a pull request.[6] That is not a tooling problem. It is a workflow problem. When AI can generate a feature branch in an afternoon that would have taken three days to write by hand, but the review queue still takes a week, you have not accelerated anything. You have just moved the queue.

Rethink PR size and review cadence. AI-generated code tends to come in larger chunks than hand-written code. Your review process was probably designed for smaller, incremental changes. If your team is struggling to review AI-assisted PRs, the answer is not "review faster." It is to restructure how work is broken up and how reviews are scheduled. Daily review slots, automated first-pass checks, and clear ownership of review responsibilities all help. Tools like Graphite enforce this pattern structurally by encouraging stacked, small PRs that are easier to review. Linear connects issues to PRs natively, so review context is always one click away.

Automate what you can in the review process. Linting, formatting, basic test coverage checks, security scanning. These should never require a human reviewer's time. Every minute a senior engineer spends catching a formatting issue is a minute not spent on the architectural questions that actually need a human brain. Workflow automation tools like n8n let teams build custom pipelines that handle linting triggers, test orchestration, reviewer assignment, and Slack notifications without writing a custom CI integration from scratch. Open source and self-hostable, so your code never leaves your infrastructure.
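The gate pattern behind this is simple, whatever tool implements it: run every cheap automated check first, and only hand the PR to a human once all of them pass. A minimal sketch in Python, where the commands are stand-ins for your real linter and test runner:

```python
import subprocess
import sys
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str

def run_check(name: str, cmd: list[str]) -> CheckResult:
    """Run one automated check (linter, formatter, tests) and capture the result."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)

def first_pass_gate(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every check; the PR only reaches a human reviewer if all pass."""
    return [run_check(name, cmd) for name, cmd in checks.items()]

# Stand-in commands for illustration; substitute e.g. ruff, black, pytest.
results = first_pass_gate({
    "format": [sys.executable, "-c", "print('format ok')"],
    "tests":  [sys.executable, "-c", "print('tests ok')"],
})
ready_for_human = all(r.passed for r in results)
```

The point of encoding it this way, rather than as ad hoc CI steps, is that "ready for human review" becomes a single explicit condition you can report on.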

Rethink testing workflows. AI is genuinely good at generating test suites. Teams that use AI to draft comprehensive test cases before the feature code is written are seeing faster stabilisation and fewer regressions.[7] This flips the traditional workflow. Instead of building then testing, teams work against a testing blueprint that AI drafts and a human refines.
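The blueprint-first flow looks like ordinary test-driven development with the drafting step delegated. A minimal sketch, using a hypothetical `apply_discount` function purely for illustration: in practice an agent would draft the test cases from the ticket, a human would refine them, and only then would the implementation be written against them.

```python
def apply_discount(price: float, percent: float) -> float:
    """Implementation written after, and against, the test blueprint below."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_blueprint():
    # These assertions are the spec the feature code must satisfy.
    assert apply_discount(100.0, 20) == 80.0
    assert apply_discount(19.99, 0) == 19.99
    try:
        apply_discount(10.0, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for out-of-range percent")

test_blueprint()
```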

Talk about AI-generated code explicitly. One pattern we see in high-performing teams is a cultural norm around transparency. When code is substantially AI-generated, the PR description says so. Not as a disclaimer, but as useful context for the reviewer. Reviewing AI-generated code requires a different mindset than reviewing hand-written code. The bugs are different. The patterns are different. Reviewers who know what they are looking at do a better job. Document these norms somewhere the team actually reads. Notion works well for living engineering playbooks that evolve as your AI adoption matures.

§ Layer 3: Agents and Sub-Agents (The Next Frontier)

The first two layers assume a human writes the code, with AI helping. Layer 3 removes that assumption.

AI coding agents do not assist developers. They act as developers. They take a ticket from Linear or Jira, propose a plan, write the code, run the tests, and open a pull request for a human to review. The human's role shifts from writing to directing and reviewing.

Devin, built by Cognition, operates as a fully autonomous software engineer. You assign it a ticket, it proposes a plan, implements it, tests its own changes, and opens a PR. The rule of thumb from its documentation: if a human can do it in three hours, Devin can probably do it.[8] GitHub Copilot's coding agent works similarly, running in the background on an issue and producing a pull request that the whole team can review natively on GitHub.[5]

Claude Code, OpenAI Codex, and Cline all support agentic modes where the AI determines which files to change, runs terminal commands, and iterates until the task is complete. These are not autocomplete tools with a new label. They are autonomous systems that plan, execute, and self-correct.

75% Companies planning to deploy agentic AI within two years
21% Companies with mature governance in place for AI agents

What sub-agents look like in practice

  • A planning agent that breaks a feature spec into implementation tasks
  • A coding agent that writes the implementation across files
  • A testing agent that generates and runs test suites
  • A review agent that checks output for quality, security, and style compliance
  • A documentation agent that updates docs to match code changes

The most sophisticated teams are not running a single agent on a single task. They are composing workflows from multiple specialised agents, each handling a different part of the development lifecycle. Cline already supports splitting tasks across planning and coding roles. Devin integrates with Slack, Linear, and Jira to pick up work autonomously. GitHub's coding agent runs in the background and produces reviewable PRs.

For teams ready to build these composed workflows, the tooling has matured quickly. n8n provides a visual workflow builder where you can chain agents with conditional logic and human approval gates between steps. CrewAI takes a different approach: you define role-based agent crews that delegate tasks among themselves, with built-in tracing of every agent decision. For teams that want full code-level control, LangGraph models your agent pipeline as a state graph with explicit nodes and edges.[11] And if none of those fit, building custom orchestration on top of the Claude Agent SDK, Google ADK, or PydanticAI is a viable path for teams with specific security or compliance requirements.

Deloitte's 2026 State of AI report found that close to three-quarters of companies plan to deploy agentic AI within two years. But only 21% report having mature governance in place.[9] The ambition is well ahead of the guardrails.

Governing this in practice means answering a few specific questions before you deploy.

Define the boundaries before you deploy. Which tasks can an agent handle autonomously? Which require human review before merging? Start with well-scoped, lower-risk tasks: bug fixes with clear reproduction steps, test generation, documentation updates, dependency upgrades. Expand the boundary as trust builds.
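It helps to write the boundary down as an explicit policy rather than tribal knowledge. A sketch of what that might look like; the task types and the split below are illustrative, not a recommendation:

```python
from enum import Enum

class Autonomy(Enum):
    AGENT_OK = "agent may handle end to end; human reviews the PR"
    HUMAN_FIRST = "human plans and directs; agent assists only"
    FORBIDDEN = "no agent involvement"

# Illustrative starting boundary: narrow, widened as trust builds.
POLICY = {
    "test_generation": Autonomy.AGENT_OK,
    "docs_update": Autonomy.AGENT_OK,
    "dependency_upgrade": Autonomy.AGENT_OK,
    "bug_fix_with_repro": Autonomy.AGENT_OK,
    "feature_work": Autonomy.HUMAN_FIRST,
    "schema_migration": Autonomy.FORBIDDEN,
    "auth_or_billing_change": Autonomy.FORBIDDEN,
}

def route(task_type: str) -> Autonomy:
    """Unknown task types default to human-first, never to autonomous."""
    return POLICY.get(task_type, Autonomy.HUMAN_FIRST)
```

The detail worth copying is the default: anything the policy does not recognise falls back to human-first, not to autonomy.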

Treat agent output like junior developer output. Review it carefully. Do not merge it automatically. The bugs agents produce are different from the bugs humans produce. They are often structurally correct but contextually wrong, missing business logic that was never written down, or solving the wrong problem confidently. Reviewers need to know they are reviewing agent work and adjust accordingly.

Build observability into agent workflows. You need to know what an agent did, why it did it, and what it changed. Every agent action should produce a clear audit trail. If you cannot explain what the agent did to a non-technical stakeholder, your governance is not ready.

Set cost controls early. Agentic workflows burn tokens fast. A single complex task can run through thousands of API calls as the agent plans, executes, tests, and iterates. Without usage limits and monitoring, costs can surprise you. Set per-task and per-day budgets before someone discovers the hard way that an agent spent the weekend refactoring a monolith.
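The per-task budget can be a hard stop inside the agent loop itself, checked before every model call rather than reconciled on the monthly invoice. A minimal sketch, with illustrative token numbers:

```python
class BudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    """Per-task token budget: the agent loop charges it before each model call."""
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"task budget of {self.max_tokens} tokens exhausted ({self.used} used)"
            )
        self.used += tokens

budget = TokenBudget(max_tokens=50_000)
budget.charge(30_000)          # planning pass (illustrative figure)
budget.charge(15_000)          # implementation pass
halted = False
try:
    budget.charge(10_000)      # would overrun: stop and escalate to a human
except BudgetExceeded:
    halted = True
```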

Do not skip the cultural conversation. Engineers have real concerns about agents. Will this replace my job? Will I spend all day reviewing AI-generated code instead of building things? These are legitimate questions. The honest answer is that agents change what engineers do, not whether they are needed. Less writing, more reviewing, more architecture, more directing. The teams where this works are the ones where leadership is transparent about the shift.

§ The Mid-Size Trap

There is a pattern that engineering leaders at growing companies should pay particular attention to.

The smallest organisations move fast because context is shared naturally and everyone uses whatever tools work. The largest enterprises invest heavily in governance, standardised tooling, and dedicated platform teams to manage the complexity. Both groups adapt quickly.

Mid-sized organisations, between 20 and 50 engineers, face the steepest challenge. They have outgrown the natural coordination of small teams but have not yet built the systems that allow larger organisations to operate at scale.

At this size, the three layers collide. Layer 1 is messy because engineers are using five different tools with no shared standards. Layer 2 is strained because review processes designed for three developers cannot handle the volume that AI-assisted code creates. Layer 3 is nonexistent because nobody has time to figure out agent governance when the current sprint is already behind.

AI-assisted development does not create this problem. It exposes it faster.

If your engineering team is in this range, the instinct is to slow down AI adoption until you get organised. That instinct is wrong. The fix is to invest in Layer 2 and Layer 3 deliberately, even if imperfectly, before the volume overwhelms your current ways of working.

§ Where to Start This Week

If your engineering team is using AI tools but not seeing the expected gains, run a quick diagnostic across all three layers.

Layer 1 check: Are your engineers using AI tools daily, and do they have the freedom to choose what works for them? If not, this is the fastest fix. Remove friction, provide budget, make the approved options clear (see the tool landscape above), and stop deliberating over tool selection.

Layer 2 check: What is your average time from PR open to merge? If it is measured in days rather than hours, your review process has become the bottleneck. Schedule daily review blocks, automate first-pass checks with tools like n8n or your CI platform, and have an honest conversation with your team about how AI-generated code should be reviewed differently.
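The metric itself is cheap to compute once you have PR timestamps, which every major Git host exposes through its API. A sketch with illustrative data:

```python
from datetime import datetime
from statistics import median

def hours_to_merge(opened: str, merged: str) -> float:
    """Hours between PR open and merge, from ISO-8601 timestamps."""
    delta = datetime.fromisoformat(merged) - datetime.fromisoformat(opened)
    return delta.total_seconds() / 3600

# Illustrative data; in practice, pull these from your Git host's API.
prs = [
    ("2026-01-05T09:00:00", "2026-01-05T15:30:00"),
    ("2026-01-05T10:00:00", "2026-01-07T10:00:00"),
    ("2026-01-06T14:00:00", "2026-01-06T18:00:00"),
]
median_hours = median(hours_to_merge(o, m) for o, m in prs)
review_is_bottleneck = median_hours > 24  # "days rather than hours"
```

Track the median, not the mean: one stuck PR should not mask an otherwise healthy queue, and one fast one should not hide a slow norm.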

Layer 3 check: Have you experimented with any agentic workflows? If not, start small. Pick a well-scoped, lower-risk task type, like test generation or documentation updates, and try running it through an agent like Devin or GitHub Copilot's coding agent with human review. Set a cost budget. Measure the quality. Build trust incrementally rather than waiting for a strategy.

The teams capturing the full value of AI-assisted development are not the ones with the best tools. They are the ones that recognised the three layers and invested in each one deliberately. That is the playbook.

A note from fusecup

At fusecup, we work with engineering leaders who are navigating exactly this shift. If you are seeing the throughput numbers climb but the shipped output stay flat, or if you are trying to figure out which layer needs attention first, we are always happy to talk it through. No agenda, no pitch. Just a practical conversation about what might work for your team right now.

§ References

  1. CircleCI. 2026 State of Software Delivery. Based on analysis of over 28 million CI workflows across thousands of teams. circleci.com
  2. Waydev. More Code, Fewer Releases: The Engineering Leadership Blind Spot of 2026 (March 2026). waydev.co
  3. Faros AI. Best AI Coding Agents for Developers in 2026 (January 2026). faros.ai
  4. Pragmatic Engineer. AI Tooling for Software Engineers in 2026. newsletter.pragmaticengineer.com
  5. GitHub. About GitHub Copilot coding agent. docs.github.com
  6. Yahoo Finance / Engineering Benchmarks. AI Helps Low-Performing Engineering Teams 4x More Than High-Performing Ones. finance.yahoo.com
  7. The Big Pixel. The Best AI Coding Practices That Actually Work in 2026. thebigpixel.net
  8. Cognition. Introducing Devin. docs.devin.ai
  9. Deloitte AI Institute. State of AI in the Enterprise 2026 (January 2026). deloitte.com
  10. ShiftMag. This CTO Says 93% of Developers Use AI, but Productivity Is Still 10%. shiftmag.dev
  11. LangChain. LangGraph: Multi-Agent Workflows. blog.langchain.com