The Engineering Metrics That Actually Drive Better Decisions

Technology Operations 14 min read by Girish Koliki

Most engineering teams either measure nothing or measure the wrong things. The fix is not more dashboards. It is three metrics, applied with discipline, used to improve systems rather than judge people. Here is how to get it right without turning your team into a surveillance state.

The uncomfortable truth about engineering metrics is this: most teams get them completely wrong, and the ones who get them wrong are often worse off than the ones who track nothing at all.

In teams with no metrics, decisions get made on gut feel, the loudest voice in the room, or whatever the last incident happened to be. That is not great. But in teams with bad metrics, something more insidious happens. People start optimising for the number instead of the outcome. They game the system. They close tickets faster by making them smaller. They deploy more often by shipping half-finished work. The dashboard looks brilliant. The product gets worse.

The good news is that the space between these two failure modes is well understood. Google's DORA research programme, which has studied tens of thousands of engineering teams since 2014, has consistently identified a small set of metrics that correlate with both technical performance and organisational outcomes.[1] You do not need to track everything. You need to track the right things, for the right reasons, in the right way.

§ Why Most Teams Either Track Nothing or Track the Wrong Things

There are two common patterns, and both are understandable.

The first is the team that has never had metrics. Usually this happens because no one has the time to set them up, or because a previous attempt went badly and left a bad taste. Someone introduced story points as a productivity measure, it became a stick, and the team collectively decided that measurement itself was the problem. Fair enough. But it was not measurement that failed. It was the choice of what to measure and how to use it.

The second is the team drowning in dashboards. Lines of code. Pull request counts. Story points completed. Velocity charts projected onto screens in the office like a stock ticker. As one engineering analytics firm put it, using story points to rank teams is like running fantasy football standings for your engineering organisation.[2] It feels data-driven. It is not. These are vanity metrics: numbers that move, but do not tell you anything useful about whether your team is actually getting better at delivering software.

The problem with vanity metrics is not just that they are unhelpful. It is that they actively distort behaviour. When you measure lines of code, people write more code. When you measure pull request count, people make smaller, more fragmented pull requests. When you measure individual output, people stop collaborating because helping a colleague does not show up on their scorecard. The metric becomes the goal, and the actual goal gets lost.

§ Three Metrics to Start With (And What Each One Actually Tells You)

If you are starting from zero, or starting again after a failed attempt, begin with three metrics. They come from the DORA framework and a decade of cross-industry research. They are simple to collect, hard to game in isolation, and genuinely useful for making decisions.[1]

1. Cycle time

Cycle time measures how long it takes for a code change to go from first commit to running in production. It captures everything: development, code review, testing, deployment, and any waiting time in between.

What it actually tells you: where your bottlenecks are. A long cycle time almost never means your developers are too slow. It usually means work is sitting in queues, reviews are taking days, environments are broken, or the deployment process has too many manual gates. Cycle time is a system metric, not a people metric. When it improves, it means the system around your team is getting better.[3]
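
To make this concrete, cycle time per change can be computed from two timestamps most tools already record: the first commit and the production deploy. A minimal sketch in Python (the dictionary keys are hypothetical, not from any specific tool's export format):

```python
from datetime import datetime
from statistics import median

def cycle_times_hours(changes):
    """Hours from first commit to production deploy for each change.

    `changes` is a list of dicts with hypothetical keys
    'first_commit' and 'deployed' holding ISO-8601 timestamps.
    """
    times = []
    for change in changes:
        start = datetime.fromisoformat(change["first_commit"])
        end = datetime.fromisoformat(change["deployed"])
        times.append((end - start).total_seconds() / 3600)
    return times

changes = [
    {"first_commit": "2024-03-04T09:00", "deployed": "2024-03-06T09:00"},  # 48h
    {"first_commit": "2024-03-05T10:00", "deployed": "2024-03-05T16:00"},  # 6h
    {"first_commit": "2024-03-06T08:00", "deployed": "2024-03-11T08:00"},  # 120h
]

print(f"median cycle time: {median(cycle_times_hours(changes)):.0f}h")
```

The median is deliberate: one change that sits in a queue for a week (the 120-hour outlier here) would drag the mean upwards, while the median stays representative of the typical change.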

2. Deployment frequency

Deployment frequency measures how often your team ships code to production. DORA defines it as the number of successful releases over a given period, typically bucketed as daily, weekly, or monthly.[4]

What it actually tells you: how confident your team is in their release process. Teams that deploy frequently tend to ship smaller changes, which are easier to review, easier to test, and easier to roll back if something goes wrong. Low deployment frequency is usually a sign that releasing is painful, risky, or both. It is a proxy for the health of your entire delivery pipeline, not a measure of how hard people are working.
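
Counting this is straightforward if you have a log of deploy dates. A sketch, assuming deploys are available as a list of dates (the variable names are illustrative):

```python
from collections import Counter
from datetime import date

def deploys_per_week(deploy_dates):
    """Count successful production deploys per ISO week.

    `deploy_dates` is a list of `date` objects, one per deploy.
    Returns a dict keyed by (ISO year, ISO week number).
    """
    return dict(Counter(d.isocalendar()[:2] for d in deploy_dates))

deploys = [
    date(2024, 3, 4), date(2024, 3, 5), date(2024, 3, 7),  # ISO week 10
    date(2024, 3, 12),                                      # ISO week 11
]
print(deploys_per_week(deploys))
```

Grouping by ISO week (rather than calendar month) keeps every bucket the same length, so a week-on-week dip in the chart reflects a real change rather than a short month.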

3. Incident rate (and recovery time)

Incident rate tracks how often things break in production. Paired with mean time to recovery (MTTR), which measures how quickly the team restores service after a failure, it gives you a picture of your system's reliability and your team's ability to respond under pressure.[5]

What it actually tells you: the quality of what you are shipping and how resilient your systems are. A high incident rate paired with a low MTTR suggests the team is shipping too fast with too little testing, but recovering well. A low incident rate with a high MTTR suggests the system is stable until it is not, and when it breaks, the team struggles. Both patterns point to different, specific improvements.
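
Both numbers fall out of the same incident records. A sketch, assuming each incident is a (started, resolved) pair of ISO-8601 timestamps exported from your incident tracker (a hypothetical shape; adapt to whatever your tooling produces):

```python
from datetime import datetime

def mttr_hours(incidents):
    """Mean time to recovery in hours.

    `incidents` is a list of (started, resolved) ISO-8601 timestamp pairs.
    """
    durations = [
        (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600
        for start, end in incidents
    ]
    return sum(durations) / len(durations)

incidents = [
    ("2024-03-04T14:00", "2024-03-04T15:30"),  # 1.5h to restore
    ("2024-03-09T02:00", "2024-03-09T06:30"),  # 4.5h to restore
]
print(f"{len(incidents)} incidents, MTTR {mttr_hours(incidents):.1f}h")
```

The incident rate is simply the count per period (here, two in a week); reporting it alongside MTTR is what lets you distinguish the two failure patterns described above.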

These three metrics work as a set. Cycle time without deployment frequency can be gamed by pushing incomplete work. Deployment frequency without incident rate rewards speed at the expense of quality. Track them together and they keep each other honest.

§ How to Introduce Metrics to a Team That Has Never Had Them

This is where most rollouts fail, and it has nothing to do with the metrics themselves. It has everything to do with how they are introduced.

The single most important principle: start with visibility, not targets.

When you introduce metrics with targets attached, the team hears one thing: we are being watched, and we will be judged. That triggers exactly the behaviours you are trying to avoid. People optimise for the number. They find shortcuts. They stop taking risks because risk means the number might go the wrong way. You have just created the surveillance culture you were trying to avoid.

Instead, start by simply making the data visible. Put the cycle time chart on a shared dashboard. Let the team see deployment frequency over time. Do not set goals. Do not attach the numbers to performance reviews. Just let people look at the data and start asking their own questions.

What tends to happen next is genuinely interesting. Engineers are curious by nature. When they can see that their cycle time spikes every other Thursday, they start asking why. When they notice deployment frequency dropped for two weeks, they start connecting it to that infrastructure migration that clogged the pipeline. The insights come from the team, not from management, and that changes the entire dynamic.

The research supports this. Non-invasive engineering analytics draws a clear line: measure work outputs, not work behaviour.[6] Commit frequency and pull request cycle time are acceptable. Keystroke logging and active time tracking are not. The distinction matters because it is the difference between helping people do better work and watching them do work.

Only after the team has lived with the data for a few weeks, started their own conversations about it, and begun identifying their own improvements should you consider setting any targets. And when you do, set them collaboratively, as team goals, never as individual ones.

§ Common Mistakes That Poison the Well

Even with the right metrics and the right introduction, there are patterns that reliably derail things.

Vanity metrics that feel good but mean nothing. Lines of code is the classic example, but pull request count is equally misleading. A developer who writes one pull request that solves a complex architectural problem and a developer who opens fifteen small cosmetic fixes are contributing very differently. If you measure the count, you reward the wrong one.

Gaming. Any metric that is tied to individual evaluation will be gamed. This is not a character flaw. It is human nature. If cycle time is tied to someone's performance review, they will find ways to reduce it that have nothing to do with genuine improvement. They will split work into trivially small pieces. They will skip thorough code reviews. They will deploy changes that are technically complete but not genuinely ready. The metric improves. The software does not.

Using metrics to judge individuals instead of systems. This is the most corrosive mistake. Cycle time, deployment frequency, and incident rate are system-level metrics. They tell you about the health of your processes, tools, and organisational design. They do not tell you whether a specific engineer is doing good work. A developer on a team with a slow CI pipeline will have a longer cycle time than one on a team with a fast pipeline. That is a system problem, not a performance problem. The moment you start using system metrics to evaluate individuals, you destroy psychological safety, and with it, any chance of honest improvement.

Measuring too many things at once. As one engineering leadership guide put it: most advice about engineering metrics treats them like a shopping list.[7] Pick DORA, add some SPACE dimensions, throw in some cycle time, done. But more metrics does not mean better insight. It means more noise. Start with three. Get good at using them. Add more only when you have a specific question that the existing metrics cannot answer.

§ A Simple Framework for Turning Metric Insights into Leadership Decisions

Metrics are worthless if they do not change what you do. Here is a straightforward framework for turning the data into action.

Step 1: Observe the trend, not the number. A cycle time of five days means nothing in isolation. A cycle time that has gone from three days to five days over the past month tells you something is changing. Focus on direction, not absolute values. Compare your team to its own history, not to some industry benchmark that may have nothing to do with your context.
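
One way to make "trend, not number" mechanical: compare a recent window of the metric to the team's own earlier baseline, never to an external benchmark. A sketch with made-up weekly values:

```python
from statistics import median

def trend(weekly_values, window=4):
    """Relative change of the last `window` weeks vs the preceding window.

    Positive means the metric rose; compares the team to its own history.
    """
    recent = median(weekly_values[-window:])
    baseline = median(weekly_values[-2 * window:-window])
    return (recent - baseline) / baseline

# Weekly median cycle time in days over ten weeks (illustrative numbers).
cycle_time_days = [3.1, 2.9, 3.0, 3.2, 3.0, 3.4, 4.1, 4.8, 5.0, 5.2]
print(f"cycle time moved {trend(cycle_time_days):+.0%} vs the previous four weeks")
```

A +58% move like this one is a conversation starter for Step 2, not a verdict; medians over windows also smooth out single-week noise that would otherwise trigger false alarms.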

Step 2: Ask the team what they see. Before you draw conclusions, bring the trend to the team. Ask: what do you think is happening here? Nine times out of ten, they already know. The deployment pipeline got slower after the security audit. The code review backlog grew because two senior engineers went on leave. The incident rate spiked because of that third-party API change nobody anticipated. The team's explanation is almost always more accurate than any hypothesis you can form from the dashboard alone.

Step 3: Identify the systemic cause. Resist the temptation to attribute the trend to individual effort. If cycle time increased, ask: what changed in the system? Did we add a new approval step? Did the test suite get slower? Did the team take on a new type of work that is inherently more complex? The answer is almost always structural.

Step 4: Make one change and watch the metric. Do not try to fix everything at once. Pick the most impactful systemic cause, make a targeted change, and watch whether the metric moves in response. This is the engineering approach to leadership: form a hypothesis, run the experiment, measure the result.

Step 5: Communicate what you learned. Share the insight, the change, and the result with the broader team. This does two things. It shows that the metrics are being used to improve the system, not to judge people. And it builds the habit of data-informed decision-making across the organisation, not just at the leadership level.

§ Where to Start

If you are reading this and your team currently tracks nothing, here is the smallest useful first step: set up a cycle time dashboard. Most modern development tools, from GitHub to GitLab to Jira, can generate this with minimal configuration. Make it visible. Do not attach any goals to it. Just let the team see the data for two weeks.

If you already track metrics but they are not driving better decisions, do an honest audit. For each metric you currently collect, ask: has this metric changed a decision we made in the last quarter? If the answer is no, it is noise. Remove it. Simplify until every number on the dashboard has earned its place.

The goal is not to become a data-driven engineering organisation overnight. The goal is to be slightly more informed tomorrow than you were today, and to build the habit of using that information to improve systems rather than evaluate people. That is a goal worth measuring.

A note from fusecup

At fusecup, we help engineering leaders build teams and processes that work. If you are thinking about how to introduce metrics, improve your delivery pipeline, or simply want a second opinion on how your engineering organisation is performing, we are always happy to talk it through. No agenda, no pitch. Just a conversation about what might work for where you are right now.

§ References

  1. DORA Team (Google Cloud). DORA's Software Delivery Performance Metrics. dora.dev
  2. Appfire. The Engineering KPI Trap: What to Measure (and What to Ditch). appfire.com
  3. Towards Data Science. Top 10 Metrics for Engineering Teams (2022). towardsdatascience.com
  4. Cortex. Deployment Frequency: Why and How to Measure It. cortex.io
  5. Stack Overflow. The Four Engineering Metrics That Will Streamline Your Software Delivery (2021). stackoverflow.blog
  6. CodePulse HQ. Non-Invasive Engineering Analytics: Measure Without Surveillance. codepulsehq.com
  7. Swarmia. Engineering Metrics Leaders Should Track in 2026 (and What to Do with Them) (January 2026). swarmia.com