How to Diagnose an Engineering Organization with Data

6 minute read

A while back I did a consulting engagement for a LATAM payments fintech — an organizational diagnosis of their engineering area. I’m sharing it because I believe the approach is replicable — it doesn’t depend on the company or the team size. If you lead engineering at a fintech or startup, you’ll probably recognize several of these patterns.

Names, data, and details have been changed. What’s valuable here is the approach: how to analyze the data, what to prioritize, and why.

This is the first of three posts. Here’s the diagnosis. Next up: metrics strategy and a 90-day execution plan.

The scenario

A payments orchestration platform for marketplaces in LATAM. Marketplaces integrate an API to handle split payments, seller payouts, escrow, and compliance (tax withholding, anti-money laundering).

The problems:

Client deadlines slipping
Too much firefighting from incidents
Unpredictable cycle times (how long a ticket takes from start to delivery) across squads
Stakeholders asking for more visibility
CTO needing metrics and a clear execution system

Four squads:

Squad	Purpose	Size	Seniority
Payments Core	Payment orchestration + split engine	4 eng	1 Sr, 2 Mid, 1 Jr
Payouts	Seller disbursements + reconciliation	2 eng	1 Sr, 1 Mid
Onboarding	Seller KYC + marketplace integration	3 eng	2 Sr, 1 Mid
Compliance	AML pipeline + tax withholding	2 eng	1 Sr, 1 Mid

Key terms:

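Flow efficiency: share of a ticket's total elapsed time spent on active work rather than waiting, i.e. active time / (active time + waiting time)

Cycle time P75: the time within which 75% of tickets are completed; the slowest 25% take longer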

CFR (Change Failure Rate): percentage of deploys that cause an incident

GMV (Gross Merchandise Volume): total transaction volume processed

MTTR (Mean Time to Restore): average time to bring a service back up

Diagnosis summary

Squad	Status	Main problem	Action
Payouts	Critical	13% flow efficiency, 23% CFR on Seller Payout Service	Process + stabilize
Payments Core	At risk	27% flow efficiency, 9% CFR on Split Engine	Improve flow
Onboarding	Stable	44% flow efficiency, <7% CFR	Maintain
Compliance	Stable	32% flow efficiency, <5% CFR	Maintain

Identifying the bottlenecks

Payouts is the main bottleneck.

Flow efficiency at 13% — for every hour of code, almost 7 hours waiting on code reviews, dependencies, and blockers.
Cycle time P75 of 19.7 days: one in four tickets takes longer than that, so a large share of the work doesn’t fit in a sprint.
CFR of 23% on the Seller Payout Service — nearly 1 in 4 deploys breaks something.

Payments Core is the hidden problem.

Flow efficiency at 27% and cycle time P75 of 10.2 days — it doesn’t raise alarms because Payouts is worse.
But Core moves $9.3M monthly in GMV between the Split Engine and the Payment Gateway API.
Any degradation there has a disproportionate impact.

Onboarding works.

44% flow efficiency and controlled cycle times.
Doesn’t need immediate intervention.

Compliance is also stable.

32% flow efficiency, low CFR, reasonable cycle times.
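
The flow-efficiency figures above translate directly into waiting time per hour of active work. A minimal sketch, assuming flow efficiency means active time divided by total elapsed time (active plus waiting); the function name is illustrative, not part of any tool used in the engagement:

```python
def waiting_hours_per_active_hour(flow_efficiency: float) -> float:
    """Hours a ticket waits for every hour of active work.

    Assumes flow efficiency = active / (active + waiting).
    """
    if not 0 < flow_efficiency <= 1:
        raise ValueError("flow efficiency must be in (0, 1]")
    return (1 - flow_efficiency) / flow_efficiency


# Flow efficiency per squad, as reported in the diagnosis summary.
squads = {
    "Payouts": 0.13,
    "Payments Core": 0.27,
    "Compliance": 0.32,
    "Onboarding": 0.44,
}

waiting = {name: waiting_hours_per_active_hour(fe) for name, fe in squads.items()}

for name, hours in waiting.items():
    print(f"{name}: {hours:.1f} h waiting per active hour")
```

Payouts comes out at roughly 6.7 hours of waiting per active hour, against about 1.3 for Onboarding, which is a more tangible way to present the same percentages.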

Calculating Revenue at Risk

To prioritize with data instead of intuition:

Revenue at Risk = (Deploys/week × CFR) × (GMV/week) × Severity × (MTTR/168)

(Deploys/week × CFR) = expected incidents per week
(GMV/week) = weekly processed value (monthly GMV / 4)
(Severity) = impact weight (High: 1.0, Medium: 0.5, Low: 0.2)
(MTTR/168) = average restore time divided by 168 hours in a week

This is a simplified version. In reality there are more variables — recoveries, fallbacks, retries, queues — that can mitigate or amplify the actual impact.

Example: Seller Payout Service

Deploys/week: 1.2
CFR: 23% → expected incidents = 1.2 × 0.23 = 0.276/week
GMV/week: $1.8M / 4 = $450K
Severity: High (1.0)
MTTR: 6.5 hours → 6.5 / 168 = 0.0387

Revenue at Risk = 0.276 × $450K × 1.0 × 0.0387 ≈ $4.8K/week

It’s not an exact number; it’s a heuristic for prioritization. But it shifts the conversation from “the payouts service has bugs” to “the Seller Payout Service puts ~$19K/month in revenue at risk” (~$4.8K/week × 4).
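
The same calculation as a small function, so it can be rerun per service. This is a sketch of the simplified formula only; the function and parameter names are mine, and it inherits the post's approximation of a month as 4 weeks:

```python
HOURS_PER_WEEK = 168

# Severity weights as defined in the formula above.
SEVERITY = {"High": 1.0, "Medium": 0.5, "Low": 0.2}


def revenue_at_risk_per_week(
    deploys_per_week: float,
    change_failure_rate: float,
    monthly_gmv: float,
    severity_weight: float,
    mttr_hours: float,
) -> float:
    """Weekly Revenue at Risk, following the simplified formula in this post."""
    expected_incidents = deploys_per_week * change_failure_rate
    weekly_gmv = monthly_gmv / 4  # month approximated as 4 weeks
    downtime_share = mttr_hours / HOURS_PER_WEEK
    return expected_incidents * weekly_gmv * severity_weight * downtime_share


# Inputs for the Seller Payout Service example.
seller_payout = revenue_at_risk_per_week(
    deploys_per_week=1.2,
    change_failure_rate=0.23,
    monthly_gmv=1_800_000,
    severity_weight=SEVERITY["High"],
    mttr_hours=6.5,
)
print(f"~${seller_payout / 1000:.1f}K/week, ~${seller_payout * 4 / 1000:.0f}K/month")
```

With the example inputs this prints `~$4.8K/week, ~$19K/month`, matching the hand calculation.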

By service

Service	Squad	Monthly GMV	Severity	Revenue at Risk/week
Seller Payout Service	Payouts	$1.8M	High	~$4.8K
Reconciliation Engine	Payouts	$1.8M	High	~$2.3K
Split Engine	Payments Core	$4.2M	Medium	~$1.2K
Payment Gateway API	Payments Core	$5.1M	Medium	~$0.6K
Marketplace Connector	Onboarding	$1.1M	Low	~$0.05K
AML Pipeline	Compliance	$3.4M	Low	~$0.08K
Tax Withholding Service	Compliance	$0.6M	Low	~$0.02K
KYC Flow	Onboarding	$0.7M	Low	~$0.01K

Payouts accumulates ~$7.1K/week in Revenue at Risk — the most critical squad by far. Payments Core adds ~$1.8K/week, but with $9.3M in monthly GMV, any degradation in CFR or MTTR scales fast.
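
The squad totals are a roll-up of the per-service table. A minimal sketch of that aggregation, using the rounded values from the table (in $K/week); the data structure is just one convenient way to hold them:

```python
from collections import defaultdict

# (service, squad, Revenue at Risk in $K/week), copied from the table above.
services = [
    ("Seller Payout Service", "Payouts", 4.8),
    ("Reconciliation Engine", "Payouts", 2.3),
    ("Split Engine", "Payments Core", 1.2),
    ("Payment Gateway API", "Payments Core", 0.6),
    ("Marketplace Connector", "Onboarding", 0.05),
    ("AML Pipeline", "Compliance", 0.08),
    ("Tax Withholding Service", "Compliance", 0.02),
    ("KYC Flow", "Onboarding", 0.01),
]

risk_by_squad = defaultdict(float)
for _service, squad, risk in services:
    risk_by_squad[squad] += risk

# Rank squads from highest to lowest Revenue at Risk.
ranked = sorted(risk_by_squad.items(), key=lambda item: item[1], reverse=True)
for squad, risk in ranked:
    print(f"{squad}: ~${risk:.2f}K/week")
```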

Engineering Leverage

Leverage = incremental revenue / engineering cost, with both measured over the same period (weekly in the table below).

Costs are estimated from team size and market-average salaries for each seniority level. Incremental revenue comes from the finance team.

Squad	Weekly cost	Incremental revenue	Leverage
Payments Core	$18k	$67k	3.7x
Onboarding	$14k	$38k	2.7x
Compliance	$10k	$24k	2.4x
Payouts	$10k	$7k	0.7x

Leverage isn’t the only metric that matters, but it’s the one that aligns engineering with finance the fastest. When the CFO asks “why do we need more headcount?”, this number answers.

Payouts has leverage below 1 — it costs more than it generates. This isn’t the team’s fault. It’s 2 people spending 57% of their time in reactive mode (26% bugs + 31% client requests). It’s a process and staffing problem.
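
The leverage column can be reproduced from the other two. A short sketch using the table's figures (in $K/week); treating 1.0x as the break-even line is the reading used in this post, not a general standard:

```python
# (squad, weekly cost in $K, weekly incremental revenue in $K), from the table above.
squads = [
    ("Payments Core", 18, 67),
    ("Onboarding", 14, 38),
    ("Compliance", 10, 24),
    ("Payouts", 10, 7),
]

leverage = {name: revenue / cost for name, cost, revenue in squads}

# A squad below 1.0x costs more than the incremental revenue attributed to it.
below_break_even = [name for name, value in leverage.items() if value < 1]

for name, value in leverage.items():
    print(f"{name}: {value:.1f}x")
print("Below break-even:", below_break_even)
```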

Misaligned Time Allocation

Payouts spends only 29% on roadmap (should be >50%) and 57% on reactive work. There’s no intake process — everything comes in unfiltered and unprioritized.
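
A quick way to keep an eye on this split is to compare it against the roadmap target. A rough sketch with the Payouts numbers; the "other" bucket is simply the remainder the post does not break down, and the variable names are illustrative:

```python
ROADMAP_TARGET = 0.50  # guideline from this post: more than 50% of time on roadmap

# Payouts time allocation as reported above.
payouts_allocation = {
    "roadmap": 0.29,
    "bugs": 0.26,
    "client_requests": 0.31,
}
# Whatever is left over is not itemized in the post.
payouts_allocation["other"] = round(1 - sum(payouts_allocation.values()), 2)

reactive_share = payouts_allocation["bugs"] + payouts_allocation["client_requests"]
roadmap_gap = ROADMAP_TARGET - payouts_allocation["roadmap"]

print(f"Reactive work: {reactive_share:.0%}")
print(f"Roadmap shortfall vs target: {roadmap_gap:.0%}")
```

For Payouts this shows 57% reactive work and a 21-point shortfall against the 50% roadmap target.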

What to escalate to the CTO (and how)

Payouts with leverage below 1 (0.7x): it costs more than it generates, with ~$7.1K/week in Revenue at Risk
Seller Payout Service unstable — 23% CFR, High severity, blocks seller disbursements
Payments Core is bad but hidden — $9.3M in monthly GMV with 27% flow efficiency
Client commitments at risk — cycle time P75 of 19.7 days in Payouts is incompatible with SLAs (the service level agreements promised to clients)

First 4 weeks

Step 1: Fix Payouts. Intake process for client requests. Protect at least 50% of time for roadmap. Identify root causes of blockers.

Step 2: Stabilize Seller Payout Service. Review architecture and test coverage. Feature flags for fast rollback. Deploy freeze on Fridays.

Step 3: Investigate Payments Core. Map what’s driving the 27% flow efficiency. Slow reviews? Cross-service dependencies? Unclear ownership?

In parallel: Evaluate headcount for Payouts — what profile, context, and justification.

How to replicate this

If you want to run a similar diagnosis:

Map squads → services → GMV. Without this mapping you can’t tie engineering metrics to business impact.
Measure flow efficiency and CFR per squad. You don’t need sophisticated tooling — Git history + issue tracker is enough. But this is where an engineering leader who codes shows their real strength: getting into the code, understanding what breaks, how everything works, how long it takes to deploy something to production. It’s important to identify problems, blockers, and flows that actually work.
Calculate Revenue at Risk to prioritize with data. You can start with the simple formula I propose here and refine it over time based on your team, goals, context, and other variables that change along the way.
Present leverage as an investment argument, not as a performance evaluation. This metric works as a guide for where to move: where you can generate more impact or where you need to strengthen the team.
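
To make the measurement step concrete, here is a minimal sketch of the three core metrics computed from exported records. The `Ticket` and `Deploy` shapes are hypothetical; in practice you would derive them from issue-tracker status transitions and deploy or incident logs, and the sample data is made up purely to show the calls:

```python
import math
from dataclasses import dataclass


@dataclass
class Ticket:
    active_hours: float   # time spent in "in progress"-type states
    waiting_hours: float  # time in review queues, blocked, or waiting on dependencies


@dataclass
class Deploy:
    caused_incident: bool


def flow_efficiency(tickets: list[Ticket]) -> float:
    """Active time as a share of total elapsed time across all tickets."""
    active = sum(t.active_hours for t in tickets)
    total = sum(t.active_hours + t.waiting_hours for t in tickets)
    return active / total


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def change_failure_rate(deploys: list[Deploy]) -> float:
    """Share of deploys that caused an incident."""
    return sum(d.caused_incident for d in deploys) / len(deploys)


# Made-up sample data.
tickets = [Ticket(8, 40), Ticket(4, 20), Ticket(12, 36), Ticket(6, 34)]
deploys = [Deploy(False), Deploy(True), Deploy(False), Deploy(False)]

# Cycle time in days, taken here as total elapsed hours / 24.
cycle_days = [(t.active_hours + t.waiting_hours) / 24 for t in tickets]

print(f"Flow efficiency: {flow_efficiency(tickets):.0%}")
print(f"Cycle time P75: {percentile_nearest_rank(cycle_days, 75):.1f} days")
print(f"CFR: {change_failure_rate(deploys):.0%}")
```

Nearest-rank is only one of several percentile definitions; whichever you choose, use the same one for every squad so the comparison stays fair.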

This is a first approach with limited context. Once you’re inside the organization studying the teams, services, and day-to-day up close, you can go much deeper and refine the decisions made. But as a starting point, this kind of diagnosis already gives you direction.

The diagnosis is about connecting engineering data to business decisions. When you can say “this squad has leverage below 1 and its services put ~$28K/month at risk,” the conversation with the CTO changes completely.

Next post: how to build the metrics system that sustains this kind of analysis over time.

If you want to dig deeper into the metrics I used as reference, check out the DORA framework.
