Episode #1: Going On an Adventure! The First Step
[Earlier this year I had a fantastic guest post from Mojtaba Hosseini, a Director of Engineering at Zapier, talking about the 12 pitfalls of engineering metrics. As Mojtaba is guiding his current company through their journey of becoming a more metric-driven engineering organization, he’s reflecting back on his previous experiences around metrics in the hope that those lessons learned can help inform the journey at Zapier (and hopefully help you, too 😄)! So grab your favorite snack and settle in for the first flashback episode in Mojtaba’s metrics series 🍿
The One Where Questions Lead to a Shocking Metric
When I first joined my previous engineering organization, we had no metrics at all. None! I had recently read the book Accelerate and so the 4 DORA metrics were very top of mind. I figured we would start with 2 of the easier ones to measure (focused on velocity), leaving the more difficult ones (focused on quality) for later:
- Lead Time: Time between an improvement being done (or scoped) and it being in production
- Deployment Frequency: How often such improvements make it to production
Easy enough! Right?
Chasing Lead Time: Part 1
So I started to ask when the software engineering team deployed into production. The conversation went something like this (dramatized for effect):
Me: “So how often do we deploy?”
Dev Team: “We release our software every 2 weeks at the end of each sprint.”
Me: “Oh that’s great! So what happens when you release every 2 weeks?”
Dev Team: “We start another sprint!”
Me: “No. I mean what happens to the release of the completed sprint?”
Dev Team: “Not sure. Talk to our Operation team.”
😬
Interlude: Some Context
It turns out I had joined an organization that was very early in its DevOps journey. There was a (software) Dev team with its own management chain, and an Ops Team with its own parallel management chain going up to the General Manager (VP of the division). Well this will be fun!
Chasing Lead Time: Part 2
So here’s how my conversation went with the Ops Team (dramatized for effect):
Me: “So how often do we deploy?”
Ops Team: “Well, we used to deploy releases as soon as they became available.”
Me: “Used to?”
Ops Team: “Well, several months ago, the Dev Team released a version of the software that forced a major Ubuntu upgrade on our VMs that broke all our scripts. We are a single-tenant application and so have one VM per customer. This means having to patch many many VMs with the updated scripts and doing the major upgrade. We’ve been working on a large upgrade script to do this for some time now.”
Me: How long?”
Ops Team: “Couple of months. We do the script development on the side. Our full time job is taken up by regular day to day ops stuff (security updates, new VMs, new customer installations, etc) as well as supporting our field operations team.”
Me: “So when was the version currently in production released?”
Ops Team: “Oh… about a couple of months ago.”
😬
Lead Time Shocker
So it turned out the Lead Time for this engineering organization was 75 days and counting! The Dev Team happily released software every 2 weeks, but the Ops Team hadn’t deployed anything to production for 75 days and counting! This also meant that deployment frequency was essentially 0! (For comparison, organizations in ‘elite’ category deploy 10s or 100s of times per day and have lead times measured in minutes and hours.)
On the Next Episode
Stay tuned for the next episode where we showed the Lead Time “graph” to the executives and hilarity ensued. (Narrator: it did not!)
Lessons Learned
There were a few key takeaways for me here:
- Metrics are a means to an end. They help us ask questions.
- Measuring something, almost anything, is better than not measuring anything. That is - taking a baby step, even if it’s not perfect, is a good way to start. We didn’t try to capture all 4 DORA metrics, or to automate their collection. We started with baby steps.
- Metrics can dramatize and accentuate what is already known. Both the Dev and Ops teams knew of the problem but to say “our deployment frequency is 0” and “lead time is at 75 days and counting!” really crystalised the problem.
- (the right) Metrics can help showcase outcomes that come together with the help of multiple teams. “Deployment frequency” captures many aspects of the velocity of the software development life cycle (from design to development, testing to CI/CD). Even if there are multiple people/teams that own pieces of the life cycle, this metric (and others) rally them around a set of numbers to improve together.
If you’re on the edge of your seat waiting to see how the executives reacted to the Lead Time metrics - don’t worry! Episode 2 of our Managing Metrics series has just dropped 💥
[ Guest post from Mojtaba Hosseini ]