I recently came across the tweet below, which is certainly one I can relate to. Every startup that I know is facing a dire shortage of engineering talent. But is hiring the only way out of this conundrum?
Source: Twitter
I recall an annual planning meeting a few years ago at which I presented the hiring plan for the engineering organization, which at the time numbered ~40 people. The plan called for a sizable increase in headcount to try to keep up with the demand for new features. The CEO looked at my presentation and asked me whether I was hiring because I had “maxed out the productivity of the current team.” While the answer to this question is almost always “no”, it did trigger a healthy conversation. How do you measure the productivity of a software engineering organization, and is hiring the only way to increase productivity? I believe the answers to these questions are a) it’s complex and b) absolutely not, in that order.
It is worth noting that this problem of productivity isn’t unique to my experience. A 2018 study - The Developer Coefficient - conducted by Stripe revealed that the issue of developer productivity is widespread and one of the major challenges for many companies.
“While many people posit that lack of developers is the primary problem, this study— which surveyed thousands of C-level executives and developers across six different countries—found that businesses need to better leverage their existing software engineering talent if they want to move faster, build new products, and tap into new and emerging trends. “
Vanity productivity metrics
Measuring the productivity of a software engineering organization is hard, but it is not impossible. Over the years, I have seen (and used) terrible metrics, a few of which I present below.
Lines of Code (LoC). The reasoning goes that since software engineers are primarily concerned with writing code, measuring how many lines of code they write over a period of time should be a proxy for their productivity. That couldn’t be further from the truth. The first and most obvious reason why this is a terrible metric is that there is no standardized definition of a line of code. Second, an increase in code-base size isn’t really indicative of anything. In fact, oftentimes shrinking your code-base is what you really ought to be doing. For example, you might be refactoring poorly written code, which could result in a smaller code-base. I recall once being asked to provide this metric to potential investors. The investors were a bit perplexed when I showed them that our code-base had shrunk over the past year in spite of our delivering numerous critical features. We had ripped out a decent chunk of dead code and refactored quite a bit of our code-base, resulting in the code-base remaining flat over one year.
Number of resolved tickets. That’s another one I’ve seen used over the years. Whilst it isn’t as terrible as LoC, it also suffers from a lack of standardization. There is no universal definition of what a ticket is, nor is it easy to compare two tickets.
Size of backlog. The theory with this metric is that a growing backlog is a sign that the development team cannot keep up with demand. The issue with this metric is that a backlog represents everything that the development team could possibly build given unlimited time. Anyone (arguably just the PMs, but bear with me) can create a ticket and add it to the backlog. That doesn’t necessarily mean that this ticket will ever be implemented. Priorities will change, and so will business conditions. There will always remain a subset of the backlog that will never see the light of day. I have personally been involved in no fewer than three Jira “cleanups” whereby tickets created before a certain date were simply killed or moved to a graveyard backlog.
The issue with all of these metrics is that they are neither insightful nor actionable. A growth in lines of code doesn’t mean anything on its own. It could be due to a legitimate reason, like adding a new feature, or it could be a sign of a problem, such as poorly written code. You can’t tell the underlying reason just by looking at the metric, hence it is useless. At best it gives you the illusion that you are data driven and have metrics to track your organization. I call those vanity metrics.
Flow as a measure of productivity
One concept that I have come across recently is that of flow. Mihály Csíkszentmihályi has written extensively about the state of flow and its relationship to a happy and fulfilling life. According to Wikipedia,
“… flow, also known as the zone, is the mental state of operation in which a person performing an activity is fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity.”
I have since come to believe that in order to get the most out of your engineering team, you have to provide them with an environment that fosters this sense of flow. It follows, then, that what you want to measure are all the elements that stand in the way of getting to flow. One of the most critical elements that lead to flow is uninterrupted time, and in my experience the lack of it is the main obstacle standing in the way of engineering productivity.
Let me illustrate this with an example. I’ll make a somewhat simplistic assumption that your average engineer spends her time on one of the following activities: she codes/designs, conducts interviews, attends meetings, responds to escalations, works on feature planning and prioritization, or is idle. The last is typically a function of waiting for code to build, waiting for tests to run, or being blocked by some external dependency.
The diagram below shows how a typical software engineer - let’s call her Urszula - spends her time. Urszula’s time is often interrupted and she rarely gets a chance to have a long block of uninterrupted time. She is oftentimes pulled into meetings. On other days she spends the majority of her time interviewing candidates. Every so often she will be pulled into a customer escalation which requires her to drop whatever it is she is doing.
In an ideal world, Urszula’s day should be structured as shown below. A quick morning sync up with her team followed by uninterrupted time for her to build software.
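To put a number on the difference between Urszula’s typical day and her ideal one, one option is to compute the longest uninterrupted block in a day’s calendar. The sketch below is a minimal illustration only: the function name, the 9-to-5 working day, the date and both sample schedules are assumptions made up for this example, not data from a real calendar.

```python
from datetime import datetime, timedelta


def longest_free_block(day_start, day_end, interruptions):
    """Return the longest stretch of the working day with no interruption.

    `interruptions` is a list of (start, end) datetime pairs. They may
    overlap, be unsorted, or fall partly outside the working day.
    """
    longest = timedelta(0)
    cursor = day_start  # end of the latest interruption seen so far
    for start, end in sorted(interruptions):
        # Clamp each interruption to the working day.
        start = min(max(start, day_start), day_end)
        end = min(max(end, day_start), day_end)
        if start > cursor:
            longest = max(longest, start - cursor)
        cursor = max(cursor, end)
    # Free time after the last interruption also counts.
    return max(longest, day_end - cursor)


def at(hour, minute=0):
    """Helper for building times on one arbitrary, hypothetical day."""
    return datetime(2024, 1, 8, hour, minute)


# Hypothetical schedules: a fragmented day versus a single morning sync.
typical_day = [(at(9), at(9, 30)), (at(11), at(12)),
               (at(13, 30), at(14, 30)), (at(15, 30), at(16))]
ideal_day = [(at(9), at(9, 15))]

print(longest_free_block(at(9), at(17), typical_day))  # 1:30:00
print(longest_free_block(at(9), at(17), ideal_day))    # 7:45:00
```

Tracking the longest block rather than total free time is a deliberate choice here: four scattered free hours and one four-hour block add up to the same total, but only the latter leaves room for flow.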
Getting to flow
In order to get Urszula’s days to look like the optimal day depicted above, we will have to eliminate the distractions and obstacles that appear in non-optimal days. As a reminder, those include meetings, interviews, planning, legacy code and being idle. Below are a few antidotes I’ve learned and applied through the years that can help you combat these obstacles.
Antidote 1: Testing
Whether you are building a brand new product or building on top of existing technology, you will have to deal with legacy code. The answer is to ensure that any code you develop is properly tested, which is easier said than done. A rigorous attention to quality allows you to be nimble and develop high quality software quickly. Good testing offers a myriad of benefits:
Builds quality in early
Prevents future bugs and regressions (bug repellent)
Reduces risk of future changes
Helps localize defects and accelerate resolution
Documents the expected behavior of the system (System Under Test principle)
Gives quick feedback as to quality: allows you to develop quickly and confidently
Whilst this article isn’t about testing, I will offer two guidelines that I believe are must-haves for good testing hygiene.
First, test at the lowest level possible. That is, use unit tests whenever possible, small integration tests when necessary, large integration tests when absolutely necessary, and full system tests as a last resort. Lower-level tests help localize defects and speed resolution: when the tests fail, the bad code is limited to a smaller set of lines. They also run more quickly, which gives quick feedback and makes us engineers much more productive.
Second, make sure your tests are repeatable. There’s no point in writing tests that are flaky and fail in non-deterministic ways. You want tests that are predictable: when they fail, they indicate an error in your code.
An anti-pattern I am always on the lookout for is hearing a developer say “It’s too difficult to write automated tests against this code, we will have to manually/chaos test it”. That is almost always an indication of poorly factored code. A well factored and well designed piece of code should always be testable and, more critically, should be easy to build software on top of. Well factored and well tested code is the best antidote for dealing with legacy code (at least your own legacy code).
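As a minimal sketch of what a low-level, repeatable test can look like, consider code that depends on the current time, a common source of flakiness. The function `is_expired` and its parameters are hypothetical, invented for this illustration. The technique shown is to pass the clock in as an argument so that the test controls it instead of relying on the wall clock or on sleeping.

```python
import unittest
from datetime import datetime, timedelta


def is_expired(created_at, ttl, now=None):
    """Hypothetical unit under test: has `ttl` elapsed since `created_at`?

    `now` is injectable so tests never depend on the real clock.
    """
    if now is None:
        now = datetime.now()
    return now - created_at >= ttl


class IsExpiredTest(unittest.TestCase):
    CREATED = datetime(2020, 1, 1, 12, 0, 0)  # fixed, arbitrary timestamp
    TTL = timedelta(minutes=30)

    def test_not_expired_just_before_ttl(self):
        now = self.CREATED + timedelta(minutes=29, seconds=59)
        self.assertFalse(is_expired(self.CREATED, self.TTL, now=now))

    def test_expired_exactly_at_ttl(self):
        now = self.CREATED + self.TTL
        self.assertTrue(is_expired(self.CREATED, self.TTL, now=now))
```

Saved to a file, these tests can be run with `python -m unittest`. Because every input is fixed, they give the same result on every run and on every machine, so a failure points at the code rather than at timing.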
Testing isn’t free. You will have to invest in writing unit and system level tests. You will also have to invest in the underlying infrastructure that supports these tests. The investment is absolutely worth it. It will pay 10x the dividends both in terms of higher productivity and happier customers. The latter also results in higher productivity due to fewer escalations.
Antidote 2: Planned interruptions
Let’s face it: interruptions will happen irrespective of the efforts you put in place to minimize them. Therefore, you should plan for dealing with them by building shock (or interruption) absorbers for your team. The interruptions that I have found most difficult to plan for are customer escalations and interviews. It’s highly unlikely that you can know in advance how many customer escalations you will be dealing with next week. Similarly, interviews tend to ebb and flow, although they can be predicted more easily than escalations, at least over short time horizons.
To help deal with these two interruptions I have always relied on building two volunteer-based teams. One team would deal with escalations; the other would deal with interviews. The assignments to these teams were temporary and lasted 90 days in the case of the escalations team and one week for interviews.
The escalations team would work with our support organization to deal with any critical customer escalation. In principle nothing should leak from this team to the general engineering population, and if it does, we measure that and work to understand why the support or escalations teams weren’t able to resolve the issue without it leaking through to the rest of the org. A side benefit of passing escalations on to the development team is that it drives better quality, better testing, and building with debuggability and observability in mind. That in turn results in higher quality code, fewer escalations and higher productivity.
For interviews I had multiple interviewing squads. Each squad consisted of engineers who could handle the weekly load of interviews. If you were a member of a squad, you knew ahead of time that your week was highly interruptible. Moreover, I worked with my recruiting team to try to schedule the majority of the interviews on one or two days of the week, to help build predictability into the interruptions.
In general, if you have to deal with interruptions try and isolate them to as few individuals/teams as possible. Build your interruption shock absorbers.
Antidote 3: Scheduled planning and prioritization
I recently came across Basecamp’s ShapeUp framework and decided to give it a try at Kheiron. This framework outlines how features need to be shaped and pitched before they are committed for development.
The shaping stage fleshes out a feature by adding sufficient context in terms of scope, high level design, cost and benefits. All shaped features are then pitched before some or all are selected and assigned to development teams for implementation. The maximum time allotted to a selected feature is 6 weeks, with no guarantee that the feature will get further funding once its 6-week cycle ends. Once a 6-week cycle ends, the development teams enter a 2-week cool-down cycle during which they can wrap up any lingering work from their previous project, explore new ideas and so forth. Meanwhile, during that period the next set of features will be shaped and a few of those will be selected for the next 6-week cycle.
This framework offers a few benefits that I thought made it worth trying out. The first is how it decouples planning, scoping and risk assessment (shaping) from development. Both activities can be done in parallel as shown in the diagram below. The development of previously prioritized features is done by development teams, whereas shaping of what to work on next is handled by product managers and senior engineers, typically tech leads. This has the added benefit of freeing the development teams from thinking about what to work on next, which is the job of the shapers.
The other benefit is capping the time commitment at 6 weeks, with no guarantee of further investment beyond the allotted cycle. This caps the risk at no more than 6 weeks and also forces the development team to continuously assess new risks, adjust scope and evaluate tradeoffs, with the ultimate goal of getting their feature (or a functional subset of it) completed before the cycle ends.
We’re in the middle of our first 6-week cycle at Kheiron, so time will tell if the potential benefits of this framework are realized. For now, it is looking quite promising!
Antidote 4: Invest in your infrastructure
I cannot stress the importance of this enough. Investing in continuously improving your infrastructure, especially the tools that help engineers be more productive, is critical. It is often one of the most overlooked areas of a software development organization, especially in the early days of a company. Having an infrastructure team that maintains and enhances the systems and tools that engineers use also allows you to scale your engineering team. Without proper investment in your infrastructure, it will likely become a bottleneck as your team grows. The most common bottlenecks are build and test times. Those tend to grow linearly with the number of engineers on your team, and they typically manifest as engineers sitting idle while waiting for test results to come back.
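One way to catch build and test times creeping up before engineers start complaining is to track the median duration per week and flag sharp jumps. The sketch below assumes you can export build durations from your CI system as (week, seconds) pairs. The function names, the 20% threshold and the sample numbers are all illustrative assumptions, not recommendations.

```python
from statistics import median


def weekly_medians(builds):
    """Group (iso_week, duration_seconds) pairs into {week: median duration}."""
    by_week = {}
    for week, seconds in builds:
        by_week.setdefault(week, []).append(seconds)
    # Sort by week label so the result is in chronological order.
    return {week: median(times) for week, times in sorted(by_week.items())}


def flag_regressions(medians, threshold=0.2):
    """Return weeks whose median grew by more than `threshold` week over week."""
    weeks = list(medians)
    return [
        cur for prev, cur in zip(weeks, weeks[1:])
        if medians[cur] > medians[prev] * (1 + threshold)
    ]


# Hypothetical export from a CI system.
builds = [
    ("2024-W01", 600), ("2024-W01", 620), ("2024-W01", 640),
    ("2024-W02", 630), ("2024-W02", 650),
    ("2024-W03", 900), ("2024-W03", 780),
]

medians = weekly_medians(builds)
print(medians)                    # {'2024-W01': 620, '2024-W02': 640.0, '2024-W03': 840.0}
print(flag_regressions(medians))  # ['2024-W03']
```

The median is used rather than the mean so that a single stuck build does not trigger an alert. A flagged week is a prompt to investigate, not proof of a problem.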
You won’t really know. The litmus test is whether or not your teams are hitting their goals and feeling energized and engaged. Software development is a creative process; it is not like building widgets on an assembly line.
What to measure, then?
The following are some metrics that I believe are good proxies for flow and hence productivity.
CI/CD Time: This measures the time it takes to build, test and potentially deploy your software. An increase in this time isn’t necessarily an indication of a problem, but it does have to be justified and understood. In my experience, this time tends to go unmeasured until it becomes problematic and your development team starts complaining about builds taking forever. Measure this and monitor it. Understand what led to changes and act on anti-patterns quickly.
Cycle Time: This metric measures the amount of time from work started to work delivered. It is critical to define and standardize what you mean by “started” and “delivered”. The starting point is arguably the planning and scoping phase, while delivered is when the feature is readily available for your customers to use, i.e. shipped/released. If you are using ShapeUp, this time should be a multiple of 6 weeks.
Interviewing Metrics: Interviewing time/week/engineer and funnel metrics. The latter are important to help you understand if you have an inefficient interview structure. For example, if you notice that very few candidates pass your technical screen, this might indicate that you need to adjust your screen.
Escalation Metrics: Even if you have a buffer team that handles escalations, you want to measure the frequency and volume of escalations you get. More importantly, you should root cause every escalation and only close it when adequate coverage has been added to your tests. An escalation is a testing gap, so close it.
Engagement: Surveying your team every quarter is an excellent way to measure their sense of engagement and happiness. A team that is in this state of flow and delivering high quality software that delights customers should be happy, and the converse is also true.
Shaped:Development: This is a new one I added to my repertoire. It measures the ratio of projects that were shaped to projects that were selected for development within a 6-week cycle. When the ratio is > 1, it implies that you have shaped more work than you were able to deliver, which is often the case. However, if the number gets unreasonably high, it implies that you are understaffed and need more developers on your team. Remember that you start with a clean slate of shaped features at the end of every cycle, so you are not carrying over a growing backlog.
Last, increasing your team’s productivity and hiring are not mutually exclusive activities. You can do both. Do know, however, that if you have a lot to do on the productivity front, hiring without fixing the underlying productivity issues only compounds the problem.
This feeling of flow and getting in the zone is as much science as it is art. You will know when you have flow. I once worked with a VP of Sales who used to often tell me “we’re not in the zone yet.” Eric knew when his team was in the zone and, as such, firing on all cylinders. It’s arguably easier to know this for a sales team since they have more quantitative metrics to track (sales figures), but that feeling of being in the zone is all too real.
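Several of the metrics above reduce to simple arithmetic once the underlying events are recorded. The sketch below shows one possible way to compute cycle time, interview funnel pass rates and the Shaped:Development ratio. All function names, stage names, dates and counts are hypothetical and chosen only to make the example runnable. In particular, “started” and “delivered” are whatever your team has standardized on.

```python
from datetime import date
from statistics import median


def cycle_time_days(started, delivered):
    """Days from work started to work delivered, per your own definitions."""
    return (delivered - started).days


def funnel_pass_rates(funnel):
    """Given ordered (stage, candidate_count) pairs, return stage-to-stage pass rates."""
    return {
        f"{stage} -> {next_stage}": next_count / count
        for (stage, count), (next_stage, next_count) in zip(funnel, funnel[1:])
    }


def shaped_to_development_ratio(shaped, selected):
    """Projects shaped divided by projects selected for development in a cycle."""
    if selected == 0:
        raise ValueError("no projects were selected for development")
    return shaped / selected


# Hypothetical data for one cycle.
features = [
    (date(2024, 1, 8), date(2024, 2, 16)),
    (date(2024, 1, 8), date(2024, 2, 2)),
    (date(2024, 1, 15), date(2024, 2, 16)),
]
funnel = [("applied", 200), ("technical screen", 40), ("onsite", 20), ("offer", 5)]

print(median(cycle_time_days(s, d) for s, d in features))  # 32
print(funnel_pass_rates(funnel))
print(shaped_to_development_ratio(shaped=9, selected=4))   # 2.25
```

With this sample funnel, only 20% of applicants reach the technical screen stage, 50% of those reach the onsite and 25% of those receive an offer. Which of these rates counts as too low is a judgment call for your team, and the same goes for what makes a Shaped:Development ratio unreasonably high.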
So do what you can to get your team into the zone and stay in that zone!