Better Metrics for Build Performance Measurement
While doing an architecture refactor recently, I was making large-scale code changes frequently and found the Android build speed had become utterly unbearable. I remember back when I was using an Intel-chip MacBook Pro, a full build took about 40 minutes. After a deep dive, I discovered the real culprit wasn’t the project itself – it was the security software. A fully-specced MacBook Pro was performing like a MacBook Air. Then Apple M1 came along and build speed improved by an order of magnitude. But lately, it’s felt noticeably slower again. I was puzzled – am I really the only one who thinks it’s slow?
User Research
I’d previously hit a Gradle cache issue that doubled build times – clearing the cache fixed it. But this time, clearing the cache changed nothing. A full build still took around 20 minutes. With only 8 hours in a workday, that’s enough for just a handful of full builds. I’m not one to slack off, so I surveyed a few colleagues. Everyone agreed it was slow – but tolerable. Why? Because it used to be 40 minutes, and 20 is already twice as fast! No comparison, no pain – your outlook depends entirely on your frame of reference.
Initial Investigation
The slow builds weren’t isolated to me, but I needed real data. Using git commit history, I estimated that each engineer spent roughly 1 hour per day on builds. The estimation method:
- Total build time = full build time + incremental build time
- Full build time = number of full builds * time per full build
- Incremental build time = number of incremental builds * time per incremental build
Key data points:
Full build frequency
We can’t directly count full builds, but we can infer the number. Full builds are triggered when:
- First build of the day: by default, Gradle caches resolved dynamic versions and changing modules for 24 hours, so dependencies get re-resolved at least once a day. That means at least 1 full build per day
- Modifying shared modules: This forces nearly all modules to recompile. Git log easily reveals the frequency of shared code changes – roughly 0.2 times per person per day
Incremental build frequency
- Assuming at least one build before each commit, git log also gives us the incremental build count – roughly 10 per person per day
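One way to pull these frequencies out of git history is to count commits per author per day. This is a minimal sketch, not the exact script used for the estimate: it assumes the log was exported with `git log --pretty=format:"%ae|%ad" --date=short`, and the function names and sample addresses are made up for illustration. Restricting the same command to the shared modules' paths would give the shared-code change frequency instead.

```python
from collections import Counter

def commits_per_author_day(log_text):
    """Count commits per (author, day) from lines shaped like 'author|YYYY-MM-DD'."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        author, day = line.rsplit("|", 1)
        counts[(author, day)] += 1
    return counts

def average_per_person_day(counts):
    """Mean commits per active person-day (days without commits are not counted)."""
    return sum(counts.values()) / len(counts) if counts else 0.0

# Hypothetical log excerpt, for illustration only
sample_log = """alice@example.com|2024-01-08
alice@example.com|2024-01-08
bob@example.com|2024-01-08
alice@example.com|2024-01-09"""

counts = commits_per_author_day(sample_log)
print(round(average_per_person_day(counts), 2))  # 1.33
```

Because this only averages over days on which someone committed, it is a rough proxy for the incremental build count, and only holds under the "at least one build before each commit" assumption.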
Plugging numbers from my own experience into these formulas:
- Average full build time per person = 1.2 builds/day * 20 min = 24 min/day
- Average incremental build time per person = 10 builds/day * 3 min = 30 min/day
- Average total build time per person = 54 min/day
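The same estimate written as a small function, so the inputs can be swapped for another team's numbers. The function and parameter names are illustrative; the figures are the ones used in the list above.

```python
def daily_build_minutes(full_builds, full_minutes, incremental_builds, incremental_minutes):
    """Estimated minutes per engineer per day spent on builds."""
    full_total = full_builds * full_minutes
    incremental_total = incremental_builds * incremental_minutes
    return full_total + incremental_total

# 1 first-of-day full build + 0.2 shared-module rebuilds, 20 min each;
# 10 incremental builds, 3 min each.
estimate = daily_build_minutes(1 + 0.2, 20, 10, 3)
print(round(estimate))  # 54
```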
That looked pretty serious. I asked a colleague to collect actual build performance data from development environments. After about two weeks, the conclusion was:
Average build time is about 3.5 minutes, and average daily time spent on builds is about 35 minutes per person. Doesn’t seem too bad.
What?! Why was this so different from my estimate?
Better Metrics
Based on the git log data, incremental builds are far more frequent than full builds. If you take the arithmetic mean, the extreme full-build values get completely averaged out by the incremental builds. So how do we find the real problem?
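A small illustration of the dilution effect, using a made-up day with one 20-minute full build and ten 3-minute incremental builds (hypothetical values, not the collected data):

```python
from statistics import mean, median

# Hypothetical day: one full build and ten incremental builds, in minutes
durations = [20] + [3] * 10

print(round(mean(durations), 2))  # 4.55
print(median(durations))          # 3

# The single full build is 1 of 11 builds, yet 40% of the total build time
full_share = 100 * 20 / sum(durations)
print(round(full_share))          # 40
```

The per-build mean lands close to the incremental build time, even though the full build accounts for a large part of the day's waiting.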
Forget the average!
What we should care about is “how much time each person spends on builds per day,” not “how long a single build takes.” So what’s wrong with the 35-minute-per-day average? It’s an arithmetic mean across everyone, and both people and machines differ, so everyone’s situation varies. The arithmetic mean hides these differences. For engineers with good hardware, builds genuinely aren’t a problem. But there’s huge variation in machine specs: some people are still on Intel MacBook Pros due to onboarding timing, while others have Apple silicon. Even the Apple silicon machines come in different core counts (10-core, 12-core, and so on). How do we surface these differences in the data?
Histogram
From the raw build performance data, group by username, sum per day, and you get each engineer’s daily build time. Then take the P90 of each engineer’s daily build time and create a histogram with 30-minute buckets:
The chart shows that nearly half of all engineers spend over 1 hour per day on builds, with some reaching as high as 4 hours. We also notice two points on the far right (x={20, 31}) that are outliers. How do we remove them?
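The aggregation behind such a histogram (sum per user per day, P90 per engineer, 30-minute buckets) can be sketched as follows. The record layout `(username, date, build_minutes)`, the function names, the sample rows, and the nearest-rank P90 definition are all assumptions for illustration; the original data schema and percentile method are not given.

```python
from collections import Counter, defaultdict
import math

def daily_totals(records):
    """Sum build minutes per (user, day); return {user: [daily totals]}."""
    totals = defaultdict(float)
    for user, day, minutes in records:
        totals[(user, day)] += minutes
    per_user = defaultdict(list)
    for (user, _day), total in totals.items():
        per_user[user].append(total)
    return per_user

def p90(values):
    """Nearest-rank P90 of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(len(ordered) * 9 / 10)
    return ordered[rank - 1]

def histogram(values, bucket_minutes=30):
    """Count values per bucket, keyed by the bucket's lower bound in minutes."""
    counts = Counter(int(v // bucket_minutes) * bucket_minutes for v in values)
    return dict(sorted(counts.items()))

# Hypothetical build records, for illustration only
records = [
    ("alice", "2024-01-08", 3.0),
    ("alice", "2024-01-08", 20.0),
    ("alice", "2024-01-09", 65.0),
    ("bob", "2024-01-08", 12.0),
]

per_engineer_p90 = {user: p90(days) for user, days in daily_totals(records).items()}
print(histogram(per_engineer_p90.values()))  # {0: 1, 60: 1}
```

Taking each engineer's P90 rather than their own mean keeps one quiet day from masking a pattern of bad days.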
Tail-Trimmed Histogram
In statistics, trimming and Winsorizing are common noise-removal techniques for data with “long tails” or extreme values. Trimming discards the extreme values, while Winsorizing clamps them to a chosen percentile.
Those two outliers at the histogram’s tail have a mundane cause: the laptop lid was closed during a build, which suspended the process. We can use trimming to remove them:
How exactly is the tail trimmed? The method I used: create a histogram from the raw build data in minute-based buckets:
Then use cumulative frequency to find the P99.8 bucket and truncate everything beyond it:
Remove the truncated build records (noise) from the original data, and you get the tail-trimmed histogram above.
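A minimal sketch of this trimming step, assuming 1-minute buckets and a plain list of build durations in minutes. The function name and the sample data are hypothetical, and the cutoff rule (first bucket where cumulative frequency reaches 99.8%) is one reading of the description above.

```python
from collections import Counter

def trim_tail(durations, percentile=99.8):
    """Drop builds that fall in buckets beyond the one where the
    cumulative frequency first reaches `percentile` percent."""
    if not durations:
        return []
    buckets = Counter(int(d) for d in durations)  # 1-minute buckets
    total = len(durations)
    cumulative = 0
    cutoff = max(buckets)
    for minute in sorted(buckets):
        cumulative += buckets[minute]
        if cumulative * 100 >= percentile * total:
            cutoff = minute
            break
    return [d for d in durations if int(d) <= cutoff]

# Hypothetical data: 999 ordinary builds plus one build suspended for hours
durations = [3] * 995 + [4] * 4 + [600]
trimmed = trim_tail(durations)
print(len(trimmed), max(trimmed))  # 999 4
```

Unlike Winsorizing, the suspended build is removed outright rather than clamped, which matches the goal of discarding records that do not reflect real build time.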
The Real Problem
From the tail-trimmed histogram, the picture is clear:
- 14.29% of engineers spend at least 2 hours per day on builds
- 42.86% of engineers spend at least 1 hour per day on builds
This conclusion is far more consistent with my actual experience than “average daily build time is about 35 minutes.”
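Shares like these come from counting how many engineers' P90 daily build time meets a threshold. The sketch below uses seven made-up P90 values, chosen only to show the calculation; it is not the original dataset, whose size is not stated.

```python
def share_at_least(p90_minutes, threshold_minutes):
    """Percentage of engineers whose P90 daily build time is >= threshold."""
    hits = sum(1 for m in p90_minutes if m >= threshold_minutes)
    return 100 * hits / len(p90_minutes)

# Hypothetical P90 daily build times (minutes) for seven engineers
sample = [20, 35, 45, 50, 70, 95, 130]

print(round(share_at_least(sample, 60), 2))   # 42.86
print(round(share_at_least(sample, 120), 2))  # 14.29
```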
Reference
- Blog Link: https://johnsonlee.io/2024/01/27/use-better-metrics-for-build-performance-measurement.en/
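- Copyright Declaration: Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.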
