6 Comments

I look forward to the interview with that Neal guy!

I think getting testing right is a fundamentally hard problem with lots of trade-offs; it requires judgment, which is one of the key reasons senior engineers are paid so well.

You could spend unlimited time polishing unit tests to check every edge case (and some simplistic XP practitioners would be thrilled), but then the slightest interface change requires reworking all those tests. Conversely, you could rely only on manual integration tests, so that every change requires an expensive custom testing pass and your build-deploy cycle grinds to a halt.
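To make the brittleness concrete, here's a hypothetical pytest sketch (parse_price and its edge cases are made up for illustration): every test is pinned to one concrete signature, so a change as small as adding a currency parameter means reworking all of them.

```python
# Hypothetical example: edge-case unit tests pinned to a concrete signature.
# If parse_price(text) later becomes parse_price(text, currency), every one
# of these parametrized cases has to be reworked.
import pytest

def parse_price(text: str) -> float:
    """Toy implementation, only here so the tests run."""
    return float(text.strip().lstrip("$"))

@pytest.mark.parametrize("raw, expected", [
    ("$1.99", 1.99),
    (" 0 ", 0.0),
    ("$1000000", 1_000_000.0),
])
def test_parse_price_edge_cases(raw, expected):
    assert parse_price(raw) == expected
```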

On a different axis, you could write extremely ugly chaos-style integration tests with lots of layer violations to make sure your code does the right thing in response to various failures, when it might be 10x easier and more effective to just write a TLA+ spec to verify high-level correctness and use a model checker to find the weird bugs.
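To make the model-checking idea concrete without claiming anything about TLA+ itself, here's a toy Python sketch of exhaustive state exploration (the lost-update example and its encoding are my own illustration): it walks every interleaving of two read-then-write workers and flags end states where an update was lost.

```python
# Toy "model checker": exhaustively explore all interleavings of two workers
# that each read a shared counter and write back counter + 1, and report any
# terminal state that violates the invariant counter == 2 (a lost update).

def explore():
    # State: (pc1, pc2, tmp1, tmp2, counter); each worker runs READ then WRITE.
    start = (0, 0, None, None, 0)
    frontier, seen, bad = [start], {start}, []
    while frontier:
        state = frontier.pop()
        pc1, pc2, t1, t2, c = state
        succs = []
        if pc1 == 0:
            succs.append((1, pc2, c, t2, c))        # worker 1 reads the counter
        elif pc1 == 1:
            succs.append((2, pc2, t1, t2, t1 + 1))  # worker 1 writes it back
        if pc2 == 0:
            succs.append((pc1, 1, t1, c, c))        # worker 2 reads the counter
        elif pc2 == 1:
            succs.append((pc1, 2, t1, t2, t2 + 1))  # worker 2 writes it back
        if not succs and c != 2:                    # terminal state, invariant broken
            bad.append(state)
        for s in succs:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return bad

print(explore())  # the interleavings where both workers read before either wrote
```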

Making testing good and cheap requires very experienced engineers who know how to design and modularize the system for good testing -- if that were easy to automate, then in some sense the project itself would be amenable to automation (and you could save yourself the highly paid engineers).

I think 3 critical principles are (a) you should have unit tests with fairly high code coverage, (b) you should have easy-to-run smoke tests of the entire system that validate basic use cases to make sure all the parts are integrated well, and (c) tests should be run regularly and failing tests should be taken seriously. In my experience it's a pretty bad antipattern not to do those 3 things.
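As a rough illustration of (b), a smoke suite can be as small as a couple of HTTP checks that the deployed pieces talk to each other. Everything here (SMOKE_BASE_URL, the /healthz and /orders endpoints) is a made-up placeholder, not a real service:

```python
# Hypothetical smoke tests: cheap, end-to-end checks that the parts are wired
# together. Assumes pytest + requests; the endpoints are placeholders.
import os
import requests

BASE_URL = os.environ.get("SMOKE_BASE_URL", "http://localhost:8080")

def test_health_endpoint_is_up():
    resp = requests.get(f"{BASE_URL}/healthz", timeout=5)
    assert resp.status_code == 200

def test_basic_use_case_round_trips():
    # Create an order, then read it back through the public API.
    created = requests.post(f"{BASE_URL}/orders", json={"sku": "demo", "qty": 1}, timeout=5)
    assert created.status_code in (200, 201)
    order_id = created.json()["id"]
    fetched = requests.get(f"{BASE_URL}/orders/{order_id}", timeout=5)
    assert fetched.status_code == 200
```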

author

Me too re: Neal. Trying to get it done this week, which will be a challenge given our schedules.

Am not disagreeing at all with your principles. My issue/gripe is the following: we allocate a substantial amount of time and energy towards testing, yet quantifying the benefits of that effort remains elusive. We can measure effort ("hey, look, I have smoke tests, unit tests, stress tests"), but we have no way to definitively answer questions like "will this about-to-be-committed code bork a customer?" or "is this code well tested? Should we spend 2x the effort testing it, and how/where?" The problem isn't on the effort side of the equation; it's our inability to measure benefits, which means we can't optimize/align effort toward outcomes rather than vanity metrics like "increase code coverage to x%".


Well, you can use simple metrics to measure the principles (code coverage, percentage of test failures, CI pipeline turnaround time), though aiming for perfection on those metrics has severely diminishing returns.
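Two of those are cheap to compute from whatever your CI system already records; something like this sketch would do (the run-record format is a hypothetical stand-in, and coverage itself would come from coverage.py or an equivalent tool rather than this script):

```python
# Sketch: failure rate and median pipeline turnaround from CI run records.
# The record format is made up; substitute whatever your CI exposes.
from statistics import median

runs = [
    {"passed": True,  "duration_min": 12.5},
    {"passed": False, "duration_min": 14.0},
    {"passed": True,  "duration_min": 11.0},
]

failure_rate = sum(not r["passed"] for r in runs) / len(runs)
turnaround = median(r["duration_min"] for r in runs)
print(f"failure rate: {failure_rate:.0%}, median turnaround: {turnaround} min")
```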

But I do claim that you can't measure "how well do these tests achieve my business goals" any more than you can measure "is this architecture good" or "is this principal engineer good". I'm not saying those are impossible things to measure, but they're sort of an equivalence class, and there's as much art as science involved (which is why judgment is required, etc.)

Even if all you want to know is "will this block of code terminate", you can't write a general-purpose test to determine that (see Turing, 1936)! So when you get into fuzzier stuff like "will this particular change have a bug", you're definitely in the territory of impossibility results in the most general sense.
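For anyone who hasn't seen it, the contradiction behind that impossibility fits in a few lines of Python (halts here is the assumed general-purpose oracle, not a real function):

```python
# Diagonalization sketch: if a general halts(f) existed, this program
# would contradict it on itself, so no such oracle can exist.
def halts(f) -> bool:
    raise NotImplementedError  # assumed oracle; cannot actually be written

def contrarian():
    if halts(contrarian):   # if the oracle says contrarian halts...
        while True:         # ...loop forever,
            pass
    # ...and if it says contrarian loops forever, return immediately.
```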

author

I agree that a "general-purpose test to determine that" is beyond the realm of possibility. But between that extreme and our current approach lies "something" which ought to be better. I have witnessed enough cases in my career of the following scenario: we have unit tests, smoke tests, end-to-end tests, lots of tests, we have QA -> and then we watch regression after regression. Maybe that's just my experience, but the effort (and volume) does not yield the expected benefits. We take comfort in volume and check-marks rather than trying to prove anything (admittedly impossible).


Great post as always! I guess on measuring the efficacy of testing as a whole, it's sort of hard to know, besides maybe a lack of incidents or reduced latency?

A lot of tooling works in a testing environment and is measured at the time of testing. For example, developers using AtomicJar can test in a containerized environment, have a fair amount of certainty that it works, and then deploy. Of course, there can always be issues, but in and of itself AtomicJar's success sort of validates the tests, right? Curious on your thoughts on what you'd look for in measurement across all tests.
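For reference, the shape of such a test, assuming the testcontainers-python flavor of the Testcontainers libraries (plus SQLAlchemy, Docker, and a Postgres driver); your setup may differ:

```python
# Sketch of a containerized integration test: spin up a throwaway Postgres
# in Docker, round-trip a row, and let the container be torn down afterwards.
# Assumes testcontainers-python, SQLAlchemy, a Postgres driver, and Docker.
import sqlalchemy
from testcontainers.postgres import PostgresContainer

def test_can_round_trip_a_row():
    with PostgresContainer("postgres:16-alpine") as pg:
        engine = sqlalchemy.create_engine(pg.get_connection_url())
        with engine.begin() as conn:
            conn.execute(sqlalchemy.text("CREATE TABLE t (x int)"))
            conn.execute(sqlalchemy.text("INSERT INTO t VALUES (42)"))
            value = conn.execute(sqlalchemy.text("SELECT x FROM t")).scalar_one()
        assert value == 42
```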

author

Thanks! The point I am (poorly) attempting to make is not that we lack tools or effort, but that none of them helps us optimize for outcomes. See the Kevin thread for more on that :)
