In my previous post, I lamented what I see as the sad state of software testing. My gripe was with the overly subjective approach we take when testing software. If we are content with this approach, which admittedly is what I have observed throughout my career, then learning from the very best practitioners is a must. Neal Fachan is the best practitioner I have encountered in my career.
Neal has 20+ years of experience building large-scale distributed systems. He was a Distinguished Engineer at Isilon, worked on databases at AWS, and most recently was co-founder and Chief Scientist at Qumulo. I met Neal at Qumulo and have witnessed the benefits of his brilliance and incredible engineering standards. I’ve worked on several distributed systems throughout my career, but none came close to the quality of Qumulo’s distributed file system. Neal played a huge role in ensuring that outcome.
I asked Neal a few questions about his testing principles and approach; his answers are shared in the sections below.
What is your approach to testing?
It's important to have a multi-layered test approach. I personally like the test pyramid analogy. At the bottom of the pyramid are all of your unit tests. You should have a lot of these, as each individual test should only cover one small piece of functionality, and you should have really high code coverage with unit tests.
The next layer is integration tests, where there are fewer tests but each test exercises a larger amount of the code base. Each new layer expands the scope of the system under test but reduces the number of tests.
I find that it's hard to generically label every layer of the pyramid, since the layers depend a lot on the product and its software architecture.
Sometimes a system will have one upper layer that is larger than the lower layers. You could say this inverts the testing pyramid, but I like to think of it more like an hourglass: it's still important to keep unit tests, even if there is a thorough suite of higher-level tests. A few examples of where I've seen larger upper test layers include file system protocol compatibility tests and database query tests.
Finally, I should mention that there are two types of tests that I've grown to love. The first is randomized whole-module or whole-subsystem tests. These are especially useful for multi-threaded or distributed systems. The idea is to apply a random stream of actions to a system and check that the final state is correct -- and possibly intermediate states. These can be a larger time investment, as you also need to write an oracle: a simple model of the system that can also accept the random stream of actions and always give the correct expected answer. I used to undervalue this type of test, but over the years I've seen it catch countless bugs, and I'm now a fan.
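To make this concrete, here is a minimal sketch in Python of a randomized test run against a toy key-value store, with a plain dict serving as the oracle. The `KeyValueStore` class and its methods are illustrative stand-ins, not code from Neal or Qumulo; in practice the system under test would be the real module (often exercised concurrently) and the oracle would stay deliberately simple.

```python
import random

class KeyValueStore:
    """Stand-in for the module under test; in a real test this would be the
    production code (e.g. a concurrent or distributed store)."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)
    def delete(self, key):
        self._data.pop(key, None)

def test_kv_store_randomized(seed=12345, num_actions=10_000):
    rng = random.Random(seed)   # fixed seed keeps failures reproducible
    store = KeyValueStore()     # system under test
    oracle = {}                 # oracle: a trivially correct model of the system

    for _ in range(num_actions):
        key = rng.choice("abcdefgh")                  # small key space forces reuse
        action = rng.choice(["put", "get", "delete"])
        if action == "put":
            value = rng.randint(0, 1_000_000)
            store.put(key, value)
            oracle[key] = value
        elif action == "get":
            # intermediate-state check: the store must agree with the model
            assert store.get(key) == oracle.get(key)
        else:
            store.delete(key)
            oracle.pop(key, None)

    # final-state check: the store must match the model for every key
    assert all(store.get(k) == oracle.get(k) for k in "abcdefgh")

if __name__ == "__main__":
    test_kv_store_randomized()
    print("randomized test passed")
```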
The second type is what I call state exploration tests. Here you exhaustively test all of the possible orderings of inputs to a system and ensure the system transitions to the correct state and emits the correct outputs.
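As a hedged illustration of what such a test might look like, the sketch below exhaustively drives every ordering of replica acknowledgements through a toy commit tracker and checks the state after each input. The `CommitTracker` class is hypothetical, invented for this example.

```python
from itertools import permutations

class CommitTracker:
    """Toy state machine: a write is 'committed' only once acks from all
    replicas have been observed, in whatever order they arrive."""
    def __init__(self, replicas):
        self.pending = set(replicas)
        self.state = "pending"
    def on_ack(self, replica):
        self.pending.discard(replica)
        if not self.pending:
            self.state = "committed"

def test_all_ack_orderings():
    replicas = ["a", "b", "c"]
    for ordering in permutations(replicas):   # exhaustively explore every input order
        tracker = CommitTracker(replicas)
        for i, replica in enumerate(ordering):
            tracker.on_ack(replica)
            # the tracker must not report committed until the final ack lands
            expected = "committed" if i == len(replicas) - 1 else "pending"
            assert tracker.state == expected, (ordering, i)

if __name__ == "__main__":
    test_all_ack_orderings()
    print("all orderings verified")
```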
Does your approach change if you are testing legacy code?
Legacy code is definitely a challenge. Writing unit tests usually requires that the code be properly decoupled. Most people won't write code like that unless they have a driving function, like unit tests. So, to add unit tests, you're probably going to have to refactor the code as you go. However, that could introduce bugs: the code isn't well tested! The first thing to do, before changing any code, is to write thorough integration tests, including randomized tests if applicable.
How do you know if a code-base is well tested?
There are a few ways.
The first is observational: if the code base doesn't generate very many bugs over the course of a number of releases, then it's probably well tested (unless it's not under active development).
The second is challenge-based: can the team responsible for the code base comfortably make changes to it on a weekly or biweekly basis? If not, the code base probably isn't well tested.
Can one test too much and, if so, what are the consequences?
Absolutely. It seems to be a common occurrence, honestly. The primary consequence of testing too much is that the code becomes hard to change. What should be an easy task becomes an arduous one. The secondary consequence is that people stop testing in general because they've been slowed down too many times in the past.
Should developers test their code or should that be done by QA engineers?
The developer should thoroughly test their code. QA engineers have a very important role to play, but it's not to catch crappy code that a developer throws over the wall. QA engineers should be experts in the product who do interesting end-to-end testing of the product. They should generally have resources available to them that developers don't (bigger system, longer testing cycles, etc.). Plus, they bring a completely different, black-box perspective to the code they're testing. Sometimes this leads them to use the system in ways the developer never imagined or intended.
I strongly believe that when QA engineers find bugs, software developers should seek to understand why their own tests didn't catch the bug, and update their testing methodology so that they catch similar bugs in the future.
When should manual and/or QA testing happen?
This totally depends on the company, its release cadence (continuous, weekly, monthly, yearly), and its product. Generally, I think QA engineers should test after a "release" is done, but for continuous delivery this means continuous testing. There are some types of software, UI comes to mind, where manual testing should be done before check-in.
Testing tools, test fixtures, test automation runners, and so on are for the most part idiosyncratic to companies/teams. They aren’t as standardized as other software engineering tools. Why do you think that is?
On the unit test side, people seem to have settled on one JUnit-inspired framework per language. But I do agree with you once you start looking at larger tests. We haven’t even standardized on what to call our higher level tests, let alone how to execute them. I think it’s just because these things take time.
We haven’t been doing large scale automated testing for very long as an industry.
Anywhere from 30-50% of development activity is allocated to testing. Do you think there’s an opportunity to build a product/tool that can optimize this spend?
I think there are a few opportunities. First, as an industry I think we can invest more in tests like the randomized and state exploration tests mentioned above that exercise code in various ways and then verify the results. Property based testing is another great example of this. These types of tests are amenable to software automation.
For example, one can use various test doubles and an automated driver to explore every possible ordering of a set of messages, and to then verify that the code handles those orderings correctly.
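To sketch what that might look like (with the names and scenario invented purely for illustration), the driver below enumerates every interleaving of messages from two clients that preserves each client's send order, delivers them to a toy account handler, and verifies that the final balance is the same regardless of ordering.

```python
def interleavings(a, b):
    """All merges of sequences a and b that preserve each sequence's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

class Account:
    """Toy system under test: applies deposit/withdraw messages."""
    def __init__(self):
        self.balance = 0
    def handle(self, message):
        kind, amount = message
        self.balance += amount if kind == "deposit" else -amount

def test_every_interleaving():
    client_a = [("deposit", 50), ("withdraw", 20)]
    client_b = [("deposit", 100)]
    expected = 130   # oracle: net of all deposits and withdrawals

    for order in interleavings(client_a, client_b):
        account = Account()
        for message in order:     # the driver stands in for the network/transport
            account.handle(message)
        assert account.balance == expected, order

if __name__ == "__main__":
    test_every_interleaving()
    print("every interleaving verified")
```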
Second, we can change our thinking around how compute-intensive our tests can be. The amount of compute power out there today is mind-boggling. We should look for ways to leverage this large amount of compute power to find bugs through testing, likely using techniques like those mentioned above. For example, having a single core continually running randomized tests in the code base will likely catch a lot of bugs before they go out to customers. The cost of that single core is trivial compared to the cost of shipping bugs to customers.
Hardware has formal verification tools and simulators, amongst others. Why do you think we don’t have anything similar in software? It seems the hardware folks “figured” this out and we are still alchemists.
First, there are tools like TLA+ that are moving us to a more formal model of verification. I think the big problem for us software people is that software is just so expressive. Analyzing even short snippets of code quickly becomes infeasible.
Second, I think we’re starting to see some small moves to black box testing tools that still take a “verification approach”. Maybe you could call these “informal verification”. Property based testing is a great example of this. If your property based tests pass, you aren’t 100% sure that there isn’t a bug lurking, but you’re asymptotically close. Similarly, the state exploration tests I mentioned above are a step in this direction.
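As a small, hedged example of that "informal verification" flavor, here is a property-based test written with Hypothesis; the encode/decode round-trip is just a stand-in for whatever invariant your own code should uphold.

```python
from hypothesis import given, strategies as st

def encode(data: bytes) -> str:
    return data.hex()

def decode(text: str) -> bytes:
    return bytes.fromhex(text)

# Property: decoding an encoded value returns the original, for any input
# Hypothesis generates. The framework actively searches for a counterexample
# and shrinks any failure it finds to a minimal reproducing case.
@given(st.binary())
def test_roundtrip(data):
    assert decode(encode(data)) == data
```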
Favorite testing resources and books
I’m a big fan of xUnit Test Patterns: Refactoring Test Code by Gerard Meszaros. This book establishes a taxonomy and common language for all things unit testing. This is really important for us as an industry to have. If you can’t effectively communicate about problems using the same language, how can you expect to solve those problems? However, this book isn’t limited to establishing a common language. It provides a lot of great advice on how to improve your tests.
I also draw inspiration from all of the great testing libraries and frameworks out there. Of course there are all of the xUnit frameworks. It’s cool to see how each library adapts similar concepts to different languages. I also think property based testing is neat, and Hypothesis is a great implementation that is very inspirational.
Neal’s advice is straightforward. Most will probably read it and nod along. Doing what he espouses is very hard. Perhaps the hardest part is in cultivating a culture and engineering ethos that applies these simple principles and best practices continuously.
Software, both development and testing, is a mix of science with a healthy dose of art and subjectivity. Therefore a culture of quality and excellence has to permeate the entire R&D organization. I’ve been fortunate enough to see and benefit from this culture. It’s hard to build and maintain, but it pays enormous dividends.
Ironically, as Neal and I were going back and forth writing this post, this tweet landed:
And my initial reaction to this was not that Twitter needs a complete rewrite per se, but that they lack sufficient testing to ensure changes to their code don’t result in “massive ramifications”.
Interesting articles I am reading
If there ever was an article that highlights why startups can out-execute large companies, this is it: an insider view of the inner workings (or not) of Google.
Ever wonder how ChatGPT works? Stephen Wolfram has you covered in this excellent article.
The data space is one I keenly follow. One particular area is the convergence of data warehouses (think Snowflake) with the lake-house approach espoused by the likes of Dremio, Databricks and Starburst.
And finally, if you or someone you know has been impacted by the recent tech layoffs, I encourage you to submit your resume here. I have partnered with Pallet and am curating job postings from their network and mine.