Written by Laurent Bossavit Thursday, 15 September 2011 00:00
Here is another great essay from the shortlist of entries to our essay competition. The winners will be announced tomorrow (16 September 2011). Here is Laurent Bossavit's essay, "The Aloofness of Reality".
Do you think of yourself as an engineer? If you do, what does that mean to you?
My area of expertise is software development. You may or may not think of yourself as a "software engineer" - even if you don't, as a reader of this essay, it's very likely that you work with people who do think of themselves as engineers, so that question may be worth pondering for a moment.
Many people have no trouble thinking of themselves as software professionals. In lieu of a definition we can remember the quip, "a professional programmer is an amateur who never quit", by analogy with writers. But there is a growing sense of unease, among at least a subset of these professionals, with the denomination "software engineer". Witness - whether you approve of them or not - the popularity of "software craftsmanship" movements. (Or more generally of the Agile movement - of which, fear not, I will not be speaking much in this article.)
Academic research and software development
Assume for the sake of argument that software *is* an engineering discipline. If you asked around for what this means to people, you would probably get an answer that included "based on science". Drilling down into that, we would encounter ideas such as "empirical evidence", "validated models", and "quantitative reasoning".
Would a self-described craftsman argue that we should ignore science, and evidence, and models and numbers? I expect not.
And so, perhaps rather than "engineering" vs "craft", the more interesting debate is around what role we expect research and experimentation to play for our profession.
Wherever I go, people tend to tell me that they feel that "academic" research into software engineering is largely irrelevant to their work. Greg Wilson, editor of the 2010 book *Making Software*, which rounded up some of the best empirical research in the field, later asked [1]:
When is the last time you read something in an ACM or IEEE journal that changed how you program or the tools you use? Ever?
"Never", answered Jorge Aranda, who added: "[for researchers], this should be something to be ashamed of." Wilson and Aranda went on to start a very interesting blog [2], "It will never work in theory", dedicated to bridging this gap between practice and theory. This is a wonderful effort, and I can't recommend it enough. My concern is that the only people likely to read (and more importantly, comment on) that blog are people already convinced.
We still need to gnaw this bone a little more to get at its marrow: why should we *care* at all about research, especially experimental results?
The perils of empirical research
My colleague "Uncle" Bob Martin, when he gives talks or lectures on topics such as "clean code" or "Test-Driven Development", makes much of the story of Semmelweis. In short, Ignaz Semmelweis was a physician who, in the 19th century when germ theory was still unknown, noticed the higher rates of women dying in childbirth in certain wards, and made the then-unintuitive connection with the fact that doctors in the same hospitals would often go straight from the dissecting room, where they had been dissecting corpses in the name of science, to "assisting" these women in delivery.
What's arresting and important about this story is that Semmelweis, despite having shown conclusive experimental evidence for the effectiveness of hand-washing in reducing mothers' deaths, faced extreme resistance to his ideas from the scientific establishment of the time, to the point where he fell into a depression from which he never recovered, dying in a mental institution before reaching his fifties. It took two more decades (and many avoidable deaths) before the practice of hand-washing before patient contact became mandatory for physicians.
This is a disturbing and fascinating story, and I understand why Bob likes to tell it. Bob argues that software professionals today who fail to at least *learn* and *try* test-driven development are showing the same kind of disregard for the very real and very painful problems of software development (low quality, buggy software) that Semmelweis's contemporaries showed to the suffering and deaths of mothers - only because there wasn't yet a fully worked out theory of infectious disease via germs.
There are, however, several problems with this parallel, not the least of which is that the data on test-driven development doesn't quite measure up to that on hand-washing: "a survey of all of the studies that have been done on TDD have shown that the better the study done, the weaker the signal as to its benefit" (Greg Wilson, commenting on the TDD chapter in "Making Software".) And quite clearly we know a lot more today about scientific research, and in particular about experimental design and statistical validation, than we did in the 19th century.
Can so many researchers be wrong? Is TDD, then, only a mirage? Well... It's not quite that simple.
Discipline envy
A lot of research in software engineering strikes me as hopelessly naive in one of two ways. Most of it fails entirely to account for the social and belief aspects. It looks at its object of inquiry as if it were entirely material and inert; as if "software" were some kind of naturally occurring substance, the properties of which can be revealed in the equivalent of a test tube.
The more interesting but in some ways more distressing part of software engineering research borrows its experimental design approach from medicine and calls itself "evidence-based". Too often though this seems to be a matter of "discipline envy" and scientific status games, of who can use the most impressive-sounding statistical method to analyze data that turns out - on closer inspection - to be useless as empirical evidence, because of some gross conceptual or methodological flaw.
Take a recent study [4] "Comparing the Defect Reduction Benefits of Code Inspection and Test-Driven Development".
There is one thing that needs to be said first, before we get on to the science, and even if this seems like a relatively trivial complaint: it's really a shame that most readers will have to pay $19 to get the PDF with the full text of the study, because the abstract *really* isn't the whole story. (Everyone should read George Monbiot's recent piece [5] on the economics of academic publishing.)
The paper "is a quasi-experiment comparing the software defect rates and implementation costs of two methods of software defect reduction: code inspection and test-driven development". The main claim is that the Inspection group turned in code that had fewer defects than the TDD group, and the authors claim, at the p=0.05 level of statistical significance which is the accepted norm (in medical research, for instance), that this is a reliable result.
But wait! That's only the abstract. If you go to the trouble of reading the whole paper, you learn that this is only true, statistically speaking, for "adjusted" defect counts. When the authors look at "unadjusted" defect counts there is no statistically significant difference: "based on the unadjusted defect counts, we would reject hypotheses H1 and H2". (This is the usual problem in such studies, one that Bill Curtis pointed out [7] decades ago: noise swamps out signal.)
What does "adjusted" mean? It means basically that the Inspection students get credited not only for the bugs they fix during the one-week period after developing their code; they also get credited for all the bugs that were found during inspection but that *they didn't fix*. This effectively stacks the deck in favor of Inspection over TDD, and it's easy to suppose that this entirely accounts for the supposedly statistically significant difference between the two groups.
(The authors justify this procedure on the grounds that a well-run Inspection process would keep inspecting and fixing until all bugs found in the first Inspection round were in fact fixed. But that doesn't change the fact that the Inspection group effectively gets to do a lot more testing than the TDD group; it's not so surprising that this results in fewer defects.)
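To make the effect of that adjustment concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the defect counts are invented (they are not the study's data), the helper names `adjust` and `permutation_p_value` are hypothetical, and the exact permutation test stands in for whatever statistical procedure the authors actually used. It only shows how crediting one group for found-but-unfixed bugs can, by itself, move a comparison across the p=0.05 line.

```python
from itertools import combinations
from statistics import mean


def adjust(remaining, found_unfixed):
    """Credit each participant for defects that inspection found but nobody fixed."""
    return [r - f for r, f in zip(remaining, found_unfixed)]


def permutation_p_value(group_a, group_b):
    """Exact one-sided permutation test.

    Returns the fraction of relabellings of the pooled data whose mean
    difference (b - a) is at least as large as the observed one.
    """
    pooled = group_a + group_b
    observed = mean(group_b) - mean(group_a)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in indices if i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance so floating-point noise does not drop exact ties.
        if mean(b) - mean(a) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Hypothetical defect counts per participant, invented for this sketch only.
inspection_remaining = [6, 7, 5, 8, 6]      # defects left in the Inspection group's code
inspection_found_unfixed = [3, 4, 2, 4, 3]  # found during inspection, never fixed
tdd_remaining = [6, 8, 5, 7, 7]             # defects left in the TDD group's code

unadjusted_p = permutation_p_value(inspection_remaining, tdd_remaining)
adjusted_p = permutation_p_value(
    adjust(inspection_remaining, inspection_found_unfixed), tdd_remaining
)

print(f"unadjusted p-value: {unadjusted_p:.3f}")
print(f"adjusted p-value:   {adjusted_p:.3f}")
```

With these made-up numbers the two groups ship almost the same number of defects, and the unadjusted comparison comes nowhere near significance; once the Inspection group is credited for bugs it never fixed, the same comparison drops below 0.05. Nothing about the code the participants delivered has changed, only the bookkeeping.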
So what's going on here, really?
Science and reality
The Inspection vs TDD study is burdened with further flaws: the participants in the TDD group were given barely more than one hour of TDD training, for instance, on top of the usual problems with studies which look at "convenience samples" - that is, graduate students enrolled in the researchers' course. But really, the problem here is that the whole study is just too vulnerable to all kinds of biases, both the researchers' and the participants'.
Consider Philip K. Dick's idea, "Reality is that which, when you stop believing in it, doesn't go away." How does that strike you? Obviously true? Uncontroversial? To me this captures one essential aspect of experimental science - reality responds in certain ways when you prod it, whether you believe in what you're seeing or not. "Fact" is the name we give to this stubbornness of reality, its refusal to be persuaded by what we prefer to believe. So it goes for matter and gravity, right up to the strange and counterintuitive properties of light and tiny particles.
But then comes the niggling thought that some aspects of reality "go away when you stop believing in them", or more broadly are significantly affected by how much and in what ways we believe in them, countering naive reductionism. Some of the examples that come to mind are romantic love, social status, money, fashion or art. You can still get at facts about these things, but by a much more tortuous route than you get facts about the laws of physics.
Remember the TDD/Semmelweis connection? The bigger issue there is that while germs are very much in the "don't go away if you stop believing in them" category, that's much less true of these things we call "bugs". (See this old article of mine [3] for instance.) The "better" studies of TDD mentioned above show some of the naivete I believe is one of software engineering's deeper problems; the "anecdotal" reports on which TDD enthusiasts base their recommendations may not pass muster as "proper" research but may well get at more useful insights than the academic research.
The thing is, more aspects of reality than we'd like to believe belong in the latter category. The so-called "placebo effect" is a well-known illustration, and there are many subtleties to designing experiments that take these effects into account. For instance, in medical research on psychoactive drugs [6]: "many antidepressant trials have serious methodological weaknesses, including the unblinding of raters due to the common side effects of these drugs compared with the inert sugar pill".
Where to go from here
In fact, researcher John Ioannidis garnered much attention a few years back with an admittedly provocative headline [8]: "Why Most Published Research Findings Are False". Far from being a shining example to look up to, medical research turns out to have its own deep-seated problems!
I am not making this point to dissuade anyone from taking an interest in software engineering research - far from it. Whether you think of yourself as a software engineer, a craftsman, or a "code monkey", I think you are making a twofold mistake if you dismiss the work of academic researchers as irrelevant.
First, you are failing to develop yourself as a professional; you are missing out on some insights that would be useful to you, but more importantly you are failing to engage with some important issues; you are delaying progress not just for yourself but for your professional community as a whole.
Second, you are taking the risk of being blindsided. Right or wrong, changing scientific consensus will eventually have *some* impact, small or large, on the way you work. To turn a blind eye to where this consensus is going is to forfeit some of your right to take part in the conversation about how you work.
As practitioners, it is both in our interest and within our responsibility to pay attention to research. This includes not just the findings of such research, but also its processes and its institutions. Read research papers; find out what's happening in that world and why it's not more relevant to your work; weigh in; make your voice heard.
It is becoming increasingly clear that we must improve the quality of the conversation between researchers and practitioners. To make research more relevant, we must find new models and new methods, more appropriate to locating and then confirming hypotheses about software development, and we must make this a joint effort, with both practitioners and academics involved. And to do all that, we may need to abandon outdated paradigms - perhaps even move on from the "software engineering" label.
That is the conversation I'm looking forward to.
If that's of interest to you as well: let's talk!
References
[1] http://catenary.wordpress.com/page/2/
[2] http://www.neverworkintheory.org/?p=6
[3] http://www.ayeconference.com/entomology/
[4] http://www.computer.org/portal/web/csdl/doi/10.1109/TSE.2011.46
[5] http://www.monbiot.com/2011/08/29/the-lairds-of-learning/
[6] http://www.srmhp.org/0201/media-watch.html
[7] "Substantiating Programmer Variability," Proceedings of the IEEE, vol. 69, no. 7, 1981 - finding the *full* text of the article, rather than just the abstract, is left as an exercise for the reader; the important point made by Curtis is made in the main text, while the abstract has been (in my opinion) horribly misinterpreted.
[8] http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124
Bio
After a first career as a software developer (20 years of coding experience) and a few years as an independent consultant, Laurent Bossavit now heads Institut Agile, a France-based independent organization funded by a consortium of companies interested in Agile approaches for project management and software development. The Institute promotes research and formalization of knowledge on Agile and the development of better client-supplier relationships in this area. Passionate about helping people in various Agile communities network and support each other, Laurent is also a member of the board of the Agile Alliance and of the French Agile association. He was a recipient of the 2006 Gordon Pask award for contributions to Agile practice and a co-founder of the Coding Dojos.