On what the industry’s favourite word conceals, and what it costs
Consider the word itself. Bug. What it sounds like. What it implies. Something that arrived uninvited — that crawled in from the outside while no one was watching, while the code sat there doing nothing wrong, before the infestation. The framing is built into the syllable, and the syllable has been doing quiet work for seventy years. Not neutral work. Exculpatory work.
No one who uses the word means harm by it. That is precisely the problem. The harm is in the usage, accumulated across ten thousand postmortems and a hundred million tickets, each one quietly rehearsing the same unexamined premise: that failure in software is something that happens to software, rather than something that is made.
In September 1947, a moth flew into Relay 70 of the Harvard Mark II computer and caused it to malfunction. Grace Hopper’s team removed the moth, taped it into the logbook, and wrote next to it: First actual case of bug being found. The entry is charming, self-aware, and clearly a joke — the team knew they were playing with a term that already existed in engineering slang. What no one anticipated was what would happen next: that this single, literal, biological event would become the founding metaphor for every subsequent failure in the history of software, regardless of cause, regardless of context, regardless of whether any actual insect was involved.
The origin story is not merely inaccurate as a description of what software errors are. It is inaccurate in a direction. It points away from causality. The moth was external. The moth was unpredictable. The moth was no one’s fault. Every time the word is used since, it carries a residue of that exculpation — the implication that the failure, like the moth, arrived from somewhere outside the system, outside the team, outside the chain of decisions that produced it.
That chain almost always exists. This is what the word covers.1
The literature of software engineering has known this for decades, even when it did not say so plainly.
Fred Brooks, in his landmark 1986 paper “No Silver Bullet,” drew a distinction that has since become foundational: between the essential difficulties of software — the inherent complexity of the problem being solved — and its accidental ones — the difficulties that arise from the way we build software, the tools we use, the processes we follow. Brooks was primarily arguing that no single technical breakthrough would eliminate the essential difficulties. But his framework implies something the industry has been slow to absorb: if the essential difficulty of software is real and irreducible, then everything else — including the defects we introduce — belongs to the accidental category. It is produced by our practices. It is, in principle, addressable.2
Four years later, Boris Beizer made the implications explicit in Software Testing Techniques (1990), a work that remains among the most rigorous treatments of software failure ever written. Beizer’s book opens with what he called a “bug taxonomy” — a four-level classification of software defects by type, origin, and mechanism. The taxonomy was important not merely as a reference tool, but as a philosophical stance. To classify defects is to insist that they are classifiable: that failures have kinds, that kinds have causes, that causes can be identified and prevented. The taxonomy does not merely organize bugs. It refuses to treat them as weather.3
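Beizer's insistence that defects are classifiable can be made concrete. The sketch below is an illustrative record type, not Beizer's actual taxonomy codes: the point is that the structure itself forces every failure to name an origin, which the word bug never does.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative categories, loosely in the spirit of classifying defects
# by the phase that introduced them. These are not Beizer's level codes.
class Origin(Enum):
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    CODE = "code"
    DOCUMENTATION = "documentation"

@dataclass
class DefectRecord:
    summary: str
    origin: Origin      # the record cannot be created without naming a cause
    detected_in: str    # where the failure was actually observed

    def travelled(self) -> bool:
        # A defect detected outside the phase that introduced it has
        # "travelled" through the lifecycle: the costly case.
        return self.detected_in != self.origin.value

d = DefectRecord("missing timeout on retry", Origin.REQUIREMENTS, "production")
print(d.travelled())  # a requirements defect observed in production
```

A team filing this record is forced to answer the question the word bug lets it skip: which phase produced the failure, not merely where it surfaced.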
Around the same time, Watts Humphrey at the Software Engineering Institute was developing what would become the Capability Maturity Model — a framework for assessing and improving the software development processes of organizations. The CMM’s five-level progression is telling precisely in how it describes its highest tier. Level 5, “Optimizing,” is defined not by the speed or sophistication of a team’s output, but by a specific posture toward failure: the organization proactively identifies the causes of defects and acts to prevent their recurrence. Defect prevention — not defect management, not defect tolerance, not the normalization of a defect rate — is the mark of the most mature engineering organization the CMM can describe. The model was commissioned by the United States Department of Defense. It was calibrated against the most demanding software environments the government could identify. Its apex is the organization that has stopped treating its failures as visitation.4
The grammatical difference between a bug and a defect is not decoration. It is the whole argument.
Bugs appear. They are found, caught, discovered — verbs that position the software team as investigators arriving after the fact, detectives at a scene they did not create. Defects are introduced. They are produced, written, shipped — verbs that position the team as agents in a causal chain that begins with a decision and ends with a failure. One grammar implies visitation. The other implies authorship.
This is not pedantry. Language at the level of daily habitual use does not stay decorative — it shapes the categories through which problems are seen, the questions that are asked and not asked, the distribution of attention in a postmortem. A team that has spent years speaking the grammar of visitation will not naturally ask, when something breaks, what decision made this possible and at what point was it knowable. It will ask, with the best of intentions, where did this come from — and the question itself smuggles in the assumption that it came from somewhere other than the work.
Defects have owners. Not in the sense of blame — the instinct to personalise failure is its own distortion, and not the one being argued for here. But in the sense of causality. A defect implies, structurally, that a standard existed, that the work deviated from it, and that the deviation has a cause which is, in principle, findable. It implies a chain. The chain goes somewhere. The somewhere is not the insect kingdom.
In 2002, the National Institute of Standards and Technology commissioned a study from the Research Triangle Institute on the economic cost of software failure in the United States. The figure it produced was startling: defects in software were costing the American economy an estimated $59.5 billion annually — roughly 0.6 percent of gross domestic product at the time. More than half of those costs were borne not by the developers who introduced the defects, but by the users who encountered them: in workarounds, in lost productivity, in systems that failed at the moments they were needed most.5
The NIST report used the word bugs throughout, as did the popular press coverage it received. But its analytical framework told a different story. The study was premised on the observation that defects have a point of introduction and a point of discovery, and that the further these two points are separated in time — the further a defect travels through the development cycle before it is found — the more expensive it becomes to address. The economic argument against bugs was, underneath its language, an economic argument for treating failures as having origins that are traceable, preventable, and costly in proportion to how late they are caught. This is not the grammar of weather. This is the grammar of manufacturing.6
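That manufacturing grammar can be rendered as a toy calculation. The multipliers below are illustrative placeholders in the spirit of the widely cited escalation pattern discussed in footnote 6; they are not figures from the NIST study, and the real ratios vary by defect type and domain.

```python
# Illustrative relative-cost multipliers for fixing a defect, keyed by the
# phase in which it is *discovered*. The 1x-to-100x shape echoes the commonly
# cited escalation pattern; the exact ratios are placeholders, not data.
COST_MULTIPLIER = {
    "requirements": 1,
    "design": 3,
    "coding": 10,
    "testing": 30,
    "production": 100,
}

def relative_cost(introduced: str, discovered: str) -> int:
    """The cost of a fix is driven by where the defect is caught,
    not where it was introduced: lifecycle distance is what compounds."""
    phases = list(COST_MULTIPLIER)
    # A defect cannot be discovered before it is introduced.
    assert phases.index(discovered) >= phases.index(introduced)
    return COST_MULTIPLIER[discovered]

# The same requirements defect, caught one phase later vs. after release:
print(relative_cost("requirements", "design"))
print(relative_cost("requirements", "production"))
```

The asymmetry is the whole economic argument: the point of introduction is fixed, so the only lever is shrinking the gap to the point of discovery.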
Capers Jones spent decades building the empirical case for exactly this framing. In his 2012 survey of the state of software quality, Jones documented what he called “defect potentials” — the aggregate number of defects likely to be found across all phases of a software project, including requirements, architecture, design, code, documentation, and the secondary defects introduced during defect repair itself. What his data showed, across hundreds of organizations, was that defects introduced in requirements and design — the earliest and most tractable phases — were the most expensive to fix and the least likely to be measured at all. Most teams did not begin to measure defects until testing, making all upstream failures invisible to their own quality accounting. The vocabulary of bugs, which implies that defects emerge during or after coding, actively enables this invisibility.7
The disciplines that cannot afford comfortable language figured this out long ago.
Aviation safety does not speak of bugs. Neither does the pharmaceutical industry, nor nuclear engineering, nor the developers of software for surgical systems or flight control or railway switching. These are domains where the consequences of euphemism are too visible and too immediate to permit it. They adopted the language of defects, of nonconformances, of root-cause analysis — not because their practitioners are more rigorous by temperament, but because their professional cultures evolved under conditions that punished evasion. When a bridge falls, the word bug does not survive contact with the inquiry. The inquiry demands a cause. The cause demands a name.
Six Sigma and ISO quality frameworks, developed in manufacturing contexts where failure is measurable and consequential, built their entire architecture around the concept of the defect precisely because the concept forces the right questions. What is the defined standard? What is the measured deviation? What is the root cause of the deviation? What systemic condition permitted the root cause to operate? These are not exotic questions. They are the questions that every software postmortem should be asking and that the vocabulary of bugs quietly discourages.
The software industry has for decades told itself that it is different — that the complexity and pace of software development make manufacturing analogies inapplicable, that the rate of change is too high for formal quality frameworks, that some level of bugs is simply the natural condition of shipping software in a competitive environment. There is a kernel of truth in this. There is a much larger quantity of mythology. And the mythology is, again, directional: it flows toward exculpation, toward the normalisation of failure, toward the treatment of defects as weather rather than as output.8
Language at scale does not stay in its lane. This is the cultural consequence the industry has not fully registered.
“It’s just a bug” moves from the ticket to the standup to the postmortem to the product roadmap. Each iteration is a small permission: to be surprised by failure, to treat the current defect rate as a baseline rather than an indictment, to spend the postmortem on remediation rather than on the structural conditions that made the defect possible. The minimisation is rarely cynical. That is what makes it structural rather than individual: the word has so thoroughly colonised the thinking that the thinking no longer notices the word. The assumption is inherited wholesale, unnamed, and therefore unexaminable.
The downstream effects compound. A team that normalises bugs normalises the conditions that produce them. It builds a culture in which a certain density of failure is simply the texture of software — unfortunate, expected, and essentially natural. Regressions are weather. Outages are weather. The cumulative debt of shipped defects is weather. And weather, by definition, is not something anyone made.
This maps directly onto outcomes. Jones and Bonsignour’s comprehensive empirical study found that effective defect removal requires a layered approach: inspections, static analysis, and testing combined. Teams that relied on testing alone achieved defect removal rates of around 85 percent, meaning roughly one in seven defects reached production. Teams that added formal inspections and static analysis upstream pushed removal rates above 95 percent. The difference is not technical sophistication. It is the decision to treat failure as something produced — and therefore preventable upstream — rather than as something discovered. The vocabulary of bugs makes the upstream investment culturally harder to justify.9
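The arithmetic behind layering is simple compounding: the defects that reach production are those that escape every stage. The per-stage efficiencies below are illustrative, chosen only to show how upstream layers move the combined figure past the testing-alone baseline; they are not Jones and Bonsignour's actual per-stage data.

```python
def combined_removal(*efficiencies: float) -> float:
    """Combined defect-removal efficiency of sequential stages.
    The escape rate is the product of per-stage escape rates;
    combined removal is one minus that product."""
    escape = 1.0
    for e in efficiencies:
        escape *= (1.0 - e)
    return 1.0 - escape

# Testing alone, at a typical efficiency: about one defect in seven escapes.
print(round(combined_removal(0.85), 3))

# Inspections and static analysis layered upstream of the same testing
# (illustrative per-stage efficiencies): combined removal passes 95 percent.
print(round(combined_removal(0.50, 0.40, 0.85), 3))
```

Note that the upstream stages need not be individually strong; even modest inspection and static-analysis efficiencies compound with testing, which is why the layered approach dominates in Jones and Bonsignour's data.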
The most significant recent development is that the argument has stopped being made only by academics and quality practitioners. It is now being made by national security agencies, in the language of sovereign risk.
In November 2022, the National Security Agency published a Cybersecurity Information Sheet on software memory safety. Its central finding was precise: memory safety vulnerabilities — defects introduced by the use of programming languages that give developers direct control over memory management — account for a large proportion of all exploitable software vulnerabilities. The NSA did not describe these as bugs that appear. It described them as a class of defect that is introduced by design choices made at the language level, and that can be systematically prevented by different design choices. The recommendation was structural: shift to memory-safe programming languages. Not patch harder. Not test more. Change the conditions that make the defect class possible.10
The following year, CISA — joined by the NSA, the FBI, and the national cybersecurity agencies of Australia, Canada, the United Kingdom, and New Zealand — published “The Case for Memory Safe Roadmaps,” extending the argument to all critical software. In February 2024, the White House Office of the National Cyber Director went further still, releasing a technical report titled “Back to the Building Blocks: A Path Toward Secure and Measurable Software.” The report argued explicitly that memory safety vulnerabilities are not inevitable features of complex systems. They are predictable consequences of building software on unsafe foundations — and the report traces the most damaging cyberattacks of the past three and a half decades, from the Morris worm of 1988 to the Heartbleed vulnerability of 2014, to a single, correctable class of decision.11
The significance of this shift is hard to overstate. The framing of defects as a national security problem is not merely alarming — it is clarifying. It makes visible something the vocabulary of bugs had obscured: that the choices made in software development are not private technical decisions whose consequences stay inside the code. They propagate outward. They become infrastructure. And when infrastructure fails in predictable, preventable ways, the word bug is not merely inadequate. It is a kind of false testimony.
There is an objection that deserves a direct answer, because it will be made. It runs as follows: the word bug is so thoroughly embedded in the culture, so universally understood, so obviously harmless in daily use, that the case against it is a kind of professional pedantry — the sort of argument that wins on paper and changes nothing in practice. The word means what everyone knows it means. The concept of accountability is available regardless of which word is used. Why does the label matter?
The objection underestimates what labels do. The question is not whether sophisticated practitioners can hold the concept of causality in mind while using a word that implies its absence. Some can, clearly. The question is what the word does at scale, in culture, over time, and especially at the margins — in the postmortem that ends thirty minutes early, in the ticket that gets closed as “cannot reproduce,” in the estimate that builds in a bug buffer rather than asking what processes would make the buffer unnecessary. At those margins, which are where culture actually lives, the word matters. The word is the thought. The thought is the practice.
The history of the word’s alternatives supports this. Beizer’s defect taxonomy was not merely an academic exercise — it was an attempt to make failure legible by giving it structure. The CMM’s vocabulary of defect prevention was not window dressing — it was a deliberate signal that the most mature organizations do not merely manage failure but treat it as a process to be improved away. The NSA’s shift to the language of “classes of vulnerabilities” and “preventable defects” is not rhetorical — it is a choice to make visible the causal chain that the word bug severs. In each case, the people who changed the language did so because they understood that language is where assumptions become invisible, and that invisible assumptions are the ones that survive every explicit intervention unchanged.12
To adopt the word defect is to take a position. Not a comfortable one. The discomfort is the point.
It is to say: this failure was produced. It has a cause. The cause operated inside a system of decisions and conditions that we are responsible for maintaining. The failure was, at some point upstream, preventable — not necessarily by the person who introduced it, but by the system that permitted it to be introduced and reach production undetected. The appropriate response is not remediation alone, but inquiry into the conditions that made the defect possible, followed by changes to those conditions. This is harder than closing the ticket. It is harder than the postmortem that identifies what happened without asking why the system allowed it. It is harder than the retrospective that agrees on action items and then watches them quietly expire.
A version of this argument goes too far, and it is worth naming in order to set it aside. The claim is not that every defect is the result of carelessness, or that individuals who introduce defects are culpable in any simple personal sense, or that the appropriate response to a production failure is to identify who wrote the offending line and proceed accordingly. That version of the argument is counterproductive: it drives defects underground, discourages the honest reporting of near-misses, and produces the kind of blame culture that makes systemic improvement impossible. Defect language is not blame language. It is causality language. Blame asks who is responsible in the sense of who should feel bad. Causality asks what conditions produced this outcome, and what would have to change for a different outcome to be possible. The second question is the useful one.
Words are not neutral. They carry assumptions about causality, about agency, about what is normal and what is not. Bug gave the software industry a seventy-year permission structure — to be surprised by failure, to treat it as visitation rather than production, to reach for the word that arrives without a fingerprint on it. The permission was never formally granted. It was built into the language and inherited by everyone who learned to write code in a culture that already used it.
The literature had the alternative ready by 1990. The engineering frameworks had operationalised it by 1991. The economic data had quantified what the alternative would save by 2002. The national security apparatus has made the case in terms of sovereign risk since 2022. At each stage, the industry acknowledged the argument and returned to its word. There is no mystery in this: changing the word means accepting what the word was hiding. It means giving up the moth.
Defect removes the permission. It asks — and this is precisely why the instinct will be to resist it — who introduced this, when, under what conditions, and what would have had to be different for it not to have happened. It asks for a cause. It assumes a chain. It refuses, grammatically, the comfort of the moth.
That refusal is not the end of the work. It is where the work begins.
1The logbook entry is real and is held by the Smithsonian National Museum of American History. The moth is still taped to the page. What the historical record does not support is the common claim that Hopper coined the term — engineering slang for hardware faults predates 1947 by decades, traceable at least to Thomas Edison’s usage in the 1870s. Hopper’s team was making a joke that depended on an existing word. The joke was good. The term’s subsequent migration from hardware to software failure, where no physical insect is ever involved, is the conceptual error this essay is examining.
2Brooks, F.P. (1986/1987). “No Silver Bullet: Essence and Accidents of Software Engineering.” Computer, Vol. 20, No. 4, pp. 10–19. First presented at the IFIP Tenth World Computing Conference in 1986 and published in expanded form in Computer in April 1987. The distinction between essential and accidental difficulty is Aristotelian in origin; Brooks applies it to argue that most productivity gains to date have addressed accidental complexity, while essential complexity — inherent in the problem itself — cannot be engineered away. The implication for defect language is direct: if accidental difficulties are, by definition, those we impose on ourselves, then defects introduced by our own practices belong squarely in that category. They are not essential features of software. They are consequences of the way we build it. Brooks’s paper is often read as grounds for pessimism (no silver bullet); it is more usefully read as grounds for precision about which category of problem we are actually addressing.
3Beizer, B. (1990). Software Testing Techniques, 2nd ed. Van Nostrand Reinhold. Beizer’s taxonomy defines defects across four levels of granularity, distinguishing by type (requirements, design, code, documentation), by origin, and by mechanism. The philosophical move is important: the taxonomy does not merely classify defects after the fact. It insists that defects are classifiable — that failures belong to recognizable types with identifiable causes. A team that can say “this is a requirements defect introduced during specification” is in an entirely different epistemic position than one that says “a bug appeared in production.” The former has a chain. The latter has a weather report. Beizer’s taxonomy was one of the first systematic attempts in the software literature to make that chain visible and navigable.
4Paulk, M.C., Curtis, B., Chrissis, M.B., and Weber, C.V. (1993). Capability Maturity Model for Software, Version 1.1. CMU/SEI-93-TR-24. Software Engineering Institute, Carnegie Mellon University. The CMM was initiated in 1986 at the request of the U.S. Air Force, which needed an objective method for evaluating the software development capability of defense contractors. Watts Humphrey, who had spent 27 years at IBM developing process maturity concepts, joined the SEI that year and shaped the model’s foundational architecture. The five-level progression — Initial, Repeatable, Defined, Managed, Optimizing — is calibrated to the relationship between process discipline and failure rate. A Level 1 organization is characterized by ad hoc processes; success depends on individual heroics. A Level 5 organization has internalized the feedback loop between defect data and process change. The use of “Defect Prevention” as the explicit label for a Level 5 key process area — rather than any output or performance metric — makes clear that the model’s architects understood defect language as more than a terminological preference. It is a description of organizational maturity.
5Research Triangle Institute (2002). The Economic Impacts of Inadequate Infrastructure for Software Testing. NIST Planning Report 02-3. National Institute of Standards and Technology, May 2002. The $59.5 billion estimate was generated from surveys across automotive and aerospace manufacturing and financial services sectors, and is best understood as a floor rather than a ceiling: it addressed only the direct economic cost of testing failures in those sectors, not the broader costs of software nonperformance across the entire economy. The study’s framing is also instructive: it addresses the cost of “inadequate testing infrastructure” rather than the cost of defects per se. Defects introduced in requirements and design phases — which no amount of downstream testing can address — fall outside the study’s scope. The real economic cost of software defects is substantially higher than the figure the NIST report produced.
6The NIST report’s finding that defects cost more the later they are discovered — a finding consistent with decades of prior industry research — has been widely summarized as “it costs 100 times as much to fix a defect after release as during early development.” Jones (2012) notes that while this specific ratio is often cited, the underlying pattern is robust across different measurement methodologies and organizational contexts. The ratio varies by defect type, by domain, and by the maturity of the organization’s detection processes; but the directional claim — that late detection is dramatically more expensive than early prevention — is one of the most consistent findings in the empirical software engineering literature. Its implication for the vocabulary of bugs is direct: if defects have a point of introduction that precedes their point of discovery, then treating defects as things that “appear” rather than things that are “introduced” is not merely a linguistic preference. It is a choice to ignore the most actionable part of the failure’s history.
7Jones, C. (2012). “Software Quality in 2012: A Survey of the State of the Art.” Capers Jones & Associates LLC. See also: Jones, C. and Bonsignour, O. (2011). The Economics of Software Quality. Addison-Wesley; and Jones, C. (2013). “Function Points as a Universal Software Metric.” ACM SIGSOFT Software Engineering Notes, 38(4), 1–27. Jones’s concept of “defect potential” — the total number of defects expected across all artifact types, not just source code — is an important corrective to the dominant practice of measuring defects only in code and only during testing. His data consistently show that requirements defects outnumber code defects in large systems, and that they are systematically invisible to organizations whose measurement practices begin at the coding phase. The word “bug” reinforces this invisibility: its implicit association with code rather than with requirements, design, or documentation actively discourages the upstream measurement that would make the full defect picture visible.
8The argument that software is categorically exempt from manufacturing quality frameworks because of its unique complexity has a long history and is not entirely without merit: the malleability of software, the speed of iteration, and the difficulty of formal specification all introduce genuine complications that manufacturing analogies do not fully capture. What is not justified is the inference that these complications make defect prevention impossible or not worth attempting. The most rigorous counterexamples come from safety-critical software — avionics, medical devices, nuclear plant control — where defect densities an order of magnitude below commercial norms are routinely achieved through formal specification, rigorous review, and a cultural posture that treats every defect as a process failure. The interesting question is not whether this is possible; the evidence is clear that it is. The interesting question is why the rest of the industry has so consistently chosen otherwise — and whether the answer is partly found in the word it uses to describe what it is choosing to tolerate.
9Jones and Bonsignour (2011), op. cit. The finding — that combined inspection, static analysis, and testing approaches achieve defect removal rates above 95 percent compared to roughly 85 percent for testing alone — is drawn from empirical data across hundreds of organizations. The gap sounds modest in percentage terms; it is not modest in consequence. At the scale of a large software system with thousands of defects in its full potential, the difference between 85 and 95 percent removal efficiency is the difference between dozens and hundreds of defects reaching production. The additional insight — that pre-test defect removal also shortens test cycles and reduces testing costs, making the upstream investment economically rational as well as ethically preferable — is precisely what the vocabulary of bugs makes invisible, by directing attention entirely to what is found during testing.
10National Security Agency (2022, updated April 2023). “Software Memory Safety.” Cybersecurity Information Sheet, U/OO/219936-22. The sheet notes that memory safety vulnerabilities — buffer overflows, use-after-free errors, and related defect classes — arise not from individual programmer carelessness but from the structural properties of programming languages that give developers direct access to memory management without enforcing safety checks. The NSA’s recommendation to shift to memory-safe languages is an argument that the defect class is systemic, not individual: it is produced by a category of design decision, and it can be eliminated by a different category of design decision. This is as clear a statement as exists in the public record that entire categories of software defects are manufactured, not found.
11CISA, NSA, FBI, ASD’s ACSC, CCCS, NCSC-UK, NCSC-NZ, and CERT-NZ (2023). The Case for Memory Safe Roadmaps. Joint Cybersecurity Information Sheet, December 2023. White House Office of the National Cyber Director (2024). Back to the Building Blocks: A Path Toward Secure and Measurable Software. February 26, 2024. The ONCD report’s historical framing — connecting memory safety vulnerabilities to specific attacks from 1988 through 2023 — makes the temporal argument that this is not a new problem but a persistent, traceable class of defect whose consequences have been visible for thirty-five years. The report notes explicitly that the goal is to “eliminate entire classes of bugs” by addressing the conditions that produce them — a formulation that, even in its choice to retain the word bug, describes the activity of defect prevention rather than the activity of defect discovery.
12The pattern of vocabulary change enabling cultural change in engineering disciplines is worth attending to directly. In aviation, the shift from “pilot error” to “human factors” in the 1970s and 1980s was not merely semantic: it enabled a systemic account of failure that produced cockpit redesign, checklist protocols, crew resource management training, and a sustained reduction in accidents. The word “error” implied individual failure and pointed toward individual remediation. “Human factors” implied systemic conditions that produced predictable failure modes and pointed toward systemic change. In medicine, the shift from “complications” to “preventable adverse events” — accelerated by the Institute of Medicine’s 1999 report To Err Is Human — similarly unlocked a systemic account that individual-failure language had suppressed. In both cases, the vocabulary change preceded and enabled the organizational change. The lesson is not that words are sufficient. It is that words are necessary — and that the wrong word can make the right question unaskable.
Beizer, B. (1990). Software Testing Techniques, 2nd ed. Van Nostrand Reinhold.
Brooks, F.P. (1987). “No Silver Bullet: Essence and Accidents of Software Engineering.” Computer, 20(4), 10–19.
CISA, NSA, FBI et al. (2023). The Case for Memory Safe Roadmaps. Joint Cybersecurity Information Sheet.
Institute of Medicine (1999). To Err Is Human: Building a Safer Health System. National Academies Press.
Jones, C. (2012). “Software Quality in 2012: A Survey of the State of the Art.” Capers Jones & Associates LLC.
Jones, C. (2013). “Function Points as a Universal Software Metric.” ACM SIGSOFT Software Engineering Notes, 38(4), 1–27.
Jones, C. and Bonsignour, O. (2011). The Economics of Software Quality. Addison-Wesley.
National Security Agency (2022, rev. 2023). Software Memory Safety. Cybersecurity Information Sheet, U/OO/219936-22.
Office of the National Cyber Director (2024). Back to the Building Blocks: A Path Toward Secure and Measurable Software. The White House, February 26, 2024.
Paulk, M.C., Curtis, B., Chrissis, M.B., and Weber, C.V. (1993). Capability Maturity Model for Software, Version 1.1. CMU/SEI-93-TR-24. Software Engineering Institute, Carnegie Mellon University.
Research Triangle Institute (2002). The Economic Impacts of Inadequate Infrastructure for Software Testing. NIST Planning Report 02-3.