As an engineer I was trained to design things to "work." Working implies tolerating occasional failures. The engineer's job is to balance performance with cost. If the cost of avoiding failure is $100,000 while the probability of failure is once in 20 years and the consequence of failure is only $10,000, being "good enough" makes sound engineering and economic sense. In the last 20 years, however, my clients have refocused me on "never failing" because some failures are so critical they directly impact the bottom line and the very future of organizations.
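The "good enough" arithmetic above can be sketched in a few lines. This is a minimal illustration, not a full risk model: it assumes "once in 20 years" means an annual failure probability of 1/20, that the $100,000 is a one-time cost covering a 20-year horizon, and that discounting can be ignored. The function names are hypothetical.

```python
def expected_loss(annual_probability, consequence, years):
    """Expected total loss over the horizon, ignoring discounting (assumption)."""
    return annual_probability * consequence * years


def prevention_pays_off(prevention_cost, annual_probability, consequence, years):
    """True when prevention costs less than the loss it is expected to avoid."""
    return prevention_cost < expected_loss(annual_probability, consequence, years)


# Figures from the paragraph above: $100,000 to avoid a $10,000 failure
# that is expected once in 20 years.
loss = expected_loss(1 / 20, 10_000, 20)
print(f"Expected loss over 20 years: ${loss:,.0f}")
print("Prevention pays off:", prevention_pays_off(100_000, 1 / 20, 10_000, 20))
```

Under these assumptions the expected loss is far below the cost of prevention, which is why "good enough" wins for ordinary risks. The comparison flips only when the consequence becomes large enough to threaten the organization itself, which is the "never failing" case.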

I have personally found the transition from an attitude of "mostly working" to "never failing" to be very intellectually challenging. Too often the critical importance of a "never failing" attitude becomes obvious to corporate boards and senior executives only after a critical failure has occurred and earnings and stock prices have taken a hit. Not only is the corporate heart attack victim already at risk of not surviving, but the corporate checkbook is also wide open to attempt recovery while doing what should have been done in the first place. Just as important, senior executive and board time will be diverted for months or even years to dealing with government investigations and rebuilding public trust in the brand.

Very few business issues deserve a "never failing" approach. But if rain could affect your business, and it rains and rains and rains and your mission-critical facility is in the 100-year flood plain, you had better have the following: quantities of wading boots, rain coats, umbrellas and tarps; sandbags, plywood, sand and other construction materials and vehicles for preventing roof or perimeter leaks; pumps, fuel, small and large boats; people trained to respond; and spare parts, food and supplies for everything that you ever could conceivably need.

BP's management is currently experiencing the consequences of what appears to have been a "mostly working" approach to offshore oil drilling. The public's impression of BP as being unprepared for the environmental consequences of its drilling is rapidly becoming indelible. It won't matter that the rig belonged to Transocean or the well cementing was being done by Halliburton. BP will get the blame. Just weeks before the spill, President Obama sent a proposal to Congress for increased offshore drilling. As a result industry prospects looked the best since the Exxon Valdez spill in Alaska 21 years earlier. Obama has now suspended his oil-drilling plan amid a widespread loss of political support. Congress will now intervene (and other governments will follow). The market capitalization for BP and the entire deepwater oil industry is now significantly, and perhaps permanently, reduced.

The job of senior executives (or politicians and regulators) is to think the unthinkable. While few risks truly justify a "never failing" attitude, those that do should follow my five reliability principles:

  1. Multiple things must line up before failure can occur (catastrophic failures are extremely rare).
  2. Senior management error is the most frequent root cause. Why protect against something that probably won't happen?
  3. Very carefully control configuration changes. In BP's case the drilling rig was being disconnected at the time of the explosion.
  4. Look for unintended interactions between adjacent systems. For instance, unexpected freezing conditions prevented the first BP well cap from working.
  5. Be very, very careful toward the very end of long-term projects. On the day of the BP explosion plaques were being distributed to employees for seven years of uninterrupted safety.

Kenneth G. Brill is founder of Uptime Institute.