We may all like to think our software is infallible. Sure, there may be bugs that pop up from time to time… but those bugs won’t do real damage. They can’t, let’s say, bring down every single flight in the United States. Right?
Well… maybe that can’t happen with your software. But it did happen with the FAA this week.
In a nearly unprecedented move, the FAA was forced to ground all domestic flights after a “glitch” disabled its NOTAM (Notice to Air Missions) system. Pilots rely on this system to learn about potential safety issues before and during a flight.
Eventually, they fixed… the glitch (Office Space reference, anyone?), but the domino effect of the outage caused delays and cancellations for the rest of the day.
Obviously, it’s an embarrassing moment for the FAA. Reportedly, the cause of the fiasco was a single engineer replacing one file and bringing the entire system down. But it raises the question…
Why and how could a single engineer’s action take down an entire system like this?
I’m sure that the FAA is diving deep into this. But it’s probably a good reminder for all of us to really go through our own code and systems. Could one mistake by an engineer bring down your entire product? If so… it may be time to put some safeguards in place.