We have a customer with a big installation. This customer also has several other contractors who do different things for them. Among other things, these contractors have root access to a staging system so they can test things out there. Mistake number one, I guess.

This staging system is not really one. The customer is actually using one of the applications on it as a production system. Mistake number two. This application is the ticketing system for their entire helpdesk. Everybody involved in the project knows about this production application, so you’d think there should be no problem.

Well, part of the application is database software that accesses raw partitions. Now an administrator from one of the contractors, with root privileges, wanted to install different database software as a test for a different project. And ran out of space. What did the nice person do?

“Oh,” he said to himself. “Here’s some unmounted partitions.”

You may guess what happened next. He formatted the unmounted partitions (Mistake 3), installed his database on them (Mistake 4) and went home thinking, “Another Job Well Done.” (Mistake 5).

Now the original database software was still running, happily accessing what it considered its raw data partitions. Of course, the new software, its databases, and the filesystem underneath did not really enjoy having bits and pieces of themselves slowly overwritten.

Result? Two conflicting applications, both dead, both with completely corrupted data.

Of course, this being a staging system, there have not been any backups for at least a year. Mistake 6.

There is a bright side to this mess. I am not involved, and I don’t have to tell the customer the bad news either.
