Catastrophe! Pt 1

Catastrophic events don’t happen very often when safeguards are in place to guard against them. But even if the odds are always approaching zero, there is the very real chance that every safeguard will fail, or that a situation arises you didn’t plan for.

In my business, which is a bizarre combination of hardware, software, systems and user support, I need to be ready for just about anything. There are software packages out there that are incredibly complex. These applications usually come with a great support system already in place. I don’t even try any more to learn how most of them work. Nowadays all I really have to do is back them up…

The Crash

There’s a server. It’s located in a big room somewhere, with a dedicated UPS and all the accoutrements. Recently there was a blackout in the building. Although I wasn’t there, I assume the UPS did its job, and that the server shut down gracefully. When I arrived to restart the machine, everything functioned fine up to the login screen. I typed in the password, hit enter, and the desktop began to load. And then, without warning, the Blue Screen of Death.

I don’t mean the dark, heavy blue one with all those big ugly letters. I mean the really pretty light blue one, with the perfectly sized, professional little message telling you that everything basically just went to hell, and to go and contact yourself (the administrator) if the problem persists. There’s only one message I would have wanted to see less right then, and that’s the Black Screen of Doom.

So I did a shutdown and a clean boot from nothing. And the controller reported the horrible news: no available devices present. The catastrophe had occurred, but I didn’t know it yet. What I did know was that this machine was going to the shop with me. At the least it probably needed a new hard drive, and I was better equipped than the customer to do that and all the software rebuilding.

Just Another (Sun)day at the Office

I took this as an opportunity to change some things I had never liked on that machine. The only reason that SCSI drive was in there was to be striped in a “RAID” array. My opinion is this: without at least 2 drives, RAID is useless. So I replaced it with a fast SATA drive and reinstalled the OS and basic programs. Several hours later, after getting all the updates and extras from Microsoft installed, I went through and cleaned out all the leftover temporary files, service pack uninstall files, you know, the usual suspects.

Once I was satisfied with that, I proceeded to install that huge, complex management program I alluded to earlier. The previous incarnation had started off as a 16-bit application and been upgraded over the years to its current form. Wanting this install to mirror that one, I went through the process of installing the earlier versions and then upgrading them. The paperwork was a bit confusing, stating contradictory things in different places, and the whole thing was a mess. For an entire Sunday I worked on this, well into the night. I quit about 2am, too tired to think.

Next morning, I loaded the latest version of the program and tried again. Everything seemed fine until I loaded in my backup data. Then the whole thing fell apart and nothing worked. When 9am rolled around, I called their customer support. The wonderful woman I talked to listened carefully to my story, asked a couple of questions and then informed me that the data I was trying to change was dynamic in nature and I simply couldn’t do that and expect it to work.

All The Data Was Lost

Now, on the third day into the mouth of devastation, I finally realized the extent of the catastrophe. The problem wasn’t that I couldn’t get that program to work. The problem was that even if I did, All The Data Was Lost. To top it all off, someone at this same company had told me which data to back up when I’d called and asked them which files needed to be backed up to save the company’s data. Thanks to this misinformation, all my backups were of files that couldn’t be copied over, and none of them had my customer’s data.

This is a catastrophe. This is the thing you never plan for, the thing you couldn’t see coming.

Most huge management programs have menu options to create backups. Some even remind you every so often. On those, backups are easy. Most people don’t even bother with someone like me to do that; they do it themselves. But this program, though it stores all the data required to run a busy business in a specialized and growing sector of the economy, doesn’t have such a function.

When I’d talked to one of their guys about backups in the past, he told me I needed to preserve the contents of a particular folder, and that’s what I had been doing. The email I received from the woman this day showed four folders and several files that contained the data. None of them were included in my regular backup files. But the game was not over yet.
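A mismatch like this, where the backup runs faithfully but covers the wrong folders, is the kind of thing a small check can flag once you have the vendor’s list in hand. What follows is only a minimal sketch in Python, not what I actually ran: the function name `missing_from_backup` and the folder and file names in the usage note are made up for illustration, and it assumes the backup is a plain copy of the files on disk rather than an archive.

```python
from pathlib import Path


def missing_from_backup(backup_root, required_paths):
    """Return the required paths that the backup does not really contain.

    backup_root: folder holding a plain file copy of the backup (assumption).
    required_paths: relative paths the vendor says hold the data (hypothetical).
    A path counts as missing if it is absent, or if it is an empty folder.
    """
    root = Path(backup_root)
    missing = []
    for rel in required_paths:
        target = root / rel
        if not target.exists():
            # Nothing at all was backed up under this name.
            missing.append(rel)
        elif target.is_dir() and not any(target.iterdir()):
            # The folder is there but empty, which is as good as absent.
            missing.append(rel)
    return missing
```

Called with something like `missing_from_backup("E:/backups/latest", ["CompanyData", "Reports", "settings.ini"])`, it returns the entries from the list that the backup lacks, in the order given. An empty list means every required path is present. It says nothing about whether the files inside are current or intact.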

Luck, A Plan and Hope

Back in October I had done some unscheduled maintenance to this machine, and had made a larger backup of the drive contents, outside the context of the regular schedule. When I looked in there, I found the files I wanted. I grabbed them and used them as the backups. This time when I ran the program, the company files were found, and after another 30 minutes of rebuilding and optimizing, the program gave me the correct login screen. A decade of business data had been recovered, but that left almost 4 months of data missing.

It was an important victory, a master stroke of luck. But it wasn’t enough. I needed to find some way of getting that missing data back. A plan began to form in my mind. And with it, a hope that this might all just work out fine. I’ll continue with the story tomorrow. Maybe you’ll drop by and see how it plays out.


