
Hit by Kernel Panic – How I Almost Lost My Job

January 9, 2015

“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.” 

  ― Albert Einstein

A kernel panic is the death cry of a computer system running on UNIX. This is the story of a computer system that lay in its death throes when I was the Database Administrator (DBA) of a computer-based telecom retail billing system of a Telecom District in the Department of Telecommunications (DoT), Government of India. (A telecom district normally covers the telecom services of a revenue district. DoT was then the sole telecom service provider on the Indian telecom scene, and wire-line (landline) was the only technology.)

I knew very little about computers when, in the early nineteen-nineties, I was assigned the task of moving the billing operations of a Telecom District from the manual system to a computer-based system. The process of computerization involved extracting massive amounts of data from the manual records, organizing it into formats suitable for data entry, checking and punching this data into the computer-based system, pre-processing the captured data, cross-verifying the computer outputs against the manual records, making corrections and modifications, and so on. We had around five months to complete the migration and to generate the first bill from the computerized system. It was a challenging task, and we raced against time to accomplish the mission. The computer centre worked round the clock during the migration period. As the person heading the computerization activities, I virtually lived in my office throughout the migration. My home, where I lived with my wife and our seven-year-old son, was some twenty-five kilometres away. But most nights I slept on the bare floor of the computer centre.

The job was finished on time, and the first computerized bills were generated ahead of the schedule given to us. I continued in the computer centre as its head. (The solution was deployed on a UNIX operating system with Oracle as the Relational Database Management System (RDBMS). Some forty-odd terminals concurrently accessed the system to input or modify data.) As the Database Administrator, I was personally responsible for the safety of the database. Backup was then the only protection against data loss. Other options, such as internal redundancy, were considered too costly to implement because of the technology constraints and the phenomenal cost of peripherals in those days. I had been taking daily backups. The external medium used for copying out the backups was a tape cartridge (something similar to a music cassette). As time went by, the database grew. Consequently, an export (backup) of the database, which was taken after the day’s regular activities were finished, needed over two hours. So, even after the migration period, I had to stay put at the computer centre until eight or nine at night. I was getting addicted to computers and was happy in their company. (The only problem was that my family was neither happy about it nor impressed.)

Everything seemed to be sailing smoothly until one day I ran into a problem with the backup tape drive. The write operation was failing. Since the system was under an Annual Maintenance Contract (AMC), I contacted the maintenance firm at Chennai over the phone. They told me that the tape drive might need replacement. A routine visit by the support engineer was due in a month or so, and I thought I could wait until then to sort out the issue with the tape drive. I did not realize that I was taking a fatal risk by not taking regular backups. I did not worry much, since on every past occasion when the system went down, I had been able to bring it up without any loss of or damage to data. However, I had overlooked one reality: none of the earlier crashes had been triggered by a hard disk failure.

Then, one fateful Friday afternoon, the system went phut. The trouble started with a scraping noise coming from one of the hard disk drives. Those who work with computers will know that the first remedy for any computer glitch is to shut the system down, wait for a while and restart it. I did that too. The system came up, and everything seemed okay. But it was not. After a while, the grating noise returned. Soon it turned into an ominous ringing sound. I noticed that the lamp on one of the hard disk drives had turned from green to a dull white, signalling a defect in the disk. While I watched, a flurry of messages flashed across the console. Then the screen went blank, and the system simply dropped dead. The sudden silence in the CPU room of the computer centre was terrifying. I picked up the phone and called the Chennai office of the maintenance firm.

Since mobile phones were still a distant dream, reaching the support engineer directly was nearly impossible in those days. The firm had an office at Kochi, and I learnt from Chennai that the engineer was there. The Kochi office told me that Kumar (not his real name), the engineer, was at a site in Kochi. An hour or so later, I got him on the phone. The first question Kumar asked me was about the backup. I felt dreadful. I told him that the last backup had been taken at least a month earlier. I asked him whether the crash could have destroyed the data. He replied that he could not tell. “Things would have been safer with a backup,” he said. He promised to reach the site the next morning.

I was certainly alarmed, yet I tried to put on a brave face before my teammates. I told them that the system would be up and running by the next morning. Since my teammates had much confidence in me (although mostly misplaced), they did not consider the crash a serious affair worth publicizing. Temporary downtimes were usual for large computer systems. Since the system was then used only for post-transaction data capture (offline working) and was not open for online transactions, the news of the crash remained more or less confined within the four walls of the computer centre. That was a comforting position.

Fortunately or unfortunately, my boss had little knowledge of, or interest in, computers. Yet he was my boss, and I had a duty to keep him updated on developments, although I knew very well that he would only be happy to learn of my distress. (In my entire service, I was hardly ever in the good books of my bosses. The reason was not that I did not know my work or was not adequately devoted to it. The problem was that I never acquired the sophistication to sugar-coat bitter truths. I was mostly blunt in my reactions, and I habitually stood firm by my convictions. Obviously, my bosses were not going to love me!) I also knew that once he came to know, he would not leave me in peace for a moment. (When disaster strikes and you are at your wits’ end, your boss will want an hourly status update.) I decided to take a call on sharing the news with my boss after Kumar arrived and took stock of the situation.

I went home and unburdened my heart to my family. As always, they comforted me, saying that everything was going to be okay. They probably did not understand how I had painted myself into a corner. Perhaps it was good that they did not.

It was nearly midnight when I heard a car pull up at our gate. Kumar had not wanted to wait until morning and had decided to deal with the matter at the earliest. I got into his car and we headed for my office. Kumar tried to start the system, but it would not budge. The easy option for Kumar was to get another drive, replace the faulty one with it and restore the system. But that would destroy the data stored on the disks. (This was because the pieces of data lay scattered across all the disks, and when one disk in a bank of disks is replaced, the contents of the remaining disks are rendered useless.) So, replacing the faulty hard disk drive was a workable option only if I had a current database backup. In the absence of one, the contents of the disk could be saved only by somehow raising the disk from the dead. (Readers will know that repairing hard disk drives at customer locations is a highly impractical proposition. The slightest mishandling or a tiny slip of a tool will render the drive useless and annihilate the data it holds.)

We worked until morning; that is, Kumar worked and I kept him sleepless company. But nothing changed that night. My distress mounted by the moment. I knew that it would be my funeral if Kumar failed to restart the machine with all the hard disks and data intact. Time, too, was not on our side. More than losing my job, I was tormented by the fear of damage to my reputation as someone good at, and devoted to, his job.

Three days after the crash, the system still lay dead. All activities at the computer centre had come to a halt, and heaps and heaps of manual records piled up. The only consolation was that another fortnight remained before the next bill was due for release. I knew that my team would somehow clear the data entry arrears to prevent any delay in releasing the next bill. But that would be impossible if there were further delays in getting the system up.

It was slow and painful work for Kumar. Besides, he had regular calls from other sites, which he attended during the day, devoting his nights to our machine. He had already replaced the faulty cartridge tape drive of the system. (The first thing Kumar wanted to do, once the system hopefully came up, was to take a backup of the database.)

The sixth day after the crash dawned. The status of the machine remained unchanged. I was extremely nervous, and I could sense that Kumar’s initial confidence was waning too. Nevertheless, he kept saying that if he could not get the disk working, he had the option of getting its contents extracted. But I doubted whether he himself truly believed what he said. (Even in those days, there were tools for extracting data from crashed hard disks. But it was a mere possibility, with no assurance that the data would remain intact or usable after extraction. Besides, that was an operation to be performed by specialists in a lab setting.)

Somehow, I had not been able to muster the heart to meet my boss. I started each day hoping that the system would come up that day, so I kept postponing the meeting. If the system was restored, the agony of such a meeting could be avoided. But I could not postpone it any further. So, on the sixth day, I decided to bite the bullet, although I did not expect any understanding or support from my boss. The meeting was as stormy as anticipated. It may not be ethical for me to share the details here. Suffice it to quote what I remember reading on an Archie’s poster: “Rule No. 1 – The boss is always right; Rule No. 2 – Whenever the boss is wrong, refer to Rule No. 1.”

Meanwhile, Kumar had requisitioned a replacement hard disk. It was clear that he had at last decided to give up his week-long struggle to restore the faulty disk. The new disk arrived on the day I met my boss. Kumar was veering towards the option of importing data from the last backup, taken a month or so earlier (hoping that the tape was intact and readable). It would then have been our job to try to reconstruct, from the relevant paper records, the missing data for the period between the last backup and the crash. That was going to be a very tedious and time-consuming operation, with no guarantee of recapturing the state of the database at the moment of the crash. In any case, one thing was certain: I would not be there to do it. The meeting with my boss had made it clear that my fate was sealed.

Kumar came in the evening that day, as usual. He too had gone without rest or sleep for many days. He told me that he planned to make one last attempt to bring up the faulty disk. “If we fail, we replace the faulty disk. If we succeed, it will be a miracle,” he told me. I was too exhausted, confused and crestfallen to say anything in response. Nothing short of a miracle was needed if Kumar was to succeed. Hours went by. I looked at my watch; it was nearly three in the morning. Kumar had already finished his work on the hard disk drive and reassembled the system. Then he turned to me and said, “We will try starting it now.” My heart skipped a beat, and a thought flashed through my mind: “A miracle or a calamity?”

Kumar plugged the system into the power source and switched on the supply. Then he turned and asked me to press the start button on the machine. I did not dare, so he pushed it himself. I stood there holding my breath and trembling with excitement. Precious moments ticked away. My heart was racing. Then suddenly the system shivered, as the whirring of the fans inside broke the deathly silence in the room. It was the sweetest noise I had heard in all my life. Then the lights on the hard disk drives came on and started blinking. The lights were all green! The system was up. The start-up process completed, and the database was mounted and opened. I tried a database query and rejoiced to see the data pulled out by the query rolling across the screen. There was no more scratching noise from the hard disk drive. The humming of the system was steady and rhythmic. I simply embraced Kumar as my sleepy eyes welled with tears. But the terrors of the day were still not over…

Kumar inserted a cartridge tape into its drive and started the database backup tool. The backup would take at least two hours to finish. Kumar arranged a row of chairs and lay down. I called home to report the good news and stretched out on the bare floor of the CPU room. I lay there watching the lights blinking furiously on the hard disks and listening to the comforting sounds of the system. But I could not sleep. My tired eyes stayed latched onto the flurry of flashing lights on the system. Suddenly, I thought I heard a vague scratching noise from inside the system. I sat up, startled, and listened intently. The machine was working. The details of the tables being exported were rolling across the console one after another.

The backup was successfully completed just before daybreak. I pulled the tape cartridge out of its drive. My life, good name and career were entangled in that short length of simple magnetic tape. Kumar left for Ernakulam after a while. I did not want to leave; I wished to stay close to the machine, enjoying the soothing music of its running. One by one, my teammates arrived. They were happy to see that the system had been restored. A week’s data entry and processing was pending, and everyone had to speed up. The computer centre once again became a beehive of activity.

Nearly an hour went by, and everything seemed to be moving with clockwork precision. I thought I could go home, take a nap and return by the afternoon. I went into the CPU room for a final confirmation that things were smooth and steady with the system. Somehow, a vague feeling of dread lingered in the depths of my heart. I listened raptly to the sounds coming out of the system. Yes, there were intermittent scratchy noises. I told myself that I was exhausted and imagining things. I kept listening. The noise was unmistakable. Soon it turned into a ringing sound. I remembered hearing the very same sounds before the system dropped dead a week earlier. I glanced at the lamp on the restored hard disk drive. The lamp was dim; it was not green. The portents were clear and ominous. I panicked. I did not want the system to die on me twice in seven days; I had better kill it myself. I ran to the console and issued a shutdown command. The system came down, but it was not a clean termination.

I updated Kumar on the situation, went home and hit the sack. That night, Kumar came to our home once again to pick me up. He replaced the faulty hard disk drive, and we reinstalled the system and imported the data from the backup tape. By the time my teammates arrived, the system was up and running. Understandably, there was some scepticism about it. I picked up the phone and called my boss. (He was, perhaps, disappointed!) The system worked without serious trouble until it was eventually replaced to match the needs and technologies of the day. From then onwards, a database backup was taken every working day.

Although the story did not have the tragic ending anticipated by many (and hoped for by a few), those seven days were among the most painful and haunting of my life. The traumatic experience taught me some important lessons. The biggest of them was that I should never take things for granted. The things you consider ordinary and worthless might turn out to be priceless. A backup medium may appear to be an innocuous piece of magnetic tape. But your very life might one day hang on it…

Epilogue

The above story is true, except for the literary freedom I have taken in telling it and any inadvertent errors that might have crept in while recapturing incidents that took place close to two decades ago.

I often wonder how drastically different my life would have been if someone with less expertise than Kumar had dealt with the problem, or if the hard disk had failed to come up, or if it had come up only to crash before the backup finished, or if the backup tape had failed to read when we attempted the import. All of these were very much within the realm of possibility. Yet none of them came to pass. The miracle was that the faulty disk regained life and breathed for just long enough to allow a current backup.

I continued to work among computers for the next twenty years of my service life, working eighteen to twenty hours a day, seven days a week, managing systems, acting as a consultant to solution implementers and creating software solutions that saved millions for my employer. What was my reward? Well, for one thing, I thoroughly enjoyed my work, although much of what I did was not part of my formal assignments. And for another, I ended up fighting half a dozen court battles against my bosses (and winning every one of them). (Maybe I will write about some of them in the future.)

And let me close with this humble advice: please watch your back if you are more energetic, intelligent and creative than your boss!

———————


  1. Dear Sir, Namaskar: Winston Churchill said, 'Success consists of going from failure to failure without loss of enthusiasm.' It is the failure which is the ultimate, for success is short-lived till concurred. It was perhaps the faith the two of you had in each other, the trust, that pulled you through. Remember the age-old adage: 'Himmat e marda to madade khuda' (He who has the courage is helped by God!). Kind regards, adarsh

  2. Dear Adarsh,
    Namaskar. It is always a team effort. There is hardly anything we can do alone in this world. I cannot forget the contributions of others to whatever I have been able to achieve in my career or outside of it. There were some (like you) whose involvement in helping me succeed was direct. But there were others who remained unknown and invisible. Of course, God helps the bold (when intentions are pure).
    Thank you very much for your active interest in my posts…
    Regards,
    Kutty

  3. Sir,
    "What was my reward?" The Gita says, "Do your duty"; everything will be taken care of by HIM. You have done your duty. God has given you a lot of followers.

  4. Hi,
    I believe in the verse from the Holy Gita in the light of my own experiences in life. Yes. I have been blessed in abundance by HIM. I can never overlook this truth. I have no cause to complain but every reason to rejoice. The question was not one of frustration or disappointment. It was spontaneous and I thought it fitted the narrative.
    Thanks for reading and commenting. It would have been better to know who wrote the comment, but that does not matter. The comment itself matters a great deal…
    Regards,
    Kutty
