
Recovery From A Failed Update

by Gus Wirth last modified 2007-07-06 18:00

How I helped a laptop user recover from a failed update

This is the story of how I helped recover a system from a failed update, alternatively titled "How Knoppix once again helped save the day". Realistically, another distribution could have done the job just as well, but nowadays Knoppix happens to be my tool of choice.

The problem started with a guy named Mark from Tucson who was out here on a business trip. He had an older Compaq Armada laptop with Mandriva 2007.0 installed. While he was at his motel he attempted to install the latest updates. Mandriva uses a tool called urpmi, which serves the same function as yum in Fedora, apt-get in Debian, or yast in SuSE. At some point late in the process, the update failed and left the system in an unusable state. The initial symptom was that the system could not find anything to mount because the /etc/fstab file was missing.

I got involved when I got a phone call from Neil Schneider. He had gotten an e-mail from Mark, who had nowhere to turn, found the KPLUG web site, and sent a message to the KPLUG President asking for help. Neil figured out where he was located and knew I was close, so he asked me to call Mark. I called him and asked a few questions about his system. It turned out his laptop had only a CDROM drive (no DVD capability), so I had to dig around to find my CDROM version of Knoppix. I also burned a Fedora 7 rescue CDROM in case I needed that. So I packed up my laptop and my copies of various versions of Knoppix (4.02, 5.0, 5.1.1) and went to meet Mark at his motel to see if I could help straighten out his problem.

The first test was to boot the system with Knoppix and make sure that the partitions were still there. Knoppix saw the two partitions and was able to mount them. The system had been configured with only two partitions, / and /home. This made the recovery relatively easy because I only had to deal with one mount point, /.

Knoppix had already created the mount point /mnt/hda1 that corresponded to /, so I mounted the partition and examined the /etc directory. I discovered that /etc/fstab was missing, but there was an /etc/fstab.rpmsave file. So it looked like the update had deleted stuff where it shouldn't have, but at least there was a backup copy. I looked at it to make sure it made sense, and then copied it to fstab. I then booted the system without Knoppix to see if it would come up. The system got to a text login prompt (it should have been a GUI login) but it wouldn't allow a login. Neither root nor a regular user could log in.
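The fstab repair described above can be sketched as a small shell function. This is only an illustration: the function name restore_rpmsave is made up for this example, and the device name /dev/hda1 in the usage comment is an assumption based on the Knoppix mount point /mnt/hda1.

```shell
# Sketch: restore a config file from its .rpmsave copy under a mounted root.
# restore_rpmsave is a hypothetical helper name, not part of any tool.
restore_rpmsave() {
    rs_root=$1                 # where the damaged system's / is mounted
    rs_file=$2                 # path as seen inside that system, e.g. /etc/fstab
    rs_target="$rs_root$rs_file"

    # Never overwrite a file that is still there.
    if [ -e "$rs_target" ]; then
        echo "$rs_target already exists, leaving it alone" >&2
        return 1
    fi
    # Nothing to do without a backup copy.
    if [ ! -f "$rs_target.rpmsave" ]; then
        echo "no $rs_target.rpmsave to restore from" >&2
        return 1
    fi

    cat "$rs_target.rpmsave"               # print it so it can be looked over
    cp -p "$rs_target.rpmsave" "$rs_target"
}

# Usage from Knoppix, as root (the device name is an assumption):
#   mount -o rw /dev/hda1 /mnt/hda1
#   restore_rpmsave /mnt/hda1 /etc/fstab
```

The function refuses to touch an existing file, so running it a second time is harmless.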

Now I was starting to think there was a lot more damage than just a missing file, so I rebooted into Knoppix and started to look around some more. My first suspicion was that maybe the /etc/passwd or /etc/shadow files might have been damaged or removed but they looked to be intact. The other problem might have been PAM (Pluggable Authentication Modules), and if that were the case I'm not sure how I would have tried to recover other than a re-install.

So now I rebooted the system into single user mode using the kernel command line option "single". The system came up and almost looked normal. Mark was also shocked that I was sitting there at a prompt logged in as root without actually having to log in. Close examination of the bootup messages indicated a problem. A couple of lines before the prompt was a message that said: unable to locate rc.sysinit. Sure enough, that file was missing and there was no backup like with the fstab file. Big problem. Using rpm, I tried to find what package would have /etc/rc.d/rc.sysinit. The Mandriva system couldn't tell me. I checked in /var/lib/rpm/ to make sure the rpm database files were there, and they were. As a sanity check I queried for the rpm package itself (rpm -q rpm) and it was found, so it appeared the rpm database was working. So what package held rc.sysinit? My laptop was running Fedora Core 5, and knowing that Mandriva used to be Mandrake, which used to be an optimized version of Red Hat, led me to hope that packages would be conserved through evolution. I checked my system and found that rc.sysinit is in a package called initscripts.
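The rpm queries in this step look roughly like the sketch below. The wrapper name owner_of is made up for illustration; rpm -qf (which package owns this file) and rpm -q (is this package installed) are standard rpm query options.

```shell
# Sketch: ask the rpm database which package owns a file.
# owner_of is a hypothetical wrapper name used only in this example.
owner_of() {
    # rpm -qf prints the owning package and exits non-zero when
    # the database has no owner recorded for the path.
    if ow_pkg=$(rpm -qf "$1" 2>/dev/null); then
        echo "$ow_pkg"
    else
        echo "no package in the rpm database claims $1" >&2
        return 1
    fi
}

# On the damaged system (single user mode or a chroot):
#   rpm -q rpm                      # sanity check: does the database answer?
#   owner_of /etc/rc.d/rc.sysinit   # this is the query that came back empty
# On a healthy related system the same query names the package to fetch.
```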

By that time I had turned on my laptop and connected to the motel's wireless network so we could do some web searching. The first place to hit was Google, which gave a few hits for some older versions of initscripts, so it looked like I was on the right trail. But what I needed was a package specific to the Mandriva release on the laptop. Mark gave me the clues to track down a web site where I could directly download specific packages. Searching on "easyurpmi mandriva" and finding a site with "zarb" turned up <> which allowed me to download the Mandriva initscripts to my laptop. We couldn't download directly onto the Compaq laptop under Knoppix because of problems with the motel's wired connection: for some reason it came up as 10baseT half-duplex and wouldn't connect. There were two versions of initscripts, 2007.0 and 2007.1. Mark said his system was early 2007, so I grabbed the 2007.0. Using a USB flash drive, I transferred the initscripts package to the Compaq laptop, using Knoppix to copy it into the system's tmp directory (/mnt/hda1/tmp under Knoppix).

To install the package I did a chroot to the mounted / directory at /mnt/hda1 like so:

$ sudo chroot /mnt/hda1/

I then changed to the tmp directory and attempted to install the initscripts package, except it didn't want to work because of some conflicts. Looking at some other packages, it seemed that the system had mostly been upgraded to version 2007.1. So now I had to download the 2007.1 initscripts, use the USB key to transfer another package, and try again. This time it worked with a few warning messages, something about /dev/null being missing.

Looking back on that message it made sense, because almost all newer systems use udev to dynamically populate /dev. What I could have done is a bind mount of the Knoppix /dev over the system /dev like so:

mount --bind /dev /mnt/hda1/dev

which would have made all the devices available even while I was chrooted.
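Putting the bind mount and the chroot install together, the whole sequence might look like the following sketch. The helper names run and install_in_chroot, the DRY_RUN switch, and the package file name in the usage comment are all made up for illustration. By default the script only prints the commands, so it is safe to try anywhere; actually running them requires root.

```shell
# Sketch: give the chroot a populated /dev, install a package inside it,
# then undo the bind mount. Names here are hypothetical.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_in_chroot() {
    ic_root=$1   # mounted root of the damaged system, e.g. /mnt/hda1
    ic_pkg=$2    # package path as seen from inside the chroot
    run mount --bind /dev "$ic_root/dev"      # make device nodes like /dev/null visible
    run chroot "$ic_root" rpm -Uvh "$ic_pkg"  # install or upgrade the package
    run umount "$ic_root/dev"                 # remove the bind mount again
}

# Usage from Knoppix, as root (the package file name is a placeholder):
#   DRY_RUN=0
#   install_in_chroot /mnt/hda1 /tmp/initscripts.rpm
```

Passing the rpm command directly to chroot avoids an interactive shell, so the bind mount is always cleaned up afterwards.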

So one more try. Boot the system normally and it works! GUI login, user directories and files, a few applications all seem OK. It looks like it is mostly recovered. But I don't know what else might be damaged. It had already been about two hours since I started and it was time to go home.

There were also a few other oddities that I didn't figure out. Mandriva uses grub as the boot manager, but I was unable to locate the grub.conf or menu.lst that grub uses to select kernels. grub was working, so the file had to be somewhere. I didn't investigate very far.
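Had there been more time, a search along these lines might have turned up the grub configuration. It is only a sketch: the function name find_grub_config is made up, and limiting the search to /boot and /etc is an assumption about where such a file is likely to live.

```shell
# Sketch: look for a grub menu file under a (possibly mounted) root.
# find_grub_config is a hypothetical helper name.
find_grub_config() {
    fg_root=${1:-}
    fg_root=${fg_root%/}    # "/" or no argument becomes "", so paths start at /
    find "$fg_root/boot" "$fg_root/etc" \
        \( -name menu.lst -o -name grub.conf \) 2>/dev/null
}

# Usage:
#   find_grub_config /mnt/hda1    # from Knoppix, with the partition mounted
#   find_grub_config              # on the running system itself
```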

When doing the initial checks I tried to find where urpmi downloaded its packages for updating. I found the cache under /var, except there was only one package in it. It seems that urpmi cleared the cache when it crashed. Also, the fact that a package like initscripts was removed and could not be recovered indicates that Mandriva does NOT use the transaction capability of rpm. If it does, it doesn't work.
