A while back my NSLU2 setup broke down because an external USB drive failed. The HD model had a known problem that caused the drive to fail. The drive was under warranty at the time, but I knew that the actual HD was most likely working just fine. After a bit of pondering I decided to void the warranty and took the HD out of the case. After a few scares, the drive worked correctly.

When I bought that drive, I was smart and bought a second drive for backup purposes. But as usual, the second drive wasn't connected to anything and so had no backup value. I had to set up mirroring if I wanted to avoid future crashes; maybe I wouldn't be so lucky next time.

When I set up the software RAID, I went with plain LVM mirroring. Almost all documents on the web talked about mdadm + LVM, but that would have made my setup more complex, and I didn't want to go that way with the limited CPU and RAM of the NSLU2. Setting up plain LVM mirroring was almost too easy. The only caveat was that plain two-drive mirroring needs three devices in the Volume Group (by default the mirror log goes on a separate device). This wasn't a problem for me since I already had a USB stick acting as the /var of the box (this allows the HDs to spin down).

So the complete process for adding a mirror with LVM was just:

pvcreate /dev/sdc1
vgextend vg00 /dev/sdc1
lvconvert -m 1 vg00/Vol1

LVM will choose the drives it needs to use for the mirroring and start copying data. You can monitor the progress with:

lvs -a -o +devices
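If you would rather not re-run lvs by hand, the check can be wrapped in a small polling loop. This is only a sketch: the function names sync_percent and wait_for_sync are made up for illustration, and it assumes your LVM version reports the copy_percent field for mirrored volumes.

```shell
#!/bin/sh
# Hypothetical helpers, not part of LVM itself.

# Print the mirror sync percentage of one logical volume, spaces stripped.
sync_percent() {
    lvs --noheadings -o copy_percent "$1" | tr -d ' '
}

# Poll until the mirror reports 100%. The second argument is the polling
# interval in seconds (default 60).
wait_for_sync() {
    lv=$1
    interval=${2:-60}
    while :; do
        pct=$(sync_percent "$lv")
        if [ -z "$pct" ]; then
            # An empty value usually means the LV is not mirrored or lvs failed.
            echo "$lv: no sync percentage reported" >&2
            return 1
        fi
        echo "$lv: $pct% in sync"
        case $pct in
            100|100.*|100,*) return 0 ;;
        esac
        sleep "$interval"
    done
}
```

Calling wait_for_sync vg00/Vol1 300 would then print the percentage every five minutes and return once the mirror is complete.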

That's all there is. Now, this post was supposed to be about recovering data from a broken volume group. A few days ago, the same drive took a dive from the table (maybe it figured out that it was on extended time already) and broke down (yes, we all know that clicking sound).

The NSLU2 hung and I quickly figured out that the only thing that wasn't mirrored was the data on /var and Swap. /var was on a USB stick, but the Swap happened to be on that broken drive. OK, so the kernel doesn't handle disappearing swap too well (at least if you have 32MB of RAM). So I tried to reboot it, with little luck.

LVM isn't too helpful when it comes to broken drives. It will work just fine as long as you don't deactivate the Volume Group, but once deactivated it refuses to activate again until it's fixed. I moved the working drives to my laptop and ran lvs to see the situation (udev already set up the VG for me):

lvs -P -a -o +devices

The -P flag means partial, so lvs includes the status of those Volume Groups which are missing physical volumes. The list verified that the Swap was actually on the broken drive but everything else was either mirrored on a working drive or on the stick. Removing the missing drives fixes the Volume Group and allows the drives to be mounted read/write.

vgreduce --test --removemissing vg00
vgreduce --removemissing vg00

The first run is just to see that it's actually capable of doing something sane. It will complain about missing drives on both runs, though.
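The dry-run-then-real-run pattern can be tied together so the real run only happens if the test run succeeded. A minimal sketch, assuming vgreduce exits non-zero when the test run fails; repair_vg is a name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: dry run first, and only touch the on-disk
# metadata if the dry run exited successfully.
repair_vg() {
    vg=$1
    if vgreduce --test --removemissing "$vg"; then
        vgreduce --removemissing "$vg"
    else
        echo "dry run failed for $vg, leaving metadata untouched" >&2
        return 1
    fi
}
```

With that in place, repair_vg vg00 does the same two steps as above but stops before the second one if the first reports a failure.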

Finally, after the Volume Group was fixed, I ran fsck on all the volumes to make it easier for the NSLU2 to boot. One of the partitions did have corruption, but I think it had been corrupted since the drive initially failed and just hadn't been scanned until now.
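Checking every volume by hand gets tedious, so here is a rough sketch of a loop over the Volume Group. It makes a few assumptions that are not from my original session: the volumes are ext2/3/4, blkid is available to detect the type, and the check is read-only (fsck -n) so that nothing is repaired without you looking at it first. The function names lv_paths and check_vg are made up.

```shell
#!/bin/sh
# Hypothetical helpers for checking all volumes in one VG.

# List the device paths of all logical volumes in a volume group.
lv_paths() {
    lvs --noheadings -o lv_path "$1" | tr -d ' '
}

# Read-only check of every LV that blkid identifies as ext2/3/4.
# Anything else (swap, for example) is skipped with a message.
check_vg() {
    rc=0
    for dev in $(lv_paths "$1"); do
        fstype=$(blkid -o value -s TYPE "$dev")
        case $fstype in
            ext2|ext3|ext4) fsck -n "$dev" || rc=1 ;;
            *) echo "skipping $dev (type: ${fstype:-unknown})" ;;
        esac
    done
    return $rc
}
```

check_vg vg00 returns non-zero if any check reported problems, at which point you can run a real fsck on that volume.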

To remove the drives cleanly without a reboot you need to run the following:

vgchange -a n vg00
sync

Then you can just unplug the drives.
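The same two commands as a small function that stops with a message if deactivation fails, which in my experience happens when a volume is still mounted. This is a sketch, and detach_vg is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: deactivate a volume group and flush buffers
# before the drives are unplugged.
detach_vg() {
    vg=$1
    if ! vgchange -a n "$vg"; then
        echo "could not deactivate $vg; is a volume still mounted?" >&2
        return 1
    fi
    sync
}
```

Only unplug the drives once detach_vg vg00 has returned successfully.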

Initially I made the mistake of exporting the VG before disconnecting. I wanted to make as clean an exit as I could, but exporting the VG changes the state on disk, and the NSLU2 will not boot until the VG is imported again.
