Monday, August 1, 2016

Adventures in Disk Replacement

The Problem

A few years ago I built a new FreeBSD desktop at home.  For simplicity of booting, etc., I used the built-in RAID1 mirroring provided by the on-board SATA controller.  This worked fine.

Recently one of my drives began reporting SMART errors (I am running smartd from sysutils/smartmontools in daemon mode and it sends emails to root@ for certain types of errors).  Originally the drive logged two errors:

Device: /dev/ada0, 8 Currently unreadable (pending) sectors
Device: /dev/ada0, 8 Offline uncorrectable sectors

It logged these two (seemingly related?) errors once a month for the past three months.  This month it logged an additional error, at which point I decided to swap out the drive:

Device: /dev/ada0, ATA error count increased from 0 to 5
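
For reference, the monitoring behind these emails amounts to a couple of lines of configuration.  A minimal sketch (the DEVICESCAN directive and mailing root are typical-setup assumptions, not my exact config) in /etc/rc.conf:

smartd_enable="YES"

and in /usr/local/etc/smartd.conf:

DEVICESCAN -a -m root

where -a enables the default set of checks (including the pending and offline-uncorrectable sector counts above) and -m sets the mail recipient.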

The simple solution would be to just swap out the dying drive for the replacement, reboot, and let the rebuild chug away.  However, I decided to make a few changes which made things not quite so simple.

First, my existing system was laid out with UFS with separate partitions for /, /usr, and /var.  It was at least using GPT instead of MBR.  However, I wanted to switch from UFS to ZFS.  I'm not exactly ecstatic about how ZFS' ARC interfaces with FreeBSD's virtual memory subsystem (a bit of a square peg in a round hole).  However, for my desktop the additional data integrity of ZFS' integrated checksums is very compelling.  In addition, switching to ZFS enables more flexibility in the future for growing the pool as well as things like boot environments, ZFS integration with poudriere, zvols for my bhyve VMs, etc.

Second, since I was going to be doing a complicated data migration, I figured I might as well redo my partitioning layout to support EFI booting.  In this case I wanted the flexibility to boot via legacy mode (CSM) if need be, while keeping the option of switching to EFI.  This isn't that complicated (the install images for FreeBSD 11 are laid out for this), but FreeBSD's installer doesn't support this type of layout out of the box.

Step 1: Partitioning

I initially tried to see if I could do some of the initial setup using partedit from FreeBSD's installer.  However, I quickly ran into a few issues.  First, my desktop was still running 10-STABLE, whose partedit didn't support ZFS on GPT for booting.  (Even partedit in 11 doesn't seem to handle this by my reading of the source.)  Second, partedit in HEAD doesn't support creating a dual-mode (EFI and BIOS) disk.  Thus, I resorted to doing this all by hand.

First, I added a GPT table which is pretty simple (and covered in manual page examples):

# gpart create -s gpt ada2

To make the disk support dual-mode booting it needs both an EFI partition and a freebsd-boot partition.  For the EFI partition, FreeBSD ships a pre-formatted FAT image that can be written directly to the partition (/boot/boot1.efifat).  However, the filesystem in this image is relatively small, and I wanted to use a larger EFI partition to match a recent change in FreeBSD 11 (200MB).  Instead of using the pre-formatted image, I formatted the EFI partition directly and copied the /boot/boot1.efi binary into the right subdirectory.  Ideally I think bsdinstall should do this as well rather than using the pre-formatted image.

# gpart add -t efi -s 200M -a 4k ada2
# newfs_msdos -L EFI /dev/ada2p1
# mount -t msdos /dev/ada2p1 /mnt
# mkdir -p /mnt/efi/boot
# cp /boot/boot1.efi /mnt/efi/boot/BOOTx64.efi
# umount /mnt

To handle BIOS booting, I installed the /boot/pmbr MBR bootstrap and /boot/gptzfsboot into a freebsd-boot partition.

# gpart bootcode -b /boot/pmbr ada2
# gpart add -t freebsd-boot -s 512k -a 4k ada2
# gpart bootcode -p /boot/gptzfsboot -i 2 ada2

Finally, I added partitions for swap and ZFS:

# gpart add -t freebsd-swap -a 4k -s 16G ada2
# gpart add -t freebsd-zfs -a 4k ada2

At this point the disk layout looked like this:

# gpart show ada2
=>        34  1953525101  ada2  GPT  (932G)
          34           6        - free -  (3.0K)
          40      409600     1  efi  (200M)
      409640        1024     2  freebsd-boot  (512K)
      410664    33554432     3  freebsd-swap  (16G)
    33965096  1919560032     4  freebsd-zfs  (915G)
  1953525128           7        - free -  (3.5K)

Step 2: Laying out ZFS

Now that partitioning was complete, the next step was to create a ZFS pool.  The ultimate plan is to add the "good" remaining disk as a mirror of the new disk, but I started with a single-device pool backed by the new disk.  I would have liked to use the existing zfsboot script from FreeBSD's installer to create the pool and lay out the various filesystems, but trying to use bsdconfig to do this just resulted in confusion.  It refused to do anything when I first ran the disk editor from the bsdconfig menu because no filesystem was marked as '/'.  Once I marked the new ZFS partition as '/', the child partedit process core dumped and bsdconfig returned to its main menu.  So, I punted and did this step all by hand as well.

I assumed that the instructions on FreeBSD's wiki from the old sysinstall days were stale as they predated the use of boot environments in FreeBSD.  Thankfully, Kevin Bowling has more recent instructions here.

Of course, you need the ZFS kernel module in order to use ZFS at all.  The custom kernel I used on my desktop had a stripped-down set of kernel modules, so I had to add ZFS to the list and reinstall the kernel.
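
For a custom kernel that trims the module build via MODULES_OVERRIDE, the change looks roughly like this (a sketch: MYKERNEL is a placeholder config name, the list should retain whatever modules were already present, and zfs also requires the opensolaris module).  In the kernel config:

makeoptions     MODULES_OVERRIDE="opensolaris zfs"

followed by a rebuild:

# cd /usr/src
# make buildkernel KERNCONF=MYKERNEL
# make installkernel KERNCONF=MYKERNEL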

First, I created the pool:

# mkdir /tmp/zroot
# zpool create -o altroot=/tmp/zroot -O compress=lz4 -O atime=off -m none zroot /dev/ada2p4

Next, I added the various datasets (basically copied from Kevin's instructions):

# zfs create -o mountpoint=none zroot/ROOT
# zfs create -o mountpoint=/ zroot/ROOT/default
# zfs create -o mountpoint=/tmp -o exec=on -o setuid=off zroot/tmp
# zfs create -o mountpoint=/usr -o canmount=off zroot/usr
# zfs create zroot/usr/home
# zfs create -o setuid=off zroot/usr/ports
# zfs create -o mountpoint=/var -o canmount=off zroot/var
# zfs create -o exec=off -o setuid=off zroot/var/audit
# zfs create -o exec=off -o setuid=off zroot/var/crash
# zfs create -o exec=off -o setuid=off zroot/var/log
# zfs create -o atime=on zroot/var/mail
# zfs create -o setuid=off zroot/var/tmp
# zpool set bootfs=zroot/ROOT/default zroot
# chmod 1777 /tmp/zroot/tmp
# chmod 1777 /tmp/zroot/var/tmp
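
A quick sanity check of the resulting layout (not from my original notes, but handy):

# zfs list -o name,mountpoint,canmount -r zroot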

Step 3: Copy the Data

In the past when I've migrated UFS partitions across drives, I used 'dump | restore' which worked really well (preserved sparse files, etc.).  For this migration that wasn't an option (or so I thought; see the updates below).  Since I had separate UFS partitions I had to copy each one over:

# tar -cp --one-file-system -f - -C / . | tar -xSf - -C /tmp/zroot
# tar -cp --one-file-system -f - -C /var . | tar -xSf - -C /tmp/zroot/var
# tar -cp --one-file-system -f - -C /usr . | tar -xSf - -C /tmp/zroot/usr

Since I had been using UFS SU+J, the tar copies brought over the .sujournal files, so I deleted those:

# rm /tmp/zroot/.sujournal /tmp/zroot/var/.sujournal /tmp/zroot/usr/.sujournal

Step 4: Adjust Boot Configuration

I added the following to /etc/rc.conf:

zfs_enable="YES"

and to /boot/loader.conf:

zfs_load="YES"
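# prefer plain adaXpY device names over diskid/gptid aliases: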
kern.geom.label.disk_ident.enable=0
kern.geom.label.gptid.enable=0

I also removed all references to the old RAID1 mirror from /etc/fstab.
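
The removed entries looked something like this (illustrative only; graid exposes the array as /dev/raid/r0 with the GPT partitions beneath it, but the exact indices are a guess):

/dev/raid/r0p2    /       ufs    rw    1    1
/dev/raid/r0p3    /var    ufs    rw    2    2
/dev/raid/r0p4    /usr    ufs    rw    2    2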

With all this done I was ready to reboot.

Step 5: Test Boot

My BIOS does not permit selecting a different hard disk at boot, so I had to change the default boot disk in the BIOS settings.  Once this was done the system booted to ZFS just fine.

Step 6: Convert to Mirror

After powering down the box I unplugged the dead drive and booted up.  I verified that the remaining drive's serial number did not match the drive that had reported errors previously.  (I actually got this wrong the first time, so I had to boot a few times.)  Once this was correct I proceeded to destroy the now-degraded RAID1 in preparation for reusing the disk as a mirror.

# graid delete raid/r0
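
As an aside, the serial-number check mentioned above is easy to do from the shell; either of these shows the drive's identity (the grep patterns are guesses at the relevant fields and may need adjusting):

# camcontrol identify ada0 | grep serial
# geom disk list ada0 | grep ident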

At this point, the raw disk (/dev/ada0) still had the underlying data (in particular a GPT), so that had to be destroyed as well:

# gpart destroy -F ada0

Now the ada0 disk needed to be partitioned identically to the new disk (now ada1).  I was able to copy the GPT over to save a few steps.

# gpart backup ada1 | gpart restore ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada0
# newfs_msdos -L EFI /dev/ada0p1
# mount -t msdos /dev/ada0p1 /mnt
# mkdir -p /mnt/efi/boot
# cp /boot/boot1.efi /mnt/efi/boot/BOOTx64.efi
# umount /mnt

Next, I added the two swap partitions to /etc/fstab and ran the /etc/rc.d/swap and /etc/rc.d/dumpon scripts.
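
The additions looked something like this (partition 3 is the freebsd-swap partition in the layout above; with dumpdev="AUTO" in rc.conf, the dumpon script picks the first listed swap device).  In /etc/fstab:

/dev/ada0p3    none    swap    sw    0    0
/dev/ada1p3    none    swap    sw    0    0

then:

# /etc/rc.d/swap start
# /etc/rc.d/dumpon start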

Finally, I attached the ZFS partition on ada0 to the pool as a mirror.  NB: I was warned previously to be sure to use 'zpool attach' and not 'zpool add' as the latter would simply concatenate the disks and not provide redundancy.

# zpool attach zroot /dev/ada1p4 /dev/ada0p4
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'zroot', you may need to update
boot code on newly attached disk '/dev/ada0p4'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

 gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

The one nit about the otherwise helpful messages is that they are hardcoded to assume the freebsd-boot partition is at index 1, while here it is at index 2 (as in the gpart bootcode command run earlier).  I suspect it is not easy to auto-generate the correct command (as otherwise it would already do so), but the wording may need a tweak to note that the partition index may also need updating, not just the disk name.  Also, this doesn't cover the EFI booting case (which admittedly is new in FreeBSD 11).

Anyway, the pool is now happily resilvering:

# zpool status
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
 continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Aug  1 08:12:58 2016
        1.63G scanned out of 207G at 98.1M/s, 0h35m to go
        1.63G resilvered, 0.79% done
config:

 NAME        STATE     READ WRITE CKSUM
 zroot       ONLINE       0     0     0
   mirror-0  ONLINE       0     0     0
     ada1p4  ONLINE       0     0     0
     ada0p4  ONLINE       0     0     0  (resilvering)

errors: No known data errors

Testing EFI will have to wait until I upgrade my desktop to 11.  Perhaps next weekend.

Updates

Some feedback from readers:
1) restore doesn't actually assume a UFS destination, so I probably could have used 'dump | restore' (a sketch of that is below).
2) The 'zpool export/import' wasn't actually needed and has been removed (the create is sufficient).
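
A sketch of what that dump | restore would have looked like for one of the filesystems (-L snapshots the live filesystem so dump sees a consistent image; restore -r unpacks into the current directory):

# dump -0Laf - /usr | (cd /tmp/zroot/usr && restore -rf -)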

Also, for the curious, the resilver finished in less than an hour:

# zpool status
  pool: zroot
 state: ONLINE
  scan: resilvered 207G in 0h53m with 0 errors on Mon Aug  1 09:06:26 2016
