…and how GRUB got in the way

On October 1st, ABQLUG gathered for its 4th meetup. Four members showed up to hack on an Ubuntu 19.10 machine. The goal? To set up ZFS on root with native ZFS encryption on the root pool.

Some highlights as to why you may want to use ZFS:

  • Superior snapshot capabilities
  • Protection against bit-rot
  • Raid-Z, a replacement for traditional RAID
  • Extreme tune-ability

There are some downsides, though:

  • You cannot easily add a single drive to an already created pool
  • Finding out if your drive is 4k sector compatible can be difficult. Looking at the drive manufacturer’s data sheet might tell you
  • Faulty memory can lead to a broken filesystem; ECC is recommended but not mandatory

This is a write up of the commands we used, in case you want to try this out yourself. I used this guide as a starting point: https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS

Update:

10/10/2019 – I ended up telling some of the Ubuntu devs over at the Ubuntu forums about the GRUB issues we were having. Several days later they released a patch for GRUB that makes ZFS work under Secure Boot. Likely, turning off Secure Boot would have also allowed this tutorial to work.

Prerequisites:

  • A computer that has EFI capabilities
  • Install Ubuntu 19.10 to a USB drive
  • Use the USB drive so you have a Terminal prompt to the Live USB
  • Basic knowledge of nano or vi. If you prefer vi/ee/emacs, replace nano in the commands below with whatever editor you’re used to
  • Knowledge of /dev/disk/ paths, so you can change this to be relevant to your setup
  • If you can figure out how to install ZFS, you really need to learn the CLI commands to properly maintain the pools
  • This is a list of ZFS specific terms: https://www.freebsd.org/doc/handbook/zfs-term.html

Once you have a Terminal window open from the Live USB then you will want to switch to the root user as all of these commands need root access.

sudo -i

The Universe repository is required for the ZFS packages. apt-add-repository runs apt update for you.

apt-add-repository universe

Now we need to install debootstrap to install Ubuntu onto the hard drive, gdisk to make and destroy hard drive partitions, and zfs-initramfs to pull in ZFS and its dependencies. Using --yes makes apt install without a Y prompt.

apt install --yes debootstrap gdisk zfs-initramfs

Before you can set up partitions and pools, you need to see what you’re working with; the disks are listed in /dev/disk/. At this point you need to be sure you select the correct drive. I know ahead of time that I want to use a Samsung SSD for the root and boot pools.

Once we start using sgdisk, there is no easy way back from a destroyed partition table. You’ve been warned.

ls /dev/disk/by-id

This is the output of that ls command. Drives are typically shown as model number plus serial number. Anything with -part# is a partition. Listed below are a Samsung SSD, a USB drive (the Live USB), and a mechanical hard drive. I probably should have unplugged that mechanical hard drive ahead of time, to make sure I couldn’t accidentally use that one.

ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F
ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part1
ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part2
ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part3
usb-General_UDisk-0:0
usb-General_UDisk-0:0-part1
wwn-0x5002538d40b7eb77
wwn-0x5002538d40b7eb77-part1
wwn-0x5002538d40b7eb77-part2
wwn-0x5002538d40b7eb77-part3
wwn-0x5002538d40b7eb77-part4
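With several drives attached, the -part# entries make the listing noisy. Here is a minimal sketch of a helper that shows only the whole-disk entries. The function name list_whole_disks is made up for this write-up, and it assumes partition links always end in -part followed by a number, as they do in the listing above.

```shell
# List whole-disk entries in a by-id style directory, skipping the
# "-partN" partition links. The directory argument is optional and
# defaults to /dev/disk/by-id.
list_whole_disks() {
    dir="${1:-/dev/disk/by-id}"
    ls "$dir" | grep -v -- '-part[0-9][0-9]*$'
}

# Example (run on the Live USB):
#   list_whole_disks
```

It only reads the directory, so it is safe to run before touching sgdisk.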

Now that we know we want to use /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F, let’s wipe the three existing partitions from the partition table. Again, change /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F to whatever makes sense for your setup.

sgdisk --zap-all /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F

Now that we have an empty partition table, let’s set up a boot partition, an EFI partition, a swap partition, and a root partition. There are many different ideas on how to set these up, especially the swap partition. There are several reasons you may not want swap on ZFS, so it makes more sense to give swap its own partition rather than leave it out altogether.

First, set up the boot partition:

-n1:1M:+1G creates a new partition (#1) that starts 1 MiB into the disk and is 1 GiB in size. Yes, you need the 1M in this command.

-t1:BF01 changes the typecode to ZFS for partition #1

sgdisk -n1:1M:+1G -t1:BF01 /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F

-n2:0:+1G creates a new partition (#2), starts the partition right after partition #1 and ends after 1 GB.

-t2:EF00 changes the typecode to EFI for partition #2

sgdisk -n2:0:+1G -t2:EF00 /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F

-n3:0:+8G creates a new partition (#3), starts the partition right after partition #2 and uses 8 GB.

-t3:8200 changes the typecode to SWAP/Linux for partition #3

sgdisk -n3:0:+8G -t3:8200 /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F

-n4:0:0 creates a new partition (#4), starts the partition right after partition #3 and uses the rest of the available space.

-t4:BF01 changes the typecode to ZFS for partition #4

sgdisk -n4:0:0 -t4:BF01 /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F
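Retyping that long disk path four times is an easy way to make a mistake. The sketch below prints the four sgdisk commands from this section for one disk path, so you can read them over before anything is written. The function name partition_plan and the EXAMPLE path are placeholders, not part of the guide.

```shell
# Print (do not run) the four sgdisk commands used above for one disk.
partition_plan() {
    disk="$1"
    echo "sgdisk -n1:1M:+1G -t1:BF01 $disk"   # partition 1: boot pool
    echo "sgdisk -n2:0:+1G -t2:EF00 $disk"    # partition 2: EFI
    echo "sgdisk -n3:0:+8G -t3:8200 $disk"    # partition 3: swap
    echo "sgdisk -n4:0:0 -t4:BF01 $disk"      # partition 4: root pool
}

# Dry run against a placeholder path; nothing is written to any disk.
partition_plan /dev/disk/by-id/EXAMPLE
```

Once the printed commands look right for your real disk, you could pipe the output to sh to run them, then check the result with sgdisk -p followed by the disk path.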

Let’s format the EFI partition (#2) as a FAT32 filesystem.

-F 32 selects FAT32

-s 1 sets one sector per cluster, which is meant for drives with 4 KB logical sectors (“4Kn” drives); remove this option if you are using a 512-byte sectored drive. To know which kind of drive you have, look at the manufacturer’s data sheet. Typically SSDs from the last several years are 4 KB sectored drives. Samsung didn’t include any documentation for me to go off of, so I am assuming here.

-n EFI Makes EFI the volume name

mkdosfs -F 32 -s 1 -n EFI /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part2 
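If the data sheet is no help, the kernel also reports what the drive claims about itself under /sys/block. This is a small sketch, assuming a kernel device name such as sda; the function name sector_sizes is made up here. Keep in mind that some SSDs report 512 bytes regardless of what they use internally, so treat the result as a hint.

```shell
# Print "logical physical" sector sizes in bytes for a kernel device
# name such as sda. The optional second argument overrides the sysfs
# root and exists only so the function can be tried on a fake tree.
sector_sizes() {
    dev="$1"
    root="${2:-/sys/block}"
    q="$root/$dev/queue"
    echo "$(cat "$q/logical_block_size") $(cat "$q/physical_block_size")"
}

# Example (run on the Live USB):
#   readlink -f /dev/disk/by-id/<your disk>    # shows e.g. /dev/sda
#   sector_sizes sda
```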

For the SWAP partition, we can create it simply with this command:

mkswap /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part3

Now let’s set up the boot pool, or bpool for short.

If you want to see some of the zpool options, you might want to refer to these links:

  • https://zfsonlinux.org/manpages/0.8.2/man8/zpool.8.html
  • https://zfsonlinux.org/manpages/0.8.2/man5/zpool-features.5.html
  • https://docs.oracle.com/cd/E19253-01/819-5461/gazss/index.html

I couldn’t find out what every option does. However, this is supposed to be a GRUB-compatible setup, so I used it as is. It might be worth taking most of these options out to see if GRUB plays any better, though you might want to leave compression on. YMMV.

Change ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part1 to whatever your setup has. I used ashift=12 since I had a somewhat modern SSD.

zpool create -o ashift=12 -o feature@async_destroy=enabled -o feature@bookmarks=enabled -o feature@embedded_data=enabled -o feature@empty_bpobj=enabled -o feature@enabled_txg=enabled -o feature@extensible_dataset=enabled -o feature@filesystem_limits=enabled -o feature@hole_birth=enabled -o feature@large_blocks=enabled -o feature@lz4_compress=enabled -o feature@spacemap_histogram=enabled -o feature@userobj_accounting=enabled -O acltype=posixacl -O canmount=off -O compression=lz4 -O devices=off -O normalization=formD -O relatime=on -O xattr=sa -O mountpoint=/ -R /mnt bpool /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part1
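A note on where ashift=12 comes from: ashift is the base-2 logarithm of the sector size ZFS will assume, so 12 means 4096 bytes and 9 means 512 bytes. The sketch below does that arithmetic; the function name ashift_for is made up for this write-up and it assumes the sector size is a power of two.

```shell
# Compute the ashift value (log base 2) for a sector size in bytes.
# Assumes the size is a power of two, e.g. 512, 4096 or 8192.
ashift_for() {
    size="$1"
    shift_val=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_val=$((shift_val + 1))
    done
    echo "$shift_val"
}

ashift_for 4096   # prints 12
ashift_for 512    # prints 9
```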

Now let’s create the root pool, or rpool.

This time we will use native ZFS encryption to encrypt this pool. Again, I used the ZFSonLinux guide to get this command; refer to that page if you want some more information.

zpool create -o ashift=12 -O acltype=posixacl -O canmount=off -O compression=lz4 -O dnodesize=auto -O normalization=formD -O relatime=on -O xattr=sa -O encryption=aes-256-gcm -O keylocation=prompt -O keyformat=passphrase -O mountpoint=/ -R /mnt rpool /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part4

Now we should set up datasets within the new pools. This allows more fine-tuning of different directories. The layout depends on your needs and use case. These commands create a ROOT dataset and a BOOT dataset. We do not mount these “base” datasets.

This was done to stay in alignment with the ZoL wiki, which has more information about datasets. It’s possible that the GRUB issue is related to these datasets. However, I did not get a chance to experiment with alternate settings.

zfs create -o canmount=off -o mountpoint=none rpool/ROOT
zfs create -o canmount=off -o mountpoint=none bpool/BOOT

Now let’s create a dataset within ROOT. This sub-dataset will be mounted as /.

zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/ubuntu
zfs mount rpool/ROOT/ubuntu

We’re now going to create a sub-dataset of the BOOT dataset on the bpool. This will be mounted as /boot.

zfs create -o canmount=noauto -o mountpoint=/boot bpool/BOOT/ubuntu
zfs mount bpool/BOOT/ubuntu

Create a new dataset in rpool; this one will be called home.

zfs create rpool/home

Now create a sub-dataset of home. This one will be mounted as /root, the home directory of the root user.

zfs create -o mountpoint=/root rpool/home/root

Now create a new dataset for /var. Set it to not mount.

zfs create -o canmount=off rpool/var

Now create a sub-dataset of var; this one will be for /var/lib.

zfs create -o canmount=off rpool/var/lib

Same thing, but this one is for /var/log instead of /var/lib. However, we will mount this one.

zfs create rpool/var/log

Similar to /var/log, but this one is for /var/spool

zfs create rpool/var/spool
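Most of these zfs create commands don’t set a mountpoint, which can look odd. A dataset without its own mountpoint inherits its parent’s mountpoint with its own name appended, and the -R /mnt altroot from zpool create is prefixed while we work from the Live USB. The sketch below only models that path arithmetic; inherited_mountpoint is a made-up helper and does not talk to ZFS at all.

```shell
# Compute where a dataset that inherits its mountpoint would land.
#   $1 = dataset name, e.g. rpool/var/log
#   $2 = mountpoint of the pool's root dataset (default /)
#   $3 = altroot, as given to zpool create -R (default: none)
# This only builds the path; canmount settings are not considered.
inherited_mountpoint() {
    dataset="$1"
    pool_mp="${2:-/}"
    altroot="${3:-}"
    case "$dataset" in
        */*) rel="${dataset#*/}" ;;   # path below the pool name
        *)   rel="" ;;                # the pool's own root dataset
    esac
    path="${altroot%/}/${pool_mp#/}"
    path="${path%/}/$rel"
    path="${path%/}"
    echo "${path:-/}"
}

inherited_mountpoint rpool/var/log / /mnt    # prints /mnt/var/log
inherited_mountpoint rpool/var/spool /       # prints /var/spool
```

To see what ZFS itself thinks, zfs list -o name,canmount,mountpoint -r rpool shows the real values.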

Now let’s mount the EFI partition at /boot/efi.

mkdir /mnt/boot/efi
mount /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part2 /mnt/boot/efi

Now that we have the rpool, bpool, and the EFI partition set up, we can use debootstrap to install 19.10 (beta currently) to the rpool.

debootstrap eoan /mnt

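The ZoL wiki wants us to run this ZFS command. I’m not 100% sure why it comes at this point, but the devices property controls whether device nodes can be opened on a file system, so devices=off turns that off for the rpool. This page might help explain it: https://docs.oracle.com/cd/E19253-01/819-5461/gazss/index.html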

zfs set devices=off rpool

Now that we have Ubuntu 19.10 installed, let’s start some basic configuration. First, set the desired hostname.

echo eoan-beta | tee /mnt/etc/hostname

Add the same hostname that was set in /etc/hostname to /etc/hosts.

echo '127.0.1.1       eoan-beta' | tee -a /mnt/etc/hosts

To set up the network, we first need to find the device name of your Network Interface Card. To do this, simply run:

ip link

Now we need to configure the network. For my setup, DHCP made the most sense. Also, .yaml files need to be formatted correctly. Whitespace matters with .yaml files. If you use tabs instead of spaces, netplan will not accept the .yaml file.

You will want to also change the part that has enp2s0 with your own device name. Examples may be found here: https://netplan.io/examples

nano /mnt/etc/netplan/01-netcfg.yaml

Now fill that file with this. Again, you may want to set up something besides DHCP. If so, refer to the netplan examples. You will definitely need to replace enp2s0: with whatever device name your NIC is using.

network:
  version: 2
  ethernets:
    enp2s0:
      dhcp4: true
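Since a stray tab is enough to make netplan reject the file, it is worth checking for tabs after saving. Here is a minimal sketch; the function name find_tabs is made up for this write-up.

```shell
# Print the lines (with line numbers) of a file that contain a tab.
# Exit status is 0 if any tab was found and 1 if the file is clean.
find_tabs() {
    tab="$(printf '\t')"
    grep -n "$tab" "$1"
}

# Example:
#   find_tabs /mnt/etc/netplan/01-netcfg.yaml || echo "no tabs found"
```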

Let’s alter the sources.list file for apt to use additional repositories.

nano /mnt/etc/apt/sources.list

Now delete any data in that file, then add this.

deb http://archive.ubuntu.com/ubuntu eoan main universe
deb-src http://archive.ubuntu.com/ubuntu eoan main universe

deb http://security.ubuntu.com/ubuntu eoan-security main universe
deb-src http://security.ubuntu.com/ubuntu eoan-security main universe

deb http://archive.ubuntu.com/ubuntu eoan-updates main universe
deb-src http://archive.ubuntu.com/ubuntu eoan-updates main universe

Before we chroot into the Ubuntu install we just made, we need to rbind /dev /proc and /sys. Use rbind, not bind, please!

mount --rbind /dev /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys /mnt/sys

Now we can use chroot to get a root prompt from within the rpool.

chroot /mnt /bin/bash --login

Now let’s do some more basic configuration.

You will want to select the en_US.UTF-8 locale.

dpkg-reconfigure locales

Then set the timezone. We used US/Mountain since we’re in New Mexico.

dpkg-reconfigure tzdata

Since we manually changed the sources.list for apt, we need to update apt.

apt update

Now we need to install the microcode for your CPU. For me, it was an AMD CPU. The /proc/cpuinfo file will help identify what CPU you are using. If it’s Intel, use intel-microcode instead of amd64-microcode.

apt install --yes amd64-microcode

Now let’s install the Linux image package. However, we do not want to install GRUB yet, so you will want to use the --no-install-recommends flag.

apt install --yes --no-install-recommends linux-image-generic

Then all we have left is ZFS and related packages. I also added dosfstools and nano, though those are likely optional, unless you want to use nano for the remainder of this walk-through.

apt install --yes zfs-initramfs dosfstools nano

Now we need to create the /etc/fstab file. This time I opted to use UUIDs instead of by-id paths, mostly because that is what I typically do with other distros. Again, replace ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part2 to reflect your setup. This will allow us to mount /boot/efi automatically.

echo UUID=$(blkid -s UUID -o value /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part2) /boot/efi vfat nofail,x-systemd.device-timeout=1 0 1 >> /etc/fstab

We then need to add the same thing for the SWAP partition. If you did not create a SWAP partition, then we can simply ignore this step.

echo UUID=$(blkid -s UUID -o value /dev/disk/by-id/ata-Samsung_SSD_850_EVO_500GB_S2RANXAH333304F-part3) none swap defaults 0 0 >> /etc/fstab

You might want to make sure that /etc/fstab was created correctly.

head /etc/fstab
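One thing to watch for: if the by-id path in the blkid command is wrong, the $(...) part expands to nothing and you end up with a bare UUID= in the file. The sketch below flags that case, along with any entry that doesn’t have the six fields used in this walk-through. The function name check_fstab is made up here.

```shell
# Print fstab entries that look wrong: a first field of exactly "UUID="
# (blkid returned nothing) or a field count other than six.
# Blank lines and comment lines are ignored.
# Exit status is 0 if everything looks fine and 1 otherwise.
check_fstab() {
    awk '!/^[[:space:]]*(#|$)/ && (NF != 6 || $1 == "UUID=") {
             print FILENAME ":" FNR ": " $0
             bad = 1
         }
         END { exit bad }' "${1:-/etc/fstab}"
}

# Example, inside the chroot:
#   check_fstab && echo "fstab looks sane"
```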

We should be able to mount SWAP now that it is defined in /etc/fstab

swapon -av

Since /boot/efi is mounted, we can install GRUB for an EFI system. If you are unsure whether /boot/efi is mounted, you might want to run ls -l /boot/efi or lsblk.

apt install --yes grub-efi-amd64-signed shim-signed

We need to adjust systemd so it can properly mount the bpool. First we need to create and fill the zfs-import-bpool.service file.

nano /etc/systemd/system/zfs-import-bpool.service

Fill the file with the following.

[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool

[Install]
WantedBy=zfs-import.target

Save and quit nano.

Now we need to enable the newly created zfs-import-bpool.service file for systemd.

systemctl enable zfs-import-bpool.service
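If you would rather not paste into an editor, the same unit file can be written with a here-document. This is a sketch with the contents copied from the listing above; the function name write_bpool_unit is made up, and the optional path argument is only there so you can write a copy somewhere harmless first.

```shell
# Write the zfs-import-bpool.service unit shown above to a file.
# The target path is optional and defaults to the real location.
write_bpool_unit() {
    target="${1:-/etc/systemd/system/zfs-import-bpool.service}"
    cat > "$target" <<'EOF'
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool

[Install]
WantedBy=zfs-import.target
EOF
}

# Example, inside the chroot:
#   write_bpool_unit
#   systemctl enable zfs-import-bpool.service
```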

If you created datasets outside of this walk-through, and /tmp was one of them, skip this step. Otherwise, you will likely want to run these commands. They allow /tmp to be mounted as tmpfs (a RAM filesystem).

cp /usr/share/systemd/tmp.mount /etc/systemd/system/
systemctl enable tmp.mount

Let’s show Canonical that we are using ZFS

apt install --yes popularity-contest

If you need CUPS:

addgroup --system lpadmin

If you need SAMBA:

addgroup --system sambashare

Verify that grub sees ZFS on /boot

grub-probe /boot

Refresh initrd:

update-initramfs -u -k all

Now we need to configure GRUB to mount the rpool. The remaining changes are optional, though they are recommended if you need to troubleshoot GRUB. They didn’t help me much during my testing. YMMV.

nano /etc/default/grub

Required – Change GRUB_CMDLINE_LINUX= to:

GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/ubuntu"

Optional – Comment out:

#GRUB_TIMEOUT_STYLE=hidden 

Optional – Set GRUB_TIMEOUT= to:

GRUB_TIMEOUT=5

Optional – Below GRUB_TIMEOUT=, add:

GRUB_RECORDFAIL_TIMEOUT=5

Optional – Remove quiet and splash from GRUB_CMDLINE_LINUX_DEFAULT="quiet splash":

GRUB_CMDLINE_LINUX_DEFAULT=""

Optional – Uncomment #GRUB_TERMINAL=console so that it reads:

GRUB_TERMINAL=console

Save and quit nano.
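The same edits can be scripted instead of done by hand in nano. Below is a rough sketch: set_grub_var is a made-up helper that replaces a KEY=... line (commented out or not) or appends one if the key is missing. It assumes the value contains no | or & characters, which holds for the values in this walk-through. It does not handle commenting out GRUB_TIMEOUT_STYLE; do that one by hand.

```shell
# Set KEY=VALUE in a GRUB defaults file.
#   $1 = file, $2 = key, $3 = value (include any quotes you want kept)
# An existing "KEY=" or "#KEY=" line is replaced; otherwise the
# setting is appended to the end of the file.
set_grub_var() {
    file="$1"
    key="$2"
    value="$3"
    if grep -q "^#\{0,1\}${key}=" "$file"; then
        tmp="$(mktemp)"
        sed "s|^#\{0,1\}${key}=.*|${key}=${value}|" "$file" > "$tmp"
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example, inside the chroot:
#   set_grub_var /etc/default/grub GRUB_CMDLINE_LINUX '"root=ZFS=rpool/ROOT/ubuntu"'
#   set_grub_var /etc/default/grub GRUB_TIMEOUT 5
#   set_grub_var /etc/default/grub GRUB_RECORDFAIL_TIMEOUT 5
#   set_grub_var /etc/default/grub GRUB_CMDLINE_LINUX_DEFAULT '""'
#   set_grub_var /etc/default/grub GRUB_TERMINAL console
```

Making a copy of /etc/default/grub before trying this is a good idea.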

Now we need to run the GRUB tools to finish setting up GRUB.

I didn’t get any errors here. However, I think this is where it all goes wrong… It looks like GRUB recognizes that it needs to use ZFS. However, I couldn’t get GRUB to create the menu entry, even though it ended up making a 990 line bash script for the menu entry.

Run this first.

grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu --recheck --no-floppy

If you run update-grub you should see the menu entries being created for the ZFS pool. However, this didn’t work for me.

update-grub

This is the output we got when we ran that command. Again, you should not see this.

[email protected]:/# update-grub
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Warning: didn't find any valid initrd or kernel.
Warning: Failed to find a valid directory 'etc' for dataset 'rpool/ROOT/[email protected]'. Ignoring
Warning: Ignoring rpool/ROOT/[email protected]
device-mapper: reload ioctl on osprober-linux-sda4  failed: Device or resource busy
Command failed.
Adding boot menu entry for EFI firmware configuration
done

It created a menu entry for EFI firmware, which only takes you to your motherboard’s BIOS screen. In my opinion, the breakdown happens because GRUB can’t resolve the rpool to a “canonical path.”

We tried some troubleshooting. First we made sure that the zfs.mod file existed, which it did.

ls /boot/grub/*/zfs.mod

Then we looked for the proper bash scripts located in /etc/grub.d/

ls /etc/grub.d/

From there we can see 10_linux_zfs. So the menu entries should be okay to be created. Something is stopping this process from completing.
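Another quick check is to count the menu entries that actually ended up in the generated config. This sketch assumes the config is at /boot/grub/grub.cfg, the usual location on Ubuntu; count_menuentries is a made-up name. A result of 1 would presumably be just the EFI firmware entry we saw.

```shell
# Count menuentry lines in a GRUB config file.
# The path is optional and defaults to /boot/grub/grub.cfg.
count_menuentries() {
    grep -c '^[[:space:]]*menuentry ' "${1:-/boot/grub/grub.cfg}"
}

# Example, inside the chroot after update-grub:
#   count_menuentries
```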

However, if you can overcome this, you should be able to finish the rest of the walkthrough.

The ZoL wiki states that there is no mount generator for ZFS when used with systemd. There might be mount issues with /var/log and /var/tmp.

If you created separate datasets for /var, specifically /var/log and /var/tmp, you need to set those datasets to use legacy mounting. Listing them in /etc/fstab makes systemd aware that these are separate mountpoints. In turn, rsyslog.service depends on var-log.mount by way of local-fs.target and services using the PrivateTmp feature of systemd automatically use After=var-tmp.mount.

Until there is support for mounting /boot in the initramfs, we also need to mount that, because it was marked canmount=noauto. Also, with UEFI, we need to ensure it is mounted before its child filesystem /boot/efi

rpool is guaranteed to be imported by the initramfs, so there is no point in adding x-systemd.requires=zfs-import.target to those filesystems.

First, unmount /boot/efi

umount /boot/efi

Then set the bpool/BOOT/ubuntu dataset to use legacy mounting.

zfs set mountpoint=legacy bpool/BOOT/ubuntu

Since bpool/BOOT/ubuntu is a created dataset and is now legacy mounted, we need to add it to /etc/fstab

echo bpool/BOOT/ubuntu /boot zfs nodev,relatime,x-systemd.requires=zfs-import-bpool.service 0 0 >> /etc/fstab

Some of these are not needed. It depends on what you selected to be datasets of the /var directory, if any at all were created.

In our example, we set up datasets for /, /boot, /home, /root, /var, /var/lib, /var/log, and /var/spool. If you added more than that, please go ahead and set your own to legacy mounting and add them to /etc/fstab.

Note – the /var dataset was skipped in the ZoL wiki, so we skipped it as well. It seems that /var/log and /var/tmp might be the only datasets that must be defined in /etc/fstab. This is an assumption, however.

zfs set mountpoint=legacy rpool/var/log
echo rpool/var/log /var/log zfs nodev,relatime 0 0 >> /etc/fstab

Same thing for /var/lib

zfs set mountpoint=legacy rpool/var/lib
echo rpool/var/lib /var/lib zfs nodev,relatime 0 0 >> /etc/fstab

Same thing for /var/spool

zfs set mountpoint=legacy rpool/var/spool
echo rpool/var/spool /var/spool zfs nodev,relatime 0 0 >> /etc/fstab

If you created a /var/tmp dataset (you probably didn’t)

zfs set mountpoint=legacy rpool/var/tmp
echo rpool/var/tmp /var/tmp zfs nodev,relatime 0 0 >> /etc/fstab

If you created a /tmp dataset (you probably didn’t)

zfs set mountpoint=legacy rpool/tmp
echo rpool/tmp /tmp zfs nodev,relatime 0 0 >> /etc/fstab
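All of these fstab entries follow the same pattern, so they can be generated rather than typed. A small sketch, with legacy_fstab_line as a made-up helper name; it only prints the line, so you can look at it before appending it to /etc/fstab.

```shell
# Print the fstab line for a legacy-mounted ZFS dataset.
#   $1 = dataset, $2 = mountpoint
legacy_fstab_line() {
    printf '%s %s zfs nodev,relatime 0 0\n' "$1" "$2"
}

# Preview the entries for the /var datasets used above.
for d in var/log var/lib var/spool; do
    legacy_fstab_line "rpool/$d" "/$d"
done
```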

Let’s test snapshots.

You will probably want to take snapshots before each upgrade. Also, don’t forget to remove old snapshots, this one included, at some point to save space.

As written, each command snapshots only the dataset it names; add -r to zfs snapshot if you also want snapshots of any nested datasets. The name after the @ is your choice; install is the name the ZoL wiki uses.

zfs snapshot bpool/BOOT/ubuntu@install
zfs snapshot rpool/ROOT/ubuntu@install
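For the snapshots you take before each upgrade, a date in the name makes old ones easier to spot and clean up later. This is only a sketch of one naming convention; snapshot_name is a made-up helper, and the label plus date format is my assumption, not something ZFS requires.

```shell
# Build a snapshot name of the form dataset@label-YYYYMMDD.
#   $1 = dataset, $2 = label
snapshot_name() {
    printf '%s@%s-%s\n' "$1" "$2" "$(date +%Y%m%d)"
}

# Example, on the installed system:
#   zfs snapshot "$(snapshot_name rpool/ROOT/ubuntu pre-upgrade)"
#   zfs list -t snapshot
```

Old snapshots can be removed with zfs destroy followed by the full snapshot name.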

Theoretically, everything is set up. Leave the chroot.

exit

Unmount everything under /mnt, then export the pools.

mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a
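That unmount one-liner is dense, so here is a sketch of just its selection part, which lets you preview what would be unmounted. It drops ZFS mounts, reverses the order so the most recently mounted paths come first, and prints the mount point of every remaining line that mentions /mnt. The function name mnt_targets is made up here.

```shell
# Read mount(8) output on stdin and print the non-ZFS mount points
# under /mnt, last-mounted first, the way the pipeline above selects them.
mnt_targets() {
    grep -v zfs | tac | awk '/\/mnt/ {print $3}'
}

# Example (run on the Live USB, before the real unmount):
#   mount | mnt_targets
```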

If no errors were given, shut down the machine.

systemctl poweroff

Once the machine is fully off, remove the Live USB drive.

When you turn on the machine, you should be able to boot from the hard drive we set up. However, we never made it this far. Further configuration is still needed, but none of it matters if we can’t boot from the hard drive.

If someone has any input on how to proceed, or if you spot a mistake, please leave a comment below. This page might get updated if anyone has anything to add.

Why not wait for the Ubiquity installer to add ZFS compatibility?

Yes, that’s probably the smart thing to do for now. However, debootstrap was an interesting way of installing Ubuntu. Very similar to the Arch experience.

Ubuntu is creating a nice program to accompany ZFS installations, though that program was never used in this walkthrough.

zsys – https://github.com/ubuntu/zsys

Documentation is not out yet for zsys. By the time 19.10 is officially released, we should get that documentation from Canonical.

If you want better ZFS support from your OS vendor, don’t forget you can use FreeBSD. :-p

We hope to see y’all at a future meetup!

Meetup 004 | How we almost setup ZFS on root…