The ZFS filesystem (these days more often called OpenZFS, after the project name) is a great filesystem for many purposes, from home desktop and laptop setups to enterprise offerings. Traditional disk drives have non-overlapping magnetic tracks laid out parallel to each other. These are usually labeled PMR disks (Perpendicular Magnetic Recording), and you will also see them called CMR (Conventional Magnetic Recording). To pack even more data onto platters of the same size, hard disk drive manufacturers also offer SMR disks (Shingled Magnetic Recording). On SMR disks each data track is written so that it partly overlaps the previously written track, which results in narrower tracks and higher density. I will try to visualize this difference below using my favorite Enterprise Architect ASCII Edition software.
          PMR                       SMR

  [xxx][___][___][___]        [xx[__[__[___]
  [___][xxx][___][___]        [__[xx[__[___]
  [___][___][xxx][___]        [__[__[xx[___]
  [___][___][___][xxx]        [__[__[__[xxx]
  [___][xxx][___][xxx]        [__[xx[__[xxx]
  [xxx][___][___][xxx]        [xx[__[__[xxx]
  12345678901234567890        12345678901234
I marked the filled blocks on both disks with xxx marks. As you can see from the 'size' rulers under each diagram, the same data takes less physical space on an SMR disk than on a traditional PMR drive. This comes at a price though. Writes are a little 'crippled' compared to PMR drives. Heavy and random write I/O in particular is 'problematic' and slower on SMR drives … but that does not mean they are useless.
For backup or clone purposes they are more than enough. I personally use SMR drives for my backup solutions. It's all about the price/performance ratio.
Here are my backup solutions based on SMR drives:
How does ZFS behave on SMR drives? Very well, I would say. ZFS tries to turn as much random I/O as possible into sequential I/O with its features, which are described in detail, for example, in the zpool-features(7) man page.
I recently tried ZFS on top of a GELI encrypted partition on a 5 TB external USB SMR drive. I needed to copy a little more than 3 TB of data there, and I used rsync(1) for that purpose. These are the arguments I use for my rsync(1) jobs.
% rsync --modify-window=1 -l -t -r -D -v -S -H --force \
        --progress --no-whole-file --numeric-ids --delete \
        /files/ /media/external/files/
Of course I do not write all these options by hand. I just use a script wrapper for that, rsync-delete.sh, available on my scripts page.
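A wrapper of this kind can be very small. The sketch below is not the rsync-delete.sh script itself, only a guess at the general shape such a wrapper could take. The function name rsync_delete and the RSYNC override variable are assumptions made for illustration, while the flags are exactly the ones from the command shown earlier.

```sh
#!/bin/sh
# Minimal sketch of an rsync wrapper; NOT the original rsync-delete.sh.
# rsync_delete and the RSYNC override variable are hypothetical names.
rsync_delete() {
    if [ "$#" -ne 2 ]; then
        echo "usage: rsync_delete SOURCE/ TARGET/" >&2
        return 1
    fi
    # Same flags as in the command above.
    # Setting RSYNC=echo prints the arguments instead of copying anything.
    ${RSYNC:-rsync} --modify-window=1 -l -t -r -D -v -S -H --force \
        --progress --no-whole-file --numeric-ids --delete \
        "$1" "$2"
}
```

Calling it as RSYNC=echo rsync_delete /files/ /media/external/files/ only previews the arguments, which is handy before running anything that uses --delete.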
As I started to copy files to the drive I watched the write speeds using the iostat(8) and zpool-iostat(8) tools. I expected quite slow operation, but even with zstd compression enabled and AES-XTS 256-bit GELI encryption I got pretty decent results.
Here are the iostat(8) results. Each line is an average over 10 minutes (600 seconds). Check the speeds for the da0 drive below.
% iostat 600
       tty       ada0             ada1             da0              cpu
 tin  tout  KB/t  tps  MB/s  KB/t  tps  MB/s  KB/t  tps  MB/s  us ni sy in id
   1     1   513  120  59.9  29.5   39   1.1   742   65  46.8   4  8 17  2 69
   0     2   615   94  56.6  19.1   22   0.4   751   68  49.8   1  3 14  1 82
   0     0   561  106  57.9  17.9   20   0.4   760   70  52.0   1  2 14  1 82
   0     0  1015   57  56.8  18.4   16   0.3   769   68  50.9   1  3 15  1 81
   0     0  1017   57  56.3  18.5   16   0.3   757   68  50.6   1  3 14  1 81
   0     1   752   72  53.0  16.6   23   0.4   765   67  50.1   1  1 13  0 85
   0     0  1014   51  50.1  16.5   21   0.3   723   68  48.3   1  1 13  0 86
   0     0  1012   51  50.2  19.8   18   0.3   743   68  49.2   1  1 12  0 86
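To put a single number on the da0 column, the eight MB/s samples above can be averaged. This is a small sketch using awk(1); the helper name mean is an assumption made for illustration and the values are copied by hand from the iostat(8) output.

```sh
#!/bin/sh
# mean: read one number per line on stdin and print the average
# rounded to one decimal place. Hypothetical helper name.
mean() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# da0 MB/s values copied from the iostat(8) output above.
printf '%s\n' 46.8 49.8 52.0 50.9 50.6 50.1 48.3 49.2 | mean
```

For these eight samples it prints 49.7, so the drive sustained close to 50 MB/s at the device level during that period.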
And here are the zpool-iostat(8) results.
% zpool iostat POOL 600
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
POOL        3.18T  1.37T      7     56  53.5K  40.7M
POOL        3.20T  1.34T      0     57  9.01K  41.4M
POOL        3.22T  1.33T      0     47  3.29K  32.3M
POOL        3.24T  1.31T      0     47  5.59K  33.9M
POOL        3.25T  1.29T      0     43  3.39K  24.3M
POOL        3.27T  1.28T      0     42  3.01K  25.5M
POOL        3.28T  1.27T      0     44  3.14K  26.8M
POOL        3.29T  1.26T      0     42  3.49K  23.9M
The drive was attached over a USB 3.0 port, so the roughly 35 MB/s limit of a USB 2.0 port did not apply. I would say that the results are very decent and consistent.
There are several settings that can help you squeeze the maximum out of these SMR drives on the ZFS filesystem.
First are the ZFS pool settings. You want the latest zstd compression to save some space. Better compression also means fewer physical bytes need to be written to the drive, so fewer I/O operations. You should also set atime to off as it will not be needed. You should also increase recordsize to something really big like 1m (1 megabyte), so you will get a higher compressratio and ZFS will need less metadata because the same data fits in fewer blocks. Keep in mind that ZFS will still use a variable block size and not only the 1m maximum. If a file is smaller (like 100k) then it would take, for example, 80k (after zstd compression is applied). You will not waste 920k here 🙂
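The effect of recordsize on the number of blocks can be sketched with simple arithmetic. The helper below is hypothetical (blocks_needed is not a ZFS tool) and it ignores compression and metadata overhead. It only divides a file size by the recordsize and rounds up, assuming the default recordsize of 128k as the point of comparison.

```sh
#!/bin/sh
# blocks_needed SIZE_BYTES RECORDSIZE_BYTES
# Rough estimate: how many records a file of SIZE_BYTES is split into
# (ceiling division). Hypothetical helper, ignores compression.
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

blocks_needed 1073741824 131072    # 1 GiB file with 128k records: 8192
blocks_needed 1073741824 1048576   # 1 GiB file with 1m records:   1024
blocks_needed 102400 1048576       # 100k file with 1m records:    1
```

Eight times fewer blocks for the same file means eight times fewer block pointers to keep track of, and the small 100k file still occupies just one (smaller) block.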
Keep in mind that most newer and larger drives use 4k blocks (instead of 512b). Sometimes it's the 512e method, which means that the drive firmware 'presents' the device with 512b blocks while underneath eight of these 512b blocks sit on a single 4k block. For these reasons it's important to keep several things in mind.
When adding new partitions with gpart(8), remember to align them to 4k with the -a 4k argument.
# gpart add -t freebsd-zfs -a 4k da0
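Whether a partition start is aligned to 4k can be checked with a bit of arithmetic. The sketch below assumes the start offset is given in 512-byte sectors (as a 512e drive would report it, for example in the output of gpart show); the function name is_4k_aligned is hypothetical.

```sh
#!/bin/sh
# is_4k_aligned START_SECTOR
# START_SECTOR is a partition start counted in 512-byte sectors.
# Succeeds (exit status 0) when the start falls on a 4096-byte boundary.
# Hypothetical helper name.
is_4k_aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

for start in 34 40 2048; do
    if is_4k_aligned "$start"; then
        echo "sector $start: aligned to 4k"
    else
        echo "sector $start: NOT aligned to 4k"
    fi
done
```

Sector 34 lands in the middle of a 4k block (34 * 512 = 17408 bytes), while sectors 40 and 2048 are both multiples of eight and therefore aligned.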
Next, when initializing the geli(8) encryption layer, make sure you add the -s 4096 argument.
# geli init -s 4096 /dev/da0p1
The last thing is ZFS pool creation with the proper ashift property, which cannot be changed later. On FreeBSD UNIX it's done this way:
# sysctl vfs.zfs.min_auto_ashift=12
# zpool create POOL da0
# zdb -C POOL | grep ashift
      ashift: 12
If you are curious what 12 means, the table below will help you:

ASHIFT  BLOCKSIZE
     9       512b
    10         1k
    11         2k
    12         4k
    13         8k
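The ashift value is simply the base 2 exponent of the block size, so the table can be reproduced with a shift operation. A minimal sketch, with ashift_to_bytes being a hypothetical helper name:

```sh
#!/bin/sh
# ashift_to_bytes ASHIFT
# The block size in bytes is 2 raised to the power of ASHIFT.
# Hypothetical helper name.
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

for ashift in 9 10 11 12 13; do
    echo "ashift=$ashift -> $(ashift_to_bytes "$ashift") bytes"
done
```

For ashift=12 this gives 4096 bytes, which matches the 4k blocks discussed above.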
Last but not least is the redundant_metadata option. By default it's set to all, but it's better to set it to most. Do you need redundant metadata? I think not. When your single drive fails the redundant metadata will not help, and if your ZFS pool has some redundancy level like raidz or mirror then redundant metadata is also not needed because it's already 'normally' redundant, being spread across several disks.
Keep in mind that the ZFS resilver process on some of these SMR drives can take forever. Some people on Reddit reported that they successfully resilvered their ZFS pools with SMR drives, but that does not have to be the case for all SMR drives out there. You can also check the Ars Technica tests of resilver on SMR disks.
Here is the summary of the suggested ZFS tunables. You will find an in-depth description of all of them in the zfsprops(7) man page.
# zfs set redundant_metadata=most POOL
# zfs set compression=zstd POOL
# zfs set atime=off POOL
# zfs set recordsize=1m POOL
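If you have more than one pool to tune, the four zfs set commands can be wrapped in a small function. This is only a sketch: tune_smr_pool and the DRY_RUN variable are hypothetical names, and by default the function just prints what it would run so you can review it first.

```sh
#!/bin/sh
# tune_smr_pool POOL
# Hypothetical helper, not part of ZFS: apply the four properties
# listed above to one pool. With DRY_RUN=1 (the default in this sketch)
# the commands are only printed; set DRY_RUN=0 to really run them.
tune_smr_pool() {
    pool="$1"
    for prop in redundant_metadata=most compression=zstd atime=off recordsize=1m; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "zfs set $prop $pool"
        else
            zfs set "$prop" "$pool" || return 1
        fi
    done
}

tune_smr_pool POOL
```

Running it as shown prints the four commands; DRY_RUN=0 tune_smr_pool POOL (as root) would apply them.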
In theory the TRIM operations upon deletion would create additional unwanted 'stress' for SMR drives, which would mean that TRIM should be disabled on non-SSD drives, and you can disable it entirely at the ZFS pool level … but.
TRIM commands issued by the operating system allow the SMR HDD internal controller to learn that certain areas/blocks on the SMR HDD platters are no longer in use. It means that writes to such areas can be performed without the slow read-modify-write pattern.
This means we want the autotrim option set to on (enabled) for SMR drives.
# zpool set autotrim=on POOL
Also – if needed – you can manually trigger the TRIM operations with this command.
# zpool trim POOL
# zpool status POOL
  pool: POOL
 state: ONLINE
  scan: scrub repaired 0B in 02:17:22 with 0 errors on Sun May 8 05:18:22 2022
config:

        NAME         STATE     READ WRITE CKSUM
        POOL         ONLINE       0     0     0
          da0p1.eli  ONLINE       0     0     0  (trimming)

errors: No known data errors
By default up to 64 TRIM commands can be active at once per device on FreeBSD. You can limit that to 1 and still keep TRIM enabled with the following sysctl(8) tunable.
# sysctl vfs.zfs.vdev.trim_max_active=1
If you want to make it survive across reboots then put it into the /etc/sysctl.conf file.
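Adding the line by hand with an editor is enough, but if you prefer to script it, the sketch below appends the setting only when it is not already present. The function name persist_sysctl and its optional third argument (the target file, defaulting to /etc/sysctl.conf) are assumptions made for illustration.

```sh
#!/bin/sh
# persist_sysctl NAME VALUE [FILE]
# Hypothetical helper: append NAME=VALUE to FILE (default /etc/sysctl.conf)
# unless a line starting with NAME= is already there.
persist_sysctl() {
    name="$1"
    value="$2"
    conf="${3:-/etc/sysctl.conf}"
    if grep -q "^${name}=" "$conf" 2>/dev/null; then
        echo "${name} is already set in ${conf}" >&2
    else
        echo "${name}=${value}" >> "$conf"
    fi
}
```

Run as root, persist_sysctl vfs.zfs.vdev.trim_max_active 1 would add the line once and do nothing on later runs. It does not change an existing value, so a different value already in the file has to be edited by hand.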
Logic could suggest that simpler/older filesystems such as FreeBSD UFS could be a more suitable solution for SMR drives … but reality shows otherwise. Check this Reddit thread for example, Appalling Performance on External USB SMR Drive, to name just one.
I hope this article will help you get the most out of your SMR drives.