ZFS Resilver on SMR Drives

I have always been a fan of these small and power efficient 2.5″ SMR drives – available in sizes up to 5 TB.

This entry is more or less a continuation of the previous ZFS on SMR Drives article.

[Image: Seagate 5 TB 2.5″ SMR disk]

Multiple times I have heard that it is not possible to rebuild/resilver a ZFS pool on such drives because of SMR shortcomings.

This time we will be able to check that in the real world, as one such drive just died in my buddy's FreeBSD based TrueNAS CORE home NAS.

FreeBSD Note

For the record – keep in mind that I will not use ANY GUI at all here – everything will be done over an SSH session in the CLI. That means that everything here also applies to 'plain' FreeBSD systems.

The only difference is that FreeNAS or TrueNAS CORE uses /dev/gptid paths for ZFS devices – as shown below.

root@freenas[~]# zpool status data
  pool: data
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors

… and on FreeBSD it is often 'generic' devices taken directly from the /dev tree.

root@FreeBSD # zpool status test
  pool: test
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors


Nothing more.
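
If You ever wonder which physical disk hides behind such a gptid – the glabel(8) and gpart(8) tools will tell You. Below is just a small sketch – the grep patterns are only an illustration and the output is omitted here.

root@freenas[~]# glabel status | grep gptid
root@freenas[~]# gpart list ada0 | grep -E 'Name|rawuuid'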

Zed is Dead

My friend used two 2.5″ SMR SATA drives in a ZFS mirror – each disk 4 TB in size … and one of them just died and disappeared after several years.

root@freenas[~]# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 34.1M in 00:00:07 with 0 errors on Sun Apr 21 12:14:46 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/4d8022fe-af42-11eb-83c1-d050999b6100  REMOVED      0     0     0

OK – so one of the drives is 'gone'.
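
It also does not hurt to check what the controller itself still sees – camcontrol(8) lists all attached ATA/SCSI devices. Just a sketch – the output will of course differ on every box, so it is omitted here.

root@freenas[~]# camcontrol devlist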

New Disk

We will now attach another 2.5″ SMR drive – 5 TB in size – as a replacement candidate. It costs the same as the 4 TB ones – and in the future, when the second 4 TB drive fails, we will upgrade it to another 5 TB drive as well. The new disk came with ms-reserved and ms-basic-data partitions that probably originate from some Windows system.

As it seems the 5 TB drive came with a corrupted GPT header – fixed below – but we will wipe that anyway 🙂

root@freenas[~]# gpart show
=>        40  7814037088  ada0  GPT  (3.6T)
          40          88        - free -  (44K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809842696     2  freebsd-zfs  (3.6T)

=>        34  9767541100  ada1  GPT  (4.5T) [CORRUPT]
          34      262144     1  ms-reserved  (128M)
      262178        2014        - free -  (1.0M)
      264192  9767276544     2  ms-basic-data  (4.5T)
  9767540736         398        - free -  (199K)

=>      40  62521264  da0  GPT  (30G)
        40    532480    1  efi  (260M)
    532520  61964288    2  freebsd-zfs  (30G)
  62496808     24496       - free -  (12M)

root@freenas[~]# gpart recover ada1
ada1 recovered

root@freenas[~]# gpart show        
=>        40  7814037088  ada0  GPT  (3.6T)
          40          88        - free -  (44K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809842696     2  freebsd-zfs  (3.6T)

=>        34  9767541094  ada1  GPT  (4.5T)
          34      262144     1  ms-reserved  (128M)
      262178        2014        - free -  (1.0M)
      264192  9767276544     2  ms-basic-data  (4.5T)
  9767540736         392        - free -  (196K)

=>      40  62521264  da0  GPT  (30G)
        40    532480    1  efi  (260M)
    532520  61964288    2  freebsd-zfs  (30G)
  62496808     24496       - free -  (12M)

For the record – the ada0 disk is the 4 TB drive that still works and holds the data on the ZFS pool – and the da0 disk is the system disk on which FreeNAS (later TrueNAS CORE) was installed – a tiny Lexar S47 32 GB USB stick.

[Image: Lexar S47 32 GB USB stick]

… and it is quite fast too. I also like to use these for system disks as they do not die – which I cannot say about the similarly sized SanDisk drives.
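
Before trusting the new drive it is also worth to peek at its SMART data first. Below is only a sketch – I assume here that the smartmontools package is available (TrueNAS CORE ships it, on plain FreeBSD it is the sysutils/smartmontools package).

root@freenas[~]# smartctl -H /dev/ada1
root@freenas[~]# smartctl -A /dev/ada1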

Wipe Clean and Copy Partition Sizes

We will now clone the partition sizes from the still working 4 TB disk to the new one.

root@freenas[~]# gpart destroy -F ada1
ada1 destroyed

root@freenas[~]# ls -l /dev/gptid
total 0
crw-r-----  1 root  operator  0x79 May  2 15:47 52fa3736-6eeb-11eb-a6ce-d050999b6100
crw-r-----  1 root  operator  0x6c May  2 15:47 7f134d8a-cc6d-11eb-92aa-d050999b6100

root@freenas[~]# gpart backup ada0 | gpart restore ada1

root@freenas[~]# gpart show
=>        40  7814037088  ada0  GPT  (3.6T)
          40          88        - free -  (44K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809842696     2  freebsd-zfs  (3.6T)

=>      40  62521264  da0  GPT  (30G)
        40    532480    1  efi  (260M)
    532520  61964288    2  freebsd-zfs  (30G)
  62496808     24496       - free -  (12M)

=>        40  9767541088  ada1  GPT  (4.5T)
          40          88        - free -  (44K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809842696     2  freebsd-zfs  (3.6T)
  7814037128  1953504000        - free -  (932G)

root@freenas[~]# ls -l /dev/gptid
total 0
crw-r-----  1 root  operator  0x79 May  2 15:47 52fa3736-6eeb-11eb-a6ce-d050999b6100
crw-r-----  1 root  operator  0x6c May  2 15:47 7f134d8a-cc6d-11eb-92aa-d050999b6100
crw-r-----  1 root  operator  0xc0 May  3 12:13 d00a564f-0935-11ef-8323-d050999b6100
crw-r-----  1 root  operator  0xc2 May  3 12:13 d0131cb4-0935-11ef-8323-d050999b6100

root@freenas[~]# gpart list ada1
Geom name: ada1
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 9767541127
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: ada1p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(1,GPT,d00a564f-0935-11ef-8323-d050999b6100,0x80,0x400000)
   rawuuid: d00a564f-0935-11ef-8323-d050999b6100
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada1p2
   Mediasize: 3998639460352 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(2,GPT,d0131cb4-0935-11ef-8323-d050999b6100,0x400080,0x1d180be08)
   rawuuid: d0131cb4-0935-11ef-8323-d050999b6100
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: ada1
   Mediasize: 5000981078016 (4.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

root@freenas[~]# ls -l /dev/gptid | grep d0131cb4-0935-11ef-8323-d050999b6100
crw-r-----  1 root  operator  0xc2 May  3 12:13 d0131cb4-0935-11ef-8323-d050999b6100

Done – the partition sizes are identical. ZFS requires the replacement partition to be of equal or larger size – and now we can also create an additional partition there for some other, non-mirrored, less important data – like ISO images or other easily re-downloadable stuff.
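
Below is only a sketch of how such an extra partition and a separate pool on it could look – the scratch pool name and the resulting ada1p3 partition name are my assumptions here – and keep in mind that a pool created by hand like that will not be known to the TrueNAS CORE middleware/GUI.

root@freenas[~]# gpart add -t freebsd-zfs -a 4k ada1
root@freenas[~]# zpool create -O compression=lz4 scratch /dev/ada1p3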

Replace Broken Disk

We will now replace the absent/broken disk (partition actually) with the new one.

root@freenas[~]# zpool replace data \
                   gptid/4d8022fe-af42-11eb-83c1-d050999b6100 \
                   gptid/d0131cb4-0935-11ef-8323-d050999b6100

root@freenas[~]# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri May  3 12:19:26 2024
        387G scanned at 2.01G/s, 21.7M issued at 115K/s, 2.78T total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                              STATE     READ WRITE CKSUM
        data                                              DEGRADED     0     0     0
          mirror-0                                        DEGRADED     0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100    ONLINE       0     0     0
            replacing-1                                   DEGRADED     0     0     0
              gptid/4d8022fe-af42-11eb-83c1-d050999b6100  REMOVED      0     0     0
              gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors

… and the ZFS resilver process started.

The new disk is not yet marked with the (resilvering) label on the right side – but that will appear later.
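
To keep an eye on the progress You can just rerun the zpool status command from time to time – or let zpool wait block until the resilver finishes. The latter is a sketch that assumes OpenZFS 2.0 or later – which recent TrueNAS CORE versions ship.

root@freenas[~]# zpool status data | grep -E 'scan|resilver|done'
root@freenas[~]# zpool wait -t resilver data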

This is how the process looked somewhere in the middle of it.

root@freenas[~]# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri May  3 12:19:26 2024
        2.16T scanned at 14.2M/s, 2.00T issued at 13.1M/s, 2.78T total
        2.00T resilvered, 71.92% done, 17:18:17 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        data                                              DEGRADED     0     0     0
          mirror-0                                        DEGRADED     0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100    ONLINE       0     0     0
            replacing-1                                   DEGRADED     0     0     0
              gptid/4d8022fe-af42-11eb-83c1-d050999b6100  REMOVED      0     0     0
              gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0  (resilvering)

errors: No known data errors

… and after 3 days and 14 hours the resilver is done – the data is mirrored and protected again.

root@freenas[~]# zpool status data
  pool: data
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 2.78T in 3 days 14:33:47 with 0 errors on Tue May  7 02:53:13 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors
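
As a final sanity check it does not hurt to run a scrub once the resilver is done – ZFS will then read and verify every block on both sides of the mirror.

root@freenas[~]# zpool scrub data
root@freenas[~]# zpool status data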

Conclusion

A used 2.5″ SMR drive with 5 TB capacity now costs about $70. That is about $14 per TB of storage.

If You feel that almost 4 days of resilver is way too long for your data security – remember, it's ZFS – just add another 5 TB disk, so You will have a 3-way ZFS mirror instead of the 'classic' RAID1-like 2-way mirror used in this article. The disks are very cheap.

This way when one disk dies You will still have your data protected by the ZFS mirror.
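
Attaching a third disk to an existing mirror is a single zpool attach command. Below is just a sketch – I assume the third disk shows up as ada2 and gets partitioned the same way as before – and the gptid of its ZFS partition is the made up value from the example status output that follows.

root@freenas[~]# gpart backup ada0 | gpart restore ada2

root@freenas[~]# zpool attach data \
                   gptid/d0131cb4-0935-11ef-8323-d050999b6100 \
                   gptid/df35a310-bef4-283b-47b9-d050999b6100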

… and it will look like this instead.

root@freenas[~]# zpool status data
  pool: data
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 2.78T in 3 days 14:33:47 with 0 errors on Tue May  7 02:53:13 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/df35a310-bef4-283b-47b9-d050999b6100  ONLINE       0     0     0
            gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors
Feel free to share your own resilver experiences with these cheap SMR drives.
EOF

12 thoughts on "ZFS Resilver on SMR Drives"

  1. samask

    Why not https://serverfault.com/a/1151010 ?

    There is also a special "sequential resilver" option for mirrored vdevs that can be triggered using zpool attach -s or zpool replace -s – this performs a faster copy of all data without any checking, and initiates a deferred scrub to verify integrity later. This is good for quickly restoring redundancy, but should only be used if you're confident that the existing data is correct (you run regular scrubs, or scrubbed before adding/replacing).

    1. vermaden Post author

      Thank You for teaching me the -s option for the replace/attach command 🙂

      Seems I need to keep better track of the latest OpenZFS developments …

      1. vermaden Post author

        Maybe you took my reply as sarcasm or irony – let me phrase that literally.

        I did not know the -s option and I am really glad that you showed me it exists – now I will probably either look for some benchmarks of the time difference or try to test it myself 🙂

        Keep in mind that I do not know everything – but what I know, I will gladly share.

  2. RandomlyTangentialThoughts

    IIRC the original problem with SMRs was two-fold: one being that WD was underhanded in the way they switched technologies in an already established product line, which caused issues. Two, the technology itself causes a latency increase when the random access cache is exhausted, resulting in problems with assumptions in ZFS timeouts. The issue has been addressed since then, so there really shouldn't be a problem with the extended latency that sequential spinning rust imposes. I do wonder if it would be beneficial to issue a manual TRIM command after the resilver to clean up any reordering housekeeping left over. (TRIM isn't just for SSDs, it's also for SMR media.)

  3. Jessie Saenz a.k.a Chunky_Pie

    My first ZFS box had nothing but SMR drives in a raidz1 and had no problems resilvering. Can't speak on speed compared to CMR but… Thank you for the blog post!

  4. mdsah58

    Great write up! I've also always been interested in SMR drives and their potential issues with ZFS. Tell me, why did you mirror the partitions of your 4 TB drive onto your 5 TB drive? This means that 1 TB is left free, and if/when you replace your 4 TB drive with a 5 TB one you won't automatically get the extra space in your pool.
