I have always been a fan of these quite large – up to 5 TB – yet small and power efficient 2.5″ SMR drives.
This entry is kinda like a continuation of the previous ZFS on SMR Drives article.
Multiple times I have heard that it is not possible to rebuild/resilver the data on a ZFS pool on such drives because of SMR shortcomings.
This time we will be able to check that in the real world, as one such drive just died in my buddy's FreeBSD based TrueNAS CORE home NAS.
FreeBSD Note
For the record – keep in mind that I will not use ANY GUI at all here – everything will be done in the CLI over an SSH session. That means that everything here also applies to 'plain' FreeBSD systems.
The only difference is that FreeNAS or TrueNAS CORE uses /dev/gptid paths for ZFS devices, as shown below.
root@freenas[~]# zpool status data
  pool: data
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors
… and on FreeBSD it is often 'generic' devices taken directly from the /dev tree.
root@FreeBSD # zpool status test
  pool: test
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
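Side note – if You ever need to map such a gptid back to the underlying partition, glabel status will show the relation. A quick sketch below – the mapping shown is only illustrative, not taken from this machine.

root@freenas[~]# glabel status | grep gptid
gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100     N/A  ada0p2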
Nothing more.
Zed is Dead
My friend used two 2.5″ SMR SATA drives in a ZFS mirror – each disk 4 TB in size … and one of them just died and disappeared after several years.
root@freenas[~]# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 34.1M in 00:00:07 with 0 errors on Sun Apr 21 12:14:46 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/4d8022fe-af42-11eb-83c1-d050999b6100  REMOVED      0     0     0
OK – so one of the drives is ‘gone’.
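To double check which devices the system still sees – and later to confirm under which name the new disk showed up – these two commands come in handy. Just a sketch here, the output is omitted as it is machine specific.

root@freenas[~]# camcontrol devlist
root@freenas[~]# geom disk list | grep -E 'Name|Mediasize'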
New Disk
We will now attach another 2.5″ SMR drive – this time 5 TB in size – as a replacement candidate. It costs the same as the 4 TB ones – and in the future, when the second 4 TB drive fails, we will upgrade it to another 5 TB drive as well. The disk came with ms-reserved and ms-basic-data partitions that probably originate from some Windows system.
As it seems the 5 TB drive also came with a corrupted GPT header – fixed below – but we will wipe that anyway 🙂
root@freenas[~]# gpart show
=>        40  7814037088  ada0  GPT  (3.6T)
          40          88        - free -  (44K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809842696     2  freebsd-zfs  (3.6T)

=>          34  9767541100  ada1  GPT  (4.5T) [CORRUPT]
            34      262144     1  ms-reserved  (128M)
        262178        2014        - free -  (1.0M)
        264192  9767276544     2  ms-basic-data  (4.5T)
    9767540736         398        - free -  (199K)

=>      40  62521264  da0  GPT  (30G)
        40    532480    1  efi  (260M)
    532520  61964288    2  freebsd-zfs  (30G)
  62496808     24496       - free -  (12M)

root@freenas[~]# gpart recover ada1
ada1 recovered

root@freenas[~]# gpart show
=>        40  7814037088  ada0  GPT  (3.6T)
          40          88        - free -  (44K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809842696     2  freebsd-zfs  (3.6T)

=>          34  9767541094  ada1  GPT  (4.5T)
            34      262144     1  ms-reserved  (128M)
        262178        2014        - free -  (1.0M)
        264192  9767276544     2  ms-basic-data  (4.5T)
    9767540736         392        - free -  (196K)

=>      40  62521264  da0  GPT  (30G)
        40    532480    1  efi  (260M)
    532520  61964288    2  freebsd-zfs  (30G)
  62496808     24496       - free -  (12M)
For the record – the ada0 disk is the 4 TB drive that still works and holds the data on the ZFS pool – and the da0 disk is the system disk on which FreeNAS (later TrueNAS CORE) was installed – a tiny Lexar S47 32 GB USB stick.
… and quite fast too. I also like to use these for system disks as they do not die – which I cannot say about the similarly sized SanDisk drives.
Wipe Clean and Copy Partition Sizes
We will now clone the partition sizes from the still working 4 TB disk to the new one.
root@freenas[~]# gpart destroy -F ada1
ada1 destroyed

root@freenas[~]# ls -l /dev/gptid
total 0
crw-r-----  1 root  operator  0x79 May  2 15:47 52fa3736-6eeb-11eb-a6ce-d050999b6100
crw-r-----  1 root  operator  0x6c May  2 15:47 7f134d8a-cc6d-11eb-92aa-d050999b6100

root@freenas[~]# gpart backup ada0 | gpart restore ada1

root@freenas[~]# gpart show
=>        40  7814037088  ada0  GPT  (3.6T)
          40          88        - free -  (44K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809842696     2  freebsd-zfs  (3.6T)

=>      40  62521264  da0  GPT  (30G)
        40    532480    1  efi  (260M)
    532520  61964288    2  freebsd-zfs  (30G)
  62496808     24496       - free -  (12M)

=>         40  9767541088  ada1  GPT  (4.5T)
           40          88        - free -  (44K)
          128     4194304     1  freebsd-swap  (2.0G)
      4194432  7809842696     2  freebsd-zfs  (3.6T)
   7814037128  1953504000        - free -  (932G)

root@freenas[~]# ls -l /dev/gptid
total 0
crw-r-----  1 root  operator  0x79 May  2 15:47 52fa3736-6eeb-11eb-a6ce-d050999b6100
crw-r-----  1 root  operator  0x6c May  2 15:47 7f134d8a-cc6d-11eb-92aa-d050999b6100
crw-r-----  1 root  operator  0xc0 May  3 12:13 d00a564f-0935-11ef-8323-d050999b6100
crw-r-----  1 root  operator  0xc2 May  3 12:13 d0131cb4-0935-11ef-8323-d050999b6100

root@freenas[~]# gpart list ada1
Geom name: ada1
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 9767541127
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: ada1p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(1,GPT,d00a564f-0935-11ef-8323-d050999b6100,0x80,0x400000)
   rawuuid: d00a564f-0935-11ef-8323-d050999b6100
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada1p2
   Mediasize: 3998639460352 (3.6T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   efimedia: HD(2,GPT,d0131cb4-0935-11ef-8323-d050999b6100,0x400080,0x1d180be08)
   rawuuid: d0131cb4-0935-11ef-8323-d050999b6100
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 3998639460352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 7814037127
   start: 4194432
Consumers:
1. Name: ada1
   Mediasize: 5000981078016 (4.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

root@freenas[~]# ls -l /dev/gptid | grep d0131cb4-0935-11ef-8323-d050999b6100
crw-r-----  1 root  operator  0xc2 May  3 12:13 d0131cb4-0935-11ef-8323-d050999b6100
Done – the partition sizes are identical. ZFS requires the replacement partition to be of equal size or larger – and now we could create an additional partition in the leftover space for some other non-mirrored, less important data – ISO images for example – or other easily re-downloadable stuff.
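If You would like to use that leftover ~932 GB right away, it could look more or less like below. This is only a sketch that is NOT done in this article – the scratch label and pool name are made up. Without the -s option gpart uses the largest free block, so the new partition would become ada1p3 (also reachable as /dev/gpt/scratch thanks to the label).

root@freenas[~]# gpart add -t freebsd-zfs -a 4k -l scratch ada1
root@freenas[~]# zpool create scratch gpt/scratch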
Replace Broken Disk
We will now replace the absent/broken disk (partition actually) with the new one.
root@freenas[~]# zpool replace data \
                   gptid/4d8022fe-af42-11eb-83c1-d050999b6100 \
                   gptid/d0131cb4-0935-11ef-8323-d050999b6100

root@freenas[~]# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri May  3 12:19:26 2024
        387G scanned at 2.01G/s, 21.7M issued at 115K/s, 2.78T total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                              STATE     READ WRITE CKSUM
        data                                              DEGRADED     0     0     0
          mirror-0                                        DEGRADED     0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100    ONLINE       0     0     0
            replacing-1                                   DEGRADED     0     0     0
              gptid/4d8022fe-af42-11eb-83c1-d050999b6100  REMOVED      0     0     0
              gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors
… and the ZFS resilver process started.
The new disk is not yet marked with the (resilvering) label on the right side – but that will appear later.
This is how the process looked somewhere in the middle of it.
root@freenas[~]# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri May  3 12:19:26 2024
        2.16T scanned at 14.2M/s, 2.00T issued at 13.1M/s, 2.78T total
        2.00T resilvered, 71.92% done, 17:18:17 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        data                                              DEGRADED     0     0     0
          mirror-0                                        DEGRADED     0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100    ONLINE       0     0     0
            replacing-1                                   DEGRADED     0     0     0
              gptid/4d8022fe-af42-11eb-83c1-d050999b6100  REMOVED      0     0     0
              gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0  (resilvering)

errors: No known data errors
… and after 3 days and 14 hours the resilver is done – the data is mirrored and protected again.
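For the record – 2.78 TiB written in about 86.5 hours averages out to roughly 9-10 MB/s for the whole rebuild.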
root@freenas[~]# zpool status data
  pool: data
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 2.78T in 3 days 14:33:47 with 0 errors on Tue May  7 02:53:13 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/d0131cb4-0935-11ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors
Conclusion
A used 2.5″ SMR drive with 5 TB capacity now costs about $70. That is about $14 per TB of storage.
If You feel that an almost 4 day long resilver process is way too long for your data security – remember it's ZFS – just add another 5 TB disk, so You will have a 3-way ZFS mirror instead of the 'classic' RAID1-like 2-way mirror used in this article. The disks are very cheap.
This way when one disk dies You will still have your data protected by the ZFS mirror.
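For the record – attaching a third disk to an existing mirror is a single command. Below is a minimal sketch using the gptid of the partition we just resilvered onto and a made-up gptid for the third disk – the same one used in the example output that follows.

root@freenas[~]# zpool attach data \
                   gptid/d0131cb4-0935-11ef-8323-d050999b6100 \
                   gptid/df35a310-bef4-283b-47b9-d050999b6100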
… and it will look like this instead.
root@freenas[~]# zpool status data
  pool: data
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 2.78T in 3 days 14:33:47 with 0 errors on Tue May  7 02:53:13 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/7f134d8a-cc6d-11eb-92aa-d050999b6100  ONLINE       0     0     0
            gptid/df35a310-bef4-283b-47b9-d050999b6100  ONLINE       0     0     0
            gptid/d0131cb4-0935-32ef-8323-d050999b6100  ONLINE       0     0     0

errors: No known data errors
Comments
Why not https://serverfault.com/a/1151010 ?
Thank You for teaching me the -s option for the replace/attach command 🙂
Seems I need to keep better track of the latest OpenZFS developments …
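For the curious – the sequential resilver from the linked answer would have been the same replace command with just the -s option added – a sketch below, same devices as in the article. It restores redundancy with sequential I/O first and then automatically runs a scrub to verify checksums – it is not supported for RAIDZ vdevs though.

root@freenas[~]# zpool replace -s data \
                   gptid/4d8022fe-af42-11eb-83c1-d050999b6100 \
                   gptid/d0131cb4-0935-11ef-8323-d050999b6100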
I am nobody to teach Vermaden, one of my sysadmin heroes. Thank You for teaching me this sophisticated way!
Maybe you took my reply as sarcasm or irony – let me phrase that literally.
I did not know about the -s option and I am really glad that you showed me it exists – now I will probably either look for some benchmarks of the time difference or try to test it myself 🙂
Keep in mind that I do not know everything – but what I know – I will gladly share.
Not at all. I heard you the first time and I was sincere. I have been reading your blog for many years, and appreciating your tools, like the famous beadm (https://forums.freebsd.org/threads/howto-freebsd-zfs-madness.31662/).
Thank You.
Such support always motivates me for more 🙂
IIRC the original problem with SMRs was twofold: one, WD was underhanded in the way they switched technologies in an already established product line, which caused issues. Two, the technology itself causes a latency increase when the random access cache is exhausted, which resulted in problems with assumptions in ZFS timeouts. The issue has been addressed since then, so there really shouldn't be a problem with the extended latency sequential spinning rust imposes. I do wonder if it would be beneficial to issue a manual TRIM command after the resilver to clean up any reordering housekeeping left over. (TRIM isn't just for SSDs, it's also for SMR media.)
ZFS on FreeBSD uses TRIM by default.
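For the record – a manual TRIM can still be requested and monitored at any time – a short sketch with the pool name from this article.

root@freenas[~]# zpool trim data
root@freenas[~]# zpool status -t data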
My first ZFS box had nothing but SMR drives in a raidz1 and had no problems resilvering. Can't speak on speed compared to CMR but… Thank you for the blog post!
Glad You liked it 🙂
Great write up! I've also always been interested in SMR drives and their potential issues with ZFS. Tell me – why did you mirror the partitions of your 4 TB drive onto your 5 TB drive? This meant that 1 TB was left free, and if/when you replace your 4 TB drive with a 5 TB one you won't automatically get the extra space in your pool.