Tag Archives: freenas

FreeBSD Enterprise 1 PB Storage

Today FreeBSD operating system turns 26 years old. 19 June is an International FreeBSD Day. This is why I got something special today :). How about using FreeBSD as an Enterprise Storage solution on real hardware? This where FreeBSD shines with all its storage features ZFS included.

Today I will show you how I have built so called Enterprise Storage based on FreeBSD system along with more then 1 PB (Petabyte) of raw capacity.

I have build various storage related systems based on FreeBSD:

This project is different. How much storage space can you squeeze from a single 4U system? It turns out a lot! Definitely more then 1 PB (1024 TB) of raw storage space.

Here is the (non clickable) Table of Contents.

  • Hardware
  • Management Interface
  • BIOS/UEFI
  • FreeBSD System
    • Disks Preparation
    • ZFS Pool Configuration
    • ZFS Settings
    • Network Configuration
    • FreeBSD Configuration
  • Purpose
  • Performance
    • Network Performance
    • Disk Subsystem Performance
  • FreeNAS
  • UPDATE 1 – BSD Now 305
  • UPDATE 2 – Real Life Pictures in Data Center

Hardware

There are 4U servers with 90-100 3.5″ drive slots which will allow you to pack 1260-1400 Terabytes of data (with 14 TB drives). Examples of such systems are:

I would use the first one – the TYAN FA100 for short name.

logo-tyan.png

While both GlusterFS and Minio clusters were cone on virtual hardware (or even FreeBSD Jails containers) this one uses real physical hardware.

The build has following specifications.

 2 x 10-Core Intel Xeon Silver 4114 CPU @ 2.20GHz
 4 x 32 GB RAM DDR4 (128 GB Total)
 2 x Intel SSD DC S3500 240 GB (System)
90 x Toshiba HDD MN07ACA12TE 12 TB (Data)
 2 x Broadcom SAS3008 Controller
 2 x Intel X710 DA-2 10GE Card
 2 x Power Supply

Price of the whole system is about $65 000 – drives included. Here is how it looks.

tyan-fa100-small.jpg

One thing that you will need is a rack cabinet that is 1200 mm long to fit that monster πŸ™‚

Management Interface

The so called Lights Out management interface is really nice. Its not bloated, well organized and works quite fast. you can create several separate user accounts or can connect to external user services like LDAP/AD/RADIUS for example.

n01.png

After logging in a simple Dashboard welcomes us.

n02.png

We have access to various Sensor information available with temperatures of system components.

n03

We have System Inventory information with installed hardware.

n04.png

There is separate Settings menu for various setup options.

n05.png

I know its 2019 but HTML5 only Remote Control (remote console) without need for any third party plugins like Java/Silverlight/Flash/… is very welcomed. It works very well too.

n06.png

n07.png

One is of course allowed to power on/off/cycle the box remotely.

n08.png

The Maintenance menu for BIOS updates.

n09.png

BIOS/UEFI

After booting into the BIOS/UEFI setup its possible to select from which drives to boot from. On the screenshots the two SSD drives prepared for system.

nas01.png

The BIOS/UEFI interface shows two Enclosures but its two Broadcom SAS3008 controllers. Some drive are attached via first Broadcom SAS3008 controller, the rest is attached via the second one, and they call them Enclosures instead od of controllers for some reason.

nas05.png

FreeBSD System

I have chosen latest FreeBSD 12.0-RELEASE for the purpose of this installation. Its generally very ‘default’ installation with ZFS mirror on two SSD disks. Nothing special.

logo-freebsd.jpg

The installation of course supports the ZFS Boot Environments bulletproof upgrades/changes feature.

# zpool list zroot
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot   220G  3.75G   216G        -         -     0%     1%  1.00x  ONLINE  -

# zpool status zroot
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da91p4  ONLINE       0     0     0
            da11p4  ONLINE       0     0     0

errors: No known data errors

# df -g
Filesystem              1G-blocks Used  Avail Capacity  Mounted on
zroot/ROOT/default            211    2    209     1%    /
devfs                           0    0      0   100%    /dev
zroot/tmp                     209    0    209     0%    /tmp
zroot/usr/home                209    0    209     0%    /usr/home
zroot/usr/ports               210    0    209     0%    /usr/ports
zroot/usr/src                 210    0    209     0%    /usr/src
zroot/var/audit               209    0    209     0%    /var/audit
zroot/var/crash               209    0    209     0%    /var/crash
zroot/var/log                 209    0    209     0%    /var/log
zroot/var/mail                209    0    209     0%    /var/mail
zroot/var/tmp                 209    0    209     0%    /var/tmp

# beadm list
BE      Active Mountpoint  Space Created
default NR     /            2.4G 2019-05-24 13:24

Disks Preparation

From all the possible setups with 90 disks of 12 TB capacity I have chosen to go the RAID60 way – its ZFS equivalent of course. With 12 disks in each RAID6 (raidz2) group – there will be 7 such groups – we will have 84 used for the ZFS pool with 6 drives left as SPARE disks – that plays well for me. The disks distribution will look more or less like that.

DISKS  CONTENT
   12  raidz2-0
   12  raidz2-1
   12  raidz2-2
   12  raidz2-3
   12  raidz2-4
   12  raidz2-5
   12  raidz2-6
    6  spares
   90  TOTAL

Here is how FreeBSD system sees these drives by camcontrol(8) command. Sorted by attached SAS controller – scbus(4).

# camcontrol devlist | sort -k 6
(AHCI SGPIO Enclosure 1.00 0001)   at scbus2 target 0 lun 0 (pass0,ses0)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 50 lun 0 (pass1,da0)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 52 lun 0 (pass2,da1)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 54 lun 0 (pass3,da2)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 56 lun 0 (pass5,da4)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 57 lun 0 (pass6,da5)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 59 lun 0 (pass7,da6)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 60 lun 0 (pass8,da7)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 66 lun 0 (pass9,da8)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 67 lun 0 (pass10,da9)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 74 lun 0 (pass11,da10)
(ATA INTEL SSDSC2KB24 0100)        at scbus3 target 75 lun 0 (pass12,da11)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 76 lun 0 (pass13,da12)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 82 lun 0 (pass14,da13)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 83 lun 0 (pass15,da14)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 85 lun 0 (pass16,da15)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 87 lun 0 (pass17,da16)
(Tyan B7118 0500)                  at scbus3 target 88 lun 0 (pass18,ses1)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 89 lun 0 (pass19,da17)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 90 lun 0 (pass20,da18)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 91 lun 0 (pass21,da19)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 92 lun 0 (pass22,da20)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 93 lun 0 (pass23,da21)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 94 lun 0 (pass24,da22)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 95 lun 0 (pass25,da23)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 96 lun 0 (pass26,da24)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 97 lun 0 (pass27,da25)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 98 lun 0 (pass28,da26)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 99 lun 0 (pass29,da27)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 100 lun 0 (pass30,da28)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 101 lun 0 (pass31,da29)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 102 lun 0 (pass32,da30)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 103 lun 0 (pass33,da31)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 104 lun 0 (pass34,da32)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 105 lun 0 (pass35,da33)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 106 lun 0 (pass36,da34)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 107 lun 0 (pass37,da35)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 108 lun 0 (pass38,da36)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 109 lun 0 (pass39,da37)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 110 lun 0 (pass40,da38)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 48 lun 0 (pass41,da39)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 49 lun 0 (pass42,da40)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 51 lun 0 (pass43,da41)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 53 lun 0 (pass44,da42)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 55 lun 0 (da43,pass45)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 59 lun 0 (pass46,da44)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 64 lun 0 (pass47,da45)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 67 lun 0 (pass48,da46)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 68 lun 0 (pass49,da47)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 69 lun 0 (pass50,da48)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 73 lun 0 (pass51,da49)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 76 lun 0 (pass52,da50)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 77 lun 0 (pass53,da51)
(Tyan B7118 0500)                  at scbus4 target 80 lun 0 (pass54,ses2)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 81 lun 0 (pass55,da52)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 82 lun 0 (pass56,da53)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 83 lun 0 (pass57,da54)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 84 lun 0 (pass58,da55)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 85 lun 0 (pass59,da56)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 86 lun 0 (pass60,da57)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 87 lun 0 (pass61,da58)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 88 lun 0 (pass62,da59)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 89 lun 0 (da63,pass66)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 90 lun 0 (pass64,da61)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 91 lun 0 (pass65,da62)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 92 lun 0 (da60,pass63)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 94 lun 0 (pass67,da64)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 97 lun 0 (pass68,da65)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 98 lun 0 (pass69,da66)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 99 lun 0 (pass70,da67)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 100 lun 0 (pass71,da68)
(Tyan B7118 0500)                  at scbus4 target 101 lun 0 (pass72,ses3)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 102 lun 0 (pass73,da69)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 103 lun 0 (pass74,da70)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 104 lun 0 (pass75,da71)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 105 lun 0 (pass76,da72)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 106 lun 0 (pass77,da73)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 107 lun 0 (pass78,da74)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 108 lun 0 (pass79,da75)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 109 lun 0 (pass80,da76)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 110 lun 0 (pass81,da77)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 111 lun 0 (pass82,da78)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 112 lun 0 (pass83,da79)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 113 lun 0 (pass84,da80)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 114 lun 0 (pass85,da81)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 115 lun 0 (pass86,da82)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 116 lun 0 (pass87,da83)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 117 lun 0 (pass88,da84)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 118 lun 0 (pass89,da85)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 119 lun 0 (pass90,da86)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 120 lun 0 (pass91,da87)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 121 lun 0 (pass92,da88)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 122 lun 0 (pass93,da89)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 123 lun 0 (pass94,da90)
(ATA INTEL SSDSC2KB24 0100)        at scbus4 target 124 lun 0 (pass95,da91)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 125 lun 0 (da3,pass4)

One my ask how to identify which disk is which when the FAILURE will came … this is where FreeBSD’s sesutil(8) command comes handy.

# sesutil locate all off
# sesutil locate da64 on

The first sesutil(8) command disables all location lights in the enclosure. The second one turns on the identification for disk da64.

I will also make sure to NOT use the whole space of each drive. Such idea may be pointless but imagine the following situation. Five 12 TB disks failed after 3 years. You can not get the same model drives so you get other 12 TB drives, maybe even from other manufacturer.

# grep da64 /var/run/dmesg.boot
da64 at mpr1 bus 0 scbus4 target 93 lun 0
da64:  Fixed Direct Access SPC-4 SCSI device
da64: Serial Number 98G0A1EQF95G
da64: 1200.000MB/s transfers
da64: Command Queueing enabled
da64: 11444224MB (23437770752 512 byte sectors)

A single 12 TB drive has 23437770752 of 512 byte sectors which equals 12000138625024 bytes of raw capacity.

# expr 23437770752 \* 512
12000138625024

Now image that these other 12 TB drives from other manufacturer will come with 4 bytes smaller size … ZFS will not allow their usage because their size is smaller.

This is why I will use exactly 11175 GB size of each drive which is more or less 1 GB short of its total 11176 GB size.

Below is command that will do that for me for all 90 disks.

# camcontrol devlist \
    | grep TOSHIBA \
    | awk '{print $NF}' \
    | awk -F ',' '{print $2}' \
    | tr -d ')' \
    | while read DISK
      do
        gpart destroy -F                   ${DISK} 1> /dev/null 2> /dev/null
        gpart create -s GPT                ${DISK}
        gpart add -t freebsd-zfs -s 11175G ${DISK}
      done

# gpart show da64
=>         40  23437770672  da64  GPT  (11T)
           40  23435673600     1  freebsd-zfs  (11T)
  23435673640      2097072        - free -  (1.0G)


ZFS Pool Configuration

Next, we will have to create our ZFS pool, its probably the longest zpool command I have ever executed πŸ™‚

As the Toshiba 12 TB disks have 4k sectors we will need to set vfs.zfs.min_auto_ashift to 12 to force them.

# sysctl vfs.zfs.min_auto_ashift=12
vfs.zfs.min_auto_ashift: 12 -> 12

# zpool create nas02 \
    raidz2  da0p1  da1p1  da2p1  da3p1  da4p1  da5p1  da6p1  da7p1  da8p1  da9p1 da10p1 da12p1 \
    raidz2 da13p1 da14p1 da15p1 da16p1 da17p1 da18p1 da19p1 da20p1 da21p1 da22p1 da23p1 da24p1 \
    raidz2 da25p1 da26p1 da27p1 da28p1 da29p1 da30p1 da31p1 da32p1 da33p1 da34p1 da35p1 da36p1 \
    raidz2 da37p1 da38p1 da39p1 da40p1 da41p1 da42p1 da43p1 da44p1 da45p1 da46p1 da47p1 da48p1 \
    raidz2 da49p1 da50p1 da51p1 da52p1 da53p1 da54p1 da55p1 da56p1 da57p1 da58p1 da59p1 da60p1 \
    raidz2 da61p1 da62p1 da63p1 da64p1 da65p1 da66p1 da67p1 da68p1 da69p1 da70p1 da71p1 da72p1 \
    raidz2 da73p1 da74p1 da75p1 da76p1 da77p1 da78p1 da79p1 da80p1 da81p1 da82p1 da83p1 da84p1 \
    spare  da85p1 da86p1 da87p1 da88p1 da89p1 da90p1

# zpool status
  pool: nas02
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:05 with 0 errors on Fri May 31 10:26:29 2019
config:

        NAME        STATE     READ WRITE CKSUM
        nas02       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da0p1   ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da3p1   ONLINE       0     0     0
            da4p1   ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
            da6p1   ONLINE       0     0     0
            da7p1   ONLINE       0     0     0
            da8p1   ONLINE       0     0     0
            da9p1   ONLINE       0     0     0
            da10p1  ONLINE       0     0     0
            da12p1  ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            da13p1  ONLINE       0     0     0
            da14p1  ONLINE       0     0     0
            da15p1  ONLINE       0     0     0
            da16p1  ONLINE       0     0     0
            da17p1  ONLINE       0     0     0
            da18p1  ONLINE       0     0     0
            da19p1  ONLINE       0     0     0
            da20p1  ONLINE       0     0     0
            da21p1  ONLINE       0     0     0
            da22p1  ONLINE       0     0     0
            da23p1  ONLINE       0     0     0
            da24p1  ONLINE       0     0     0
          raidz2-2  ONLINE       0     0     0
            da25p1  ONLINE       0     0     0
            da26p1  ONLINE       0     0     0
            da27p1  ONLINE       0     0     0
            da28p1  ONLINE       0     0     0
            da29p1  ONLINE       0     0     0
            da30p1  ONLINE       0     0     0
            da31p1  ONLINE       0     0     0
            da32p1  ONLINE       0     0     0
            da33p1  ONLINE       0     0     0
            da34p1  ONLINE       0     0     0
            da35p1  ONLINE       0     0     0
            da36p1  ONLINE       0     0     0
          raidz2-3  ONLINE       0     0     0
            da37p1  ONLINE       0     0     0
            da38p1  ONLINE       0     0     0
            da39p1  ONLINE       0     0     0
            da40p1  ONLINE       0     0     0
            da41p1  ONLINE       0     0     0
            da42p1  ONLINE       0     0     0
            da43p1  ONLINE       0     0     0
            da44p1  ONLINE       0     0     0
            da45p1  ONLINE       0     0     0
            da46p1  ONLINE       0     0     0
            da47p1  ONLINE       0     0     0
            da48p1  ONLINE       0     0     0
          raidz2-4  ONLINE       0     0     0
            da49p1  ONLINE       0     0     0
            da50p1  ONLINE       0     0     0
            da51p1  ONLINE       0     0     0
            da52p1  ONLINE       0     0     0
            da53p1  ONLINE       0     0     0
            da54p1  ONLINE       0     0     0
            da55p1  ONLINE       0     0     0
            da56p1  ONLINE       0     0     0
            da57p1  ONLINE       0     0     0
            da58p1  ONLINE       0     0     0
            da59p1  ONLINE       0     0     0
            da60p1  ONLINE       0     0     0
          raidz2-5  ONLINE       0     0     0
            da61p1  ONLINE       0     0     0
            da62p1  ONLINE       0     0     0
            da63p1  ONLINE       0     0     0
            da64p1  ONLINE       0     0     0
            da65p1  ONLINE       0     0     0
            da66p1  ONLINE       0     0     0
            da67p1  ONLINE       0     0     0
            da68p1  ONLINE       0     0     0
            da69p1  ONLINE       0     0     0
            da70p1  ONLINE       0     0     0
            da71p1  ONLINE       0     0     0
            da72p1  ONLINE       0     0     0
          raidz2-6  ONLINE       0     0     0
            da73p1  ONLINE       0     0     0
            da74p1  ONLINE       0     0     0
            da75p1  ONLINE       0     0     0
            da76p1  ONLINE       0     0     0
            da77p1  ONLINE       0     0     0
            da78p1  ONLINE       0     0     0
            da79p1  ONLINE       0     0     0
            da80p1  ONLINE       0     0     0
            da81p1  ONLINE       0     0     0
            da82p1  ONLINE       0     0     0
            da83p1  ONLINE       0     0     0
            da84p1  ONLINE       0     0     0
        spares
          da85p1    AVAIL
          da86p1    AVAIL
          da87p1    AVAIL
          da88p1    AVAIL
          da89p1    AVAIL
          da90p1    AVAIL

errors: No known data errors

# zpool list nas02
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
nas02   915T  1.42M   915T        -         -     0%     0%  1.00x  ONLINE  -

# zfs list nas02
NAME    USED  AVAIL  REFER  MOUNTPOINT
nas02    88K   675T   201K  none

ZFS Settings

As the primary role of this storage would be keeping files I will use one of the largest values for recordsize – 1 MB – this helps getting better compression ratio.

… but it will also serve as iSCSI Target in which we will try to fit in the native 4k blocks – thus 4096 bytes setting for iSCSI.

# zfs set compression=lz4         nas02
# zfs set atime=off               nas02
# zfs set mountpoint=none         nas02
# zfs set recordsize=1m           nas02
# zfs set redundant_metadata=most nas02
# zfs create                      nas02/nfs
# zfs create                      nas02/smb
# zfs create                      nas02/iscsi
# zfs set recordsize=4k           nas02/iscsi

Also one word on redundant_metadata as its not that obvious parameter. To quote the zfs(8) man page.

# man zfs
(...)
redundant_metadata=all | most
  Controls what types of metadata are stored redundantly.  ZFS stores
  an extra copy of metadata, so that if a single block is corrupted,
  the amount of user data lost is limited.  This extra copy is in
  addition to any redundancy provided at the pool level (e.g. by
  mirroring or RAID-Z), and is in addition to an extra copy specified
  by the copies property (up to a total of 3 copies).  For example if
  the pool is mirrored, copies=2, and redundant_metadata=most, then ZFS
  stores 6 copies of most metadata, and 4 copies of data and some
  metadata.

  When set to all, ZFS stores an extra copy of all metadata.  If a
  single on-disk block is corrupt, at worst a single block of user data
  (which is recordsize bytes long can be lost.)

  When set to most, ZFS stores an extra copy of most types of metadata.
  This can improve performance of random writes, because less metadata
  must be written.  In practice, at worst about 100 blocks (of
  recordsize bytes each) of user data can be lost if a single on-disk
  block is corrupt.  The exact behavior of which metadata blocks are
  stored redundantly may change in future releases.

  The default value is all.
(...)

From the description above we can see that its mostly useful on single device pools because when we have redundancy based on RAIDZ2 (RAID6 equivalent) we do not need to keep additional redundant copies of metadata. This helps to increase write performance.

For the record – iSCSI ZFS zvols are create with command like that one below – as sparse files – also called Thin Provisioning mode.

# zfs create -s -V 16T nas02/iscsi/test

As we have SPARE disks we will also need to enable the zfsd(8) daemon by adding zfsd_enable=YES to the /etc/rc.conf file.

We also need to enable autoreplace property for our pool because by default its set to off.

# zpool get autoreplace nas02
NAME   PROPERTY     VALUE    SOURCE
nas02  autoreplace  off      default

# zpool set autoreplace=on nas02

# zpool get autoreplace nas02
NAME   PROPERTY     VALUE    SOURCE
nas02  autoreplace  on       local

Other ZFS settings are in the /boot/loader.conf file. As this system has 128 GB RAM we will let ZFS use 50 to 75% of that amount for ARC.

# grep vfs.zfs /boot/loader.conf
  vfs.zfs.prefetch_disable=1
  vfs.zfs.cache_flush_disable=1
  vfs.zfs.vdev.cache.size=16M
  vfs.zfs.arc_min=64G
  vfs.zfs.arc_max=96G
  vfs.zfs.deadman_enabled=0

Network Configuration

This is what I really like about FreeBSD. To setup LACP link aggregation tou just need 5 lines in /etc/rc.conf file. On Red Hat Enterprise Linux you would need several files with many lines each.

# head -5 /etc/rc.conf
  defaultrouter="10.20.30.254"
  ifconfig_ixl0="up"
  ifconfig_ixl1="up"
  cloned_interfaces="lagg0"
  ifconfig_lagg0="laggproto lacp laggport ixl0 laggport ixl1 10.20.30.2/24 up"

# ifconfig lagg0
lagg0: flags=8843 metric 0 mtu 1500
        options=e507bb
        ether a0:42:3f:a0:42:3f
        inet 10.20.30.2 netmask 0xffffff00 broadcast 10.20.30.255
        laggproto lacp lagghash l2,l3,l4
        laggport: ixl0 flags=1c
        laggport: ixl1 flags=1c
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=29

The Intel X710 DA-2 10GE network adapter is fully supported under FreeBSD by the ixl(4) driver.

intel-x710-da-2.jpg

Cisco Nexus Configuration

This is the Cisco Nexus configuration needed to enable LACP aggregation.

First the ports.

NEXUS-1  Eth1/32  NAS02_IXL0  connected 3  full  a-10G  SFP-H10GB-A
NEXUS-2  Eth1/32  NAS02_IXL1  connected 3  full  a-10G  SFP-H10GB-A

… and now aggregation.

interface Ethernet1/32
  description NAS02_IXL1
  switchport
  switchport access vlan 3
  mtu 9216
  channel-group 128 mode active
  no shutdown
!
interface port-channel128
  description NAS02
  switchport
  switchport access vlan 3
  mtu 9216
  vpc 128

… and the same/similar on the second Cisco Nexus NEXUS-2 switch.

FreeBSD Configuration

These are three most important configuration files on any FreeBSD system.

I will now post all settings I use on this storage system.

The /etc/rc.conf file.

# cat /etc/rc.conf
# NETWORK
  hostname="nas02.local"
  defaultrouter="10.20.30.254"
  ifconfig_ixl0="up"
  ifconfig_ixl1="up"
  cloned_interfaces="lagg0"
  ifconfig_lagg0="laggproto lacp laggport ixl0 laggport ixl1 10.20.30.2/24 up"

# KERNEL MODULES
  kld_list="${kld_list} aesni"

# DAEMON | YES
  zfs_enable=YES
  zfsd_enable=YES
  sshd_enable=YES
  ctld_enable=YES
  powerd_enable=YES

# DAEMON | NFS SERVER
  nfs_server_enable=YES
  nfs_client_enable=YES
  rpc_lockd_enable=YES
  rpc_statd_enable=YES
  rpcbind_enable=YES
  mountd_enable=YES
  mountd_flags="-r"

# OTHER
  dumpdev=NO

The /boot/loader.conf file.

# cat /boot/loader.conf
# BOOT OPTIONS
  autoboot_delay=3
  kern.geom.label.disk_ident.enable=0
  kern.geom.label.gptid.enable=0

# DISABLE INTEL HT
  machdep.hyperthreading_allowed=0

# UPDATE INTEL CPU MICROCODE AT BOOT BEFORE KERNEL IS LOADED
  cpu_microcode_load=YES
  cpu_microcode_name=/boot/firmware/intel-ucode.bin

# MODULES
  zfs_load=YES
  aio_load=YES

# RACCT/RCTL RESOURCE LIMITS
  kern.racct.enable=1

# DISABLE MEMORY TEST @ BOOT
  hw.memtest.tests=0

# PIPE KVA LIMIT | 320 MB
  kern.ipc.maxpipekva=335544320

# IPC
  kern.ipc.shmseg=1024
  kern.ipc.shmmni=1024
  kern.ipc.shmseg=1024
  kern.ipc.semmns=512
  kern.ipc.semmnu=256
  kern.ipc.semume=256
  kern.ipc.semopm=256
  kern.ipc.semmsl=512

# LARGE PAGE MAPPINGS
  vm.pmap.pg_ps_enabled=1

# ZFS TUNING
  vfs.zfs.prefetch_disable=1
  vfs.zfs.cache_flush_disable=1
  vfs.zfs.vdev.cache.size=16M
  vfs.zfs.arc_min=64G
  vfs.zfs.arc_max=96G

# ZFS DISABLE PANIC ON STALE I/O
  vfs.zfs.deadman_enabled=0

# NEWCONS SUSPEND
  kern.vt.suspendswitch=0

The /etc/sysctl.conf file.

# cat /etc/sysctl.conf
# ZFS ASHIFT
  vfs.zfs.min_auto_ashift=12

# SECURITY
  security.bsd.stack_guard_page=1

# SECURITY INTEL MDS (MICROARCHITECTURAL DATA SAMPLING) MITIGATION
  hw.mds_disable=3

# DISABLE ANNOYING THINGS
  kern.coredump=0
  hw.syscons.bell=0

# IPC
  kern.ipc.shmmax=4294967296
  kern.ipc.shmall=2097152
  kern.ipc.somaxconn=4096
  kern.ipc.maxsockbuf=5242880
  kern.ipc.shm_allow_removed=1

# NETWORK
  kern.ipc.maxsockbuf=16777216
  kern.ipc.soacceptqueue=1024
  net.inet.tcp.recvbuf_max=8388608
  net.inet.tcp.sendbuf_max=8388608
  net.inet.tcp.mssdflt=1460
  net.inet.tcp.minmss=1300
  net.inet.tcp.syncache.rexmtlimit=0
  net.inet.tcp.syncookies=0
  net.inet.tcp.tso=0
  net.inet.ip.process_options=0
  net.inet.ip.random_id=1
  net.inet.ip.redirect=0
  net.inet.icmp.drop_redirect=1
  net.inet.tcp.always_keepalive=0
  net.inet.tcp.drop_synfin=1
  net.inet.tcp.fast_finwait2_recycle=1
  net.inet.tcp.icmp_may_rst=0
  net.inet.tcp.msl=8192
  net.inet.tcp.path_mtu_discovery=0
  net.inet.udp.blackhole=1
  net.inet.tcp.blackhole=2
  net.inet.tcp.hostcache.expire=7200
  net.inet.tcp.delacktime=20

Purpose

Why one would built such appliance? Because its a lot cheaper then to get the ‘branded’ one. Think about Dell EMC Data Domain for example – and not just ‘any’ Data Domain but almost the highest one – the Data Domain DD9300 at least. It would cost about ten times more at least … with smaller capacity and taking not 4U but closer to 14U with three DS60 expanders.

But you can actually make this FreeBSD Enterprise Storage behave like Dell EMC Data Domain .. or like their Dell EMC Elastic Cloud Storage for example.

The Dell EMC CloudBoost can be deployed somewhere on your VMware stack to provide the DDBoost deduplication. Then you would need OpenStack Swift as its one of the supported backed devices.

emc-cloudboost-swift-cover.png

emc-cloudboost-swift-support.png

The OpenStack Swift package in FreeBSD is about 4-5 years behind reality (2.2.2) so you will have to use Bhyve here.

# pkg search swift
(...)
py27-swift-2.2.2_1             Highly available, distributed, eventually consistent object/blob store
(...)

Create Bhyve virtual machine on this FreeBSD Enterprise Storage with CentOS 7.6 system for example, then setup Swift there, but it will work. With 20 physical cores to spare and 128 GB RAM you would not even noticed its there.

This way you can use Dell EMC Networker with more then ten times cheaper storage.

In the past I also wrote about IBM Spectrum Protect (TSM) which would also greatly benefit from FreeBSD Enterprise Storage. I actually also use this FreeBSD based storage as space for IBM Spectrum Protect (TSM) container pool directories. Exported via iSCSI works like a charm.

You can also compare that FreeBSD Enterprise Storage to other storage appliances like iXsystems TrueNAS or EXAGRID.

Performance

You for sure would want to know how fast this FreeBSD Enterprise Storage performs πŸ™‚

I will share all performance data that I gathered with a pleasure.

Network Performance

First the network performance.

I user iperf3 as the benchmark.

I started the server on the FreeBSD side.

# iperf3 -s

… and then I started client on the Windows Server 2016 machine.

C:\iperf-3.1.3-win64>iperf3.exe -c nas02 -P 8
(...)
[SUM]   0.00-10.00  sec  10.8 GBytes  9.26 Gbits/sec                  receiver
(..)

This is with MTU 1500 – no Jumbo frames unfortunatelly 😦

Unfortunatelly this system has only one physical 10GE interface but I did other test also. Using two such boxes with single 10GE interface. That saturated the dual 10GE LACP on FreeBSD side nicely.

I also exported NFS and iSCSI to Red Hat Enterprise Linux system. The network performance was about 500-600 MB/s on single 10GE interface. That would be 1000-1200 MB/s on LACP aggregation.

Disk Subsystem Performance

Now the disk subsystem.

First some naive test using diskinfo(8) FreeBSD’s builtin tool.

# diskinfo -ctv /dev/da12
/dev/da12
        512             # sectorsize
        12000138625024  # mediasize in bytes (11T)
        23437770752     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        1458933         # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        ATA TOSHIBA MG07ACA1    # Disk descr.
        98H0A11KF95G    # Disk ident.
        id1,enc@n500e081010445dbd/type@0/slot@c/elmdesc@ArrayDevice11   # Physical path
        No              # TRIM/UNMAP support
        7200            # Rotation rate in RPM
        Not_Zoned       # Zone Mode

I/O command overhead:
        time to read 10MB block      0.067031 sec       =    0.003 msec/sector
        time to read 20480 sectors   2.619989 sec       =    0.128 msec/sector
        calculated command overhead                     =    0.125 msec/sector

Seek times:
        Full stroke:      250 iter in   5.665880 sec =   22.664 msec
        Half stroke:      250 iter in   4.263047 sec =   17.052 msec
        Quarter stroke:   500 iter in   6.867914 sec =   13.736 msec
        Short forward:    400 iter in   3.057913 sec =    7.645 msec
        Short backward:   400 iter in   1.979287 sec =    4.948 msec
        Seq outer:       2048 iter in   0.169472 sec =    0.083 msec
        Seq inner:       2048 iter in   0.469630 sec =    0.229 msec

Transfer rates:
        outside:       102400 kbytes in   0.478251 sec =   214114 kbytes/sec
        middle:        102400 kbytes in   0.605701 sec =   169060 kbytes/sec
        inside:        102400 kbytes in   1.303909 sec =    78533 kbytes/sec

So now we know how fast a single disk is.

Let’s repeast the same test on the ZFS zvol device.

# diskinfo -ctv /dev/zvol/nas02/iscsi/test
/dev/zvol/nas02/iscsi/test
        512             # sectorsize
        17592186044416  # mediasize in bytes (16T)
        34359738368     # mediasize in sectors
        65536           # stripesize
        0               # stripeoffset
        Yes             # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

I/O command overhead:
        time to read 10MB block      0.004512 sec       =    0.000 msec/sector
        time to read 20480 sectors   0.196824 sec       =    0.010 msec/sector
        calculated command overhead                     =    0.009 msec/sector

Seek times:
        Full stroke:      250 iter in   0.006151 sec =    0.025 msec
        Half stroke:      250 iter in   0.008228 sec =    0.033 msec
        Quarter stroke:   500 iter in   0.014062 sec =    0.028 msec
        Short forward:    400 iter in   0.010564 sec =    0.026 msec
        Short backward:   400 iter in   0.011725 sec =    0.029 msec
        Seq outer:       2048 iter in   0.028198 sec =    0.014 msec
        Seq inner:       2048 iter in   0.028416 sec =    0.014 msec

Transfer rates:
        outside:       102400 kbytes in   0.036938 sec =  2772213 kbytes/sec
        middle:        102400 kbytes in   0.043076 sec =  2377194 kbytes/sec
        inside:        102400 kbytes in   0.034260 sec =  2988908 kbytes/sec

Almost 3 GB/s – not bad.

Time for even more oldschool test – the immortal dd(8) command.

This is with compression=off setting.

One process.

# dd  FILE bs=128m status=progress
26172456960 bytes (26 GB, 24 GiB) transferred 16.074s, 1628 MB/s
202+0 records in
201+0 records out
26977763328 bytes transferred in 16.660884 secs (1619227644 bytes/sec)

Four concurrent processes.

# dd  FILE${X} bs=128m status=progress
80933289984 bytes (81 GB, 75 GiB) transferred 98.081s, 825 MB/s
608+0 records in
608+0 records out
81604378624 bytes transferred in 98.990579 secs (824365101 bytes/sec)

Eight concurrent processes.

# dd  FILE${X} bs=128m status=progress
174214610944 bytes (174 GB, 162 GiB) transferred 385.042s, 452 MB/s
1302+0 records in
1301+0 records out
174617264128 bytes transferred in 385.379296 secs (453104943 bytes/sec)

Lets summarize that data.

1 STREAM(s) ~ 1600 MB/s ~ 1.5 GB/s
4 STREAM(s) ~ 3300 MB/s ~ 3.2 GB/s
8 STREAM(s) ~ 3600 MB/s ~ 3.5 GB/s

So the disk subsystem is able to squeeze 3.5 GB/s of sustained speed in sequential writes. That us that if we would want to saturate it we would need to add additional two 10GE interfaces.

The disks were stressed only to about 55% which you can see in other useful FreeBSD tool – gstat(8) command.

n10.png

Time for more ‘intelligent’ tests. The blogbench test.

First with compression disabled.

# time blogbench -d .
Frequency = 10 secs
Scratch dir = [.]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.
(...)
Final score for writes:          6476
Final score for reads :        660436

blogbench -d .  280.58s user 4974.41s system 1748% cpu 5:00.54 total

Second with compression set to LZ4.

# time blogbench -d .
Frequency = 10 secs
Scratch dir = [.]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.
(...)
Final score for writes:          7087
Final score for reads :        733932

blogbench -d .  299.08s user 5415.04s system 1900% cpu 5:00.68 total

Compression did not helped much, but helped.

To have some comparision we will run the same test on the system ZFS pool – two Intel SSD DC S3500 240 GB drives in mirror which have following features.

The Intel SSD DC S3500 240 GB drives:

  • Sequential Read (up to) 500 MB/s
  • Sequential Write (up to) 260 MB/s
  • Random Read (100% Span) 75000 IOPS
  • Random Write (100% Span) 7500 IOPS
# time blogbench -d .
Frequency = 10 secs
Scratch dir = [.]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.
(...)
Final score for writes:          6109
Final score for reads :        654099

blogbench -d .  278.73s user 5058.75s system 1777% cpu 5:00.30 total

Now the randomio test. Its multithreaded disk I/O microbenchmark.

The usage is as follows.

usage: randomio filename nr_threads write_fraction_of_io fsync_fraction_of_writes io_size nr_seconds_between_samples

filename                    Filename or device to read/write.
write_fraction_of_io        What fraction of I/O should be writes - for example 0.25 for 25% write.
fsync_fraction_of_writes    What fraction of writes should be fsync'd.
io_size                     How many bytes to read/write (multiple of 512 bytes).
nr_seconds_between_samples  How many seconds to average samples over.

The randomio with 4k block.

# zfs create -s -V 1T nas02/iscsi/test
# randomio /dev/zvol/nas02/iscsi/test 8 0.25 1 4096 10
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
54137.7 |40648.4   0.0    0.1  575.8    2.2 |13489.4   0.0    0.3  405.8    2.6
66248.4 |49641.5   0.0    0.1   19.6    0.3 |16606.9   0.0    0.2   26.4    0.7
66411.0 |49817.2   0.0    0.1   19.7    0.3 |16593.8   0.0    0.2   20.3    0.7
64158.9 |48142.8   0.0    0.1  254.7    0.7 |16016.1   0.0    0.2  130.4    1.0
48454.1 |36390.8   0.0    0.1  542.8    2.7 |12063.3   0.0    0.3  507.5    3.2
66796.1 |50067.4   0.0    0.1   24.1    0.3 |16728.7   0.0    0.2   23.4    0.7
58512.2 |43851.7   0.0    0.1  576.5    1.7 |14660.5   0.0    0.2  307.2    1.7
63195.8 |47341.8   0.0    0.1  261.6    0.9 |15854.1   0.0    0.2  361.1    1.9
67086.0 |50335.6   0.0    0.1   20.4    0.3 |16750.4   0.0    0.2   25.1    0.8
67429.8 |50549.6   0.0    0.1   21.8    0.3 |16880.3   0.0    0.2   20.6    0.7
^C

… and with 512 sector.

# zfs create -s -V 1T nas02/iscsi/test
# randomio /dev/zvol/nas02/iscsi/TEST 8 0.25 1 512 10
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
58218.9 |43712.0   0.0    0.1  501.5    2.1 |14506.9   0.0    0.2  272.5    1.6
66325.3 |49703.8   0.0    0.1  352.0    0.9 |16621.4   0.0    0.2  352.0    1.5
68130.5 |51100.8   0.0    0.1   24.6    0.3 |17029.7   0.0    0.2   24.4    0.7
68465.3 |51352.3   0.0    0.1   19.9    0.3 |17112.9   0.0    0.2   23.8    0.7
54903.5 |41249.1   0.0    0.1  399.3    1.9 |13654.4   0.0    0.3  335.8    2.2
61259.8 |45898.7   0.0    0.1  574.6    1.7 |15361.0   0.0    0.2  371.5    1.7
68483.3 |51313.1   0.0    0.1   22.9    0.3 |17170.3   0.0    0.2   26.1    0.7
56713.7 |42524.7   0.0    0.1  373.5    1.8 |14189.1   0.0    0.2  438.5    2.7
68861.4 |51657.0   0.0    0.1   21.0    0.3 |17204.3   0.0    0.2   21.7    0.7
68602.0 |51438.4   0.0    0.1   19.5    0.3 |17163.7   0.0    0.2   23.7    0.7
^C

Both randomio tests were run with compression set to LZ4.

Next is bonnie++ benchmark. It has been run with compression set to LZ4.

# bonnie++ -d . -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nas02.local 261368M   139  99 775132  99 589190  99   383  99 1638929  99 12930 2046
Latency             60266us    7030us    7059us   21553us    3844us    5710us
Version  1.97       ------Sequential Create------ --------Random Create--------
nas02.local         -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ 12680  44 +++++ +++ +++++ +++ 30049  99
Latency              2619us      43us     714ms    2748us      28us      58us

… and last but not least the fio benchmark. Also with LZ4 compression enabled.

# fio --randrepeat=1 --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
fio-3.13
Starting 1 process
Jobs: 1 (f=1): [m(1)][98.0%][r=38.0MiB/s,w=12.2MiB/s][r=9735,w=3128 IOPS][eta 00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=35368: Tue Jun 18 15:14:44 2019
  read: IOPS=3157, BW=12.3MiB/s (12.9MB/s)(3070MiB/248872msec)
   bw (  KiB/s): min= 9404, max=57732, per=98.72%, avg=12469.84, stdev=3082.99, samples=497
   iops        : min= 2351, max=14433, avg=3117.15, stdev=770.74, samples=497
  write: IOPS=1055, BW=4222KiB/s (4323kB/s)(1026MiB/248872msec)
   bw (  KiB/s): min= 3179, max=18914, per=98.71%, avg=4166.60, stdev=999.23, samples=497
   iops        : min=  794, max= 4728, avg=1041.25, stdev=249.76, samples=497
  cpu          : usr=1.11%, sys=88.64%, ctx=677981, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=12.3MiB/s (12.9MB/s), 12.3MiB/s-12.3MiB/s (12.9MB/s-12.9MB/s), io=3070MiB (3219MB), run=248872-248872msec
  WRITE: bw=4222KiB/s (4323kB/s), 4222KiB/s-4222KiB/s (4323kB/s-4323kB/s), io=1026MiB (1076MB), run=248872-248872msec

Dunno how about you but I am satisfied with performance πŸ™‚

FreeNAS

Originally I really wanted to use FreeNAS on these boxes and I even installed FreeNAS on them. It run nicely but … the security part of FreeNAS was not best.

This is the output of pkg audit command. Quite scarry.

root@freenas[~]# pkg audit -F
Fetching vuln.xml.bz2: 100%  785 KiB 804.3kB/s    00:01
python27-2.7.15 is vulnerable:
Python -- NULL pointer dereference vulnerability
CVE: CVE-2019-5010
WWW: https://vuxml.FreeBSD.org/freebsd/d74371d2-4fee-11e9-a5cd-1df8a848de3d.html

curl-7.62.0 is vulnerable:
curl -- multiple vulnerabilities
CVE: CVE-2019-3823
CVE: CVE-2019-3822
CVE: CVE-2018-16890
WWW: https://vuxml.FreeBSD.org/freebsd/714b033a-2b09-11e9-8bc3-610fd6e6cd05.html

libgcrypt-1.8.2 is vulnerable:
libgcrypt -- side-channel attack vulnerability
CVE: CVE-2018-0495
WWW: https://vuxml.FreeBSD.org/freebsd/9b5162de-6f39-11e8-818e-e8e0b747a45a.html

python36-3.6.5_1 is vulnerable:
Python -- NULL pointer dereference vulnerability
CVE: CVE-2019-5010
WWW: https://vuxml.FreeBSD.org/freebsd/d74371d2-4fee-11e9-a5cd-1df8a848de3d.html

pango-1.42.0 is vulnerable:
pango -- remote DoS vulnerability
CVE: CVE-2018-15120
WWW: https://vuxml.FreeBSD.org/freebsd/5a757a31-f98e-4bd4-8a85-f1c0f3409769.html

py36-requests-2.18.4 is vulnerable:
www/py-requests -- Information disclosure vulnerability
WWW: https://vuxml.FreeBSD.org/freebsd/50ad9a9a-1e28-11e9-98d7-0050562a4d7b.html

libnghttp2-1.31.0 is vulnerable:
nghttp2 -- Denial of service due to NULL pointer dereference
CVE: CVE-2018-1000168
WWW: https://vuxml.FreeBSD.org/freebsd/1fccb25e-8451-438c-a2b9-6a021e4d7a31.html

gnupg-2.2.6 is vulnerable:
gnupg -- unsanitized output (CVE-2018-12020)
CVE: CVE-2017-7526
CVE: CVE-2018-12020
WWW: https://vuxml.FreeBSD.org/freebsd/7da0417f-6b24-11e8-84cc-002590acae31.html

py36-cryptography-2.1.4 is vulnerable:
py-cryptography -- tag forgery vulnerability
CVE: CVE-2018-10903
WWW: https://vuxml.FreeBSD.org/freebsd/9e2d0dcf-9926-11e8-a92d-0050562a4d7b.html

perl5-5.26.1 is vulnerable:
perl -- multiple vulnerabilities
CVE: CVE-2018-6913
CVE: CVE-2018-6798
CVE: CVE-2018-6797
WWW: https://vuxml.FreeBSD.org/freebsd/41c96ffd-29a6-4dcc-9a88-65f5038fa6eb.html

libssh2-1.8.0,3 is vulnerable:
libssh2 -- multiple issues
CVE: CVE-2019-3862
CVE: CVE-2019-3861
CVE: CVE-2019-3860
CVE: CVE-2019-3858
WWW: https://vuxml.FreeBSD.org/freebsd/6e58e1e9-2636-413e-9f84-4c0e21143628.html

git-lite-2.17.0 is vulnerable:
Git -- Fix memory out-of-bounds and remote code execution vulnerabilities (CVE-2018-11233 and CVE-2018-11235)
CVE: CVE-2018-11235
CVE: CVE-2018-11233
WWW: https://vuxml.FreeBSD.org/freebsd/c7a135f4-66a4-11e8-9e63-3085a9a47796.html

gnutls-3.5.18 is vulnerable:
GnuTLS -- double free, invalid pointer access
CVE: CVE-2019-3836
CVE: CVE-2019-3829
WWW: https://vuxml.FreeBSD.org/freebsd/fb30db8f-62af-11e9-b0de-001cc0382b2f.html

13 problem(s) in the installed packages found.

root@freenas[~]# uname -a
FreeBSD freenas.local 11.2-STABLE FreeBSD 11.2-STABLE #0 r325575+95cc58ca2a0(HEAD): Mon May  6 19:08:58 EDT 2019     root@mp20.tn.ixsystems.com:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64  amd64

root@freenas[~]# freebsd-version -uk
11.2-STABLE
11.2-STABLE

root@freenas[~]# sockstat -l4
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     uwsgi-3.6  4006  3  tcp4   127.0.0.1:9042        *:*
root     uwsgi-3.6  3188  3  tcp4   127.0.0.1:9042        *:*
nobody   mdnsd      3144  4  udp4   *:31417               *:*
nobody   mdnsd      3144  6  udp4   *:5353                *:*
www      nginx      3132  6  tcp4   *:443                 *:*
www      nginx      3132  8  tcp4   *:80                  *:*
root     nginx      3131  6  tcp4   *:443                 *:*
root     nginx      3131  8  tcp4   *:80                  *:*
root     ntpd       2823  21 udp4   *:123                 *:*
root     ntpd       2823  22 udp4   10.49.13.99:123       *:*
root     ntpd       2823  25 udp4   127.0.0.1:123         *:*
root     sshd       2743  5  tcp4   *:22                  *:*
root     syslog-ng  2341  19 udp4   *:1031                *:*
nobody   mdnsd      2134  3  udp4   *:39020               *:*
nobody   mdnsd      2134  5  udp4   *:5353                *:*
root     python3.6  236   22 tcp4   *:6000                *:*


I even tried to get explanation why FreeNAS has such outdated and insecure packages in their latest version – FreeNAS 11.2-U3 Vulnerabilities – a thread I started on their forums.

Unfortunatelly its their policy which you can summarize as ‘do not touch/change versions if its working’ – at last I got this implression.

Because if these security holes I can not recommend the use of FreeNAS and I movedto original – the FreeBSD system.

One other interesting note. After I installed FreeBSD I wanted to import the ZFS pool created by FreeNAS. This is what I got after executing the zpool import command.

# zpool import
   pool: nas02_gr06
     id: 1275660523517109367
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr06  ONLINE
          raidz2-0  ONLINE
            da58p2  ONLINE
            da59p2  ONLINE
            da60p2  ONLINE
            da61p2  ONLINE
            da62p2  ONLINE
            da63p2  ONLINE
            da64p2  ONLINE
            da26p2  ONLINE
            da65p2  ONLINE
            da23p2  ONLINE
            da29p2  ONLINE
            da66p2  ONLINE
            da67p2  ONLINE
            da68p2  ONLINE
        spares
          da69p2

   pool: nas02_gr05
     id: 5642709896812665361
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr05  ONLINE
          raidz2-0  ONLINE
            da20p2  ONLINE
            da30p2  ONLINE
            da34p2  ONLINE
            da50p2  ONLINE
            da28p2  ONLINE
            da38p2  ONLINE
            da51p2  ONLINE
            da52p2  ONLINE
            da27p2  ONLINE
            da32p2  ONLINE
            da53p2  ONLINE
            da54p2  ONLINE
            da55p2  ONLINE
            da56p2  ONLINE
        spares
          da57p2

   pool: nas02_gr04
     id: 2460983830075205166
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr04  ONLINE
          raidz2-0  ONLINE
            da44p2  ONLINE
            da37p2  ONLINE
            da18p2  ONLINE
            da36p2  ONLINE
            da45p2  ONLINE
            da19p2  ONLINE
            da22p2  ONLINE
            da33p2  ONLINE
            da35p2  ONLINE
            da21p2  ONLINE
            da31p2  ONLINE
            da47p2  ONLINE
            da48p2  ONLINE
            da49p2  ONLINE
        spares
          da46p2

   pool: nas02_gr03
     id: 4878868173820164207
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr03  ONLINE
          raidz2-0  ONLINE
            da81p2  ONLINE
            da71p2  ONLINE
            da14p2  ONLINE
            da15p2  ONLINE
            da80p2  ONLINE
            da16p2  ONLINE
            da88p2  ONLINE
            da17p2  ONLINE
            da40p2  ONLINE
            da41p2  ONLINE
            da25p2  ONLINE
            da42p2  ONLINE
            da24p2  ONLINE
            da43p2  ONLINE
        spares
          da39p2

   pool: nas02_gr02
     id: 3299037437134217744
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr02  ONLINE
          raidz2-0  ONLINE
            da84p2  ONLINE
            da76p2  ONLINE
            da85p2  ONLINE
            da8p2   ONLINE
            da9p2   ONLINE
            da78p2  ONLINE
            da73p2  ONLINE
            da74p2  ONLINE
            da70p2  ONLINE
            da77p2  ONLINE
            da11p2  ONLINE
            da13p2  ONLINE
            da79p2  ONLINE
            da89p2  ONLINE
        spares
          da90p2

   pool: nas02_gr01
     id: 1132383125952900182
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr01  ONLINE
          raidz2-0  ONLINE
            da91p2  ONLINE
            da75p2  ONLINE
            da0p2   ONLINE
            da82p2  ONLINE
            da1p2   ONLINE
            da83p2  ONLINE
            da2p2   ONLINE
            da3p2   ONLINE
            da4p2   ONLINE
            da5p2   ONLINE
            da86p2  ONLINE
            da6p2   ONLINE
            da7p2   ONLINE
            da72p2  ONLINE
        spares
          da87p2



It seems that FreeNAS does ZFS little differently and they create a separate pool for every RAIDZ2 target with dedicated spares. Interesting …

UPDATE 1 – BSD Now 305

The FreeBSD Enterprise 1 PB Storage article was featured in the BSD Now 305 – Changing Face of Unix episode.

Thanks for mentioning!

UPDATE 2 – Real Life Pictures in Data Center

Some of you asked for a real life pictures of this monster. Below you will find several pics taken at the data center.

Front case with cabling.

tyan-real-01.jpg

Alternate front view.

tyan-real-09.jpg

Back of the case with cabling.

tyan-real-02.jpg

Top view with disks.

tyan-real-03

Alternate top view.

tyan-real-07.jpg

Disks slots zoom.

tyan-real-08.jpg

SSD and HDD disks.

tyan-real-06.jpg

EOF
Advertisements

Silent Fanless FreeBSD Server – Redundant Backup

I brought up this topic in the past. It was in the form of more theoretical Silent Fanless FreeBSD Desktop/Server post and more hands-on Silent Fanless FreeBSD Server – DIY Backup article.

One of the comments after the latter was that I compared non-redundant backup solution (single disk) to redundant backup in the cloud. Today – as this is my main backup system – I would like to show you redundant backup solution with two disks in ZFS mirror along with real power usage measurements. This time I got ASRock J3355B-ITX motherboard with only 10W TDP which includes 2-core Celeron J3355 2.0-2.5 GHz CPU and small shiny REALAN H80 Mini ITX case. It looks very nice and comes from AliExpress at very low $33 price for new unit along with free shipping.

Build

Here is how the REALAN H80 case looks like.

realan-H80-render

The ASRock J3355B-ITX motherboard.

asrock-J3355B-ITX.jpg

Same as with the earlier build the internal Seagate BarraCuda 5TB 2.5 SATA drives costs about $200. The same Seagate Backup Plus 5TB 2.5 disk in external case with USB 3.0 port costs nearly half of that price – only $120 – at least in the Europe/Poland location. I took the decision to buy external ones and rip off their cases. That saved me about $160.

Here is the simple performance benchmark of these 2.5 disks.

% which pv
pv: aliased to pv -t -r -a -b -W -B 1048576

% pv  /dev/null
1.35GiB 0:00:10 [ 137MiB/s] [ 137MiB/s]
^C

% dd  /dev/null bs=8M
127+0 records in
127+0 records out
1065353216 bytes transferred in 7.494081 secs (142159287 bytes/sec)
^C

About 135MB/s per disk.

The ripped of parts of Seagate Backup Plus USB cases.

external-case-parts.jpg

What made me laugh was that as I got different cases colors (silver and gray) the disks inside also had different colors (green and blue) :>

disks-bottom

… but their part number is the same, here they are mounted on a REALAN H80 disks holder.

disks-mounted

For the record – several REALAN H80 case real shots (not renders). First its front.

realan-H80-front

Back.

realan-H80-back.jpg

Side with USB port.

realan-H80-side-usb

Bottom.

realan-H80-bottom.jpg

Top.

realan-H80-top

Case parts.

realan-H80-parts.jpg

Generally the REALAN H80 looks really nice. Little lower REALAN H60 (without COM slots/holes in the back) looks even better but I wanted to make sure that I will have room and space for hot air in that case – as space was not a problem for me.

Cost

The complete price tops at $220 total. Here are the parts used.

PRICE  COMPONENT
  $49  CPU/Motherboard ASRock J3355B-ITX Mini-ITX
  $10  RAM 4GB DDR3
  $13  PSU 12V 7.5A 90W Pico (internal)
   $2  PSU 12V 2.5A 30W Leader Electronics (external)
  $33  Supermicro SC101i
   $3  SanDisk Fit 16GB USB 2.0 Drive (system)
 $120  Seagate 5TB 2.5 drive (ONE)
 $120  Seagate 5TB 2.5 drive (TWO)
 $350  TOTAL

That is $110 for the ‘system’ and additional $240 for ‘data’ drives.

Today I would probably get the ASRock N3150DC-ITX or Gigabyte GA-N3160TN motherboard instead because of builtin DC jack slot (compatible with 19V power adapter) on its back. This will eliminate the need for additional internal Pico PSU power supply …

The ASRock N3150DC-ITX with builtin DC jack.

asrock-N3150DC-ITX.jpg

The Gigabyte GA-N3160TN with builtin DC jack.

gigabyte-GA-N3160TN.jpg

The Gigabyte GA-N3160TN is also very low profile motherboard as you can see from the back.

gigabyte-GA-N3160TN-back-other.jpg

It may be good idea to use this one instead ASRock N3150DC-ITX to get more space above the motherboard.

Β 

PSU

As in the earlier Silent Fanless FreeBSD Server – DIY Backup article I used small 12V 2.5A 30W compact and cheap external PSU instead of the large 90W PSU from FSP Group. As these low power motherboard does not need a lot of power.

New Leader Electronics PSU label.

silent-backup-psu-ext-label.jpg

The internal power supply is Pico PSU which now tops as 12V 7.5A 90W power.

silent-backup-psu-pico-12V-90W.jpg

Power Consumption

I also measured the power consumption with power meter.

silent-backup-power-meter.jpg

The whole box with two Seagate BarraCuda 5TB 2.5 drives for data on ZFS mirror and SanDisk 16GB USB 2.0 system drive used about 10.4W in idle state.

I used all needed settings from my earlier The Power to Serve – FreeBSD Power Management article with CPU speed limited between 0.4GHz and 1.2GHz.

The powerd(8) settings in the /etc/rc.conf file are below.

powerd_flags="-n hiadaptive -a hiadaptive -b hiadaptive -m 400 -M 1200"

I used python(1) [1] to load the CPU and dd(8) to load the drives. I used dd(8) on the ZFS pool so 1 disk thread will read [2] and write [3] from/to both 2.5 disks. I temporary disabled LZ4 compression for the write tests.

[1] # echo '999999999999999999 ** 999999999999999999' | python
[2] # dd  /dev/null bs=1M
[3] # dd > /data/FILE < /dev/zero bs=1M
POWER   CPU LOAD         I/O LOAD
10.4 W  IDLE             IDLE
12.9 W  IDLE             1 DISK READ Thread(s)
14.3 W  IDLE             1 DISK READ Thread(s) + 1 DISK WRITE Thread(s)
17.2 W  IDLE             3 DISK READ Thread(s) + 3 DISK WRITE Thread(s)
11.0 W  8 CPU Thread(s)  IDLE
13.4 W  8 CPU Thread(s)  1 DISK READ Thread(s)
15.0 W  8 CPU Thread(s)  1 DISK READ Thread(s) + 1 DISK WRITE Thread(s)
17.8 W  8 CPU Thread(s)  3 DISK READ Thread(s) + 3 DISK WRITE Thread(s)

That’s not much remembering that 6W TDP power motherboard ASRock N3150B-ITX with just single Maxtor M3 4TB 2.5 USB 3.0 drive used 16.0W with CPU and I/O loaded. Only 1.8W more (on loaded system) with redundancy on two 2.5 disks.

Commands

The crypto FreeBSD kernel module was able to squeeze about 68MB/s of random data from /dev/random as this CPU has built in hardware AES-NI acceleration. Note to Linux users – the /dev/random and /dev/urandom are the same thing on FreeBSD. I used both dd(8) and pv(1) commands for this simple test. I made two tests with powerd(8) enabled and disabled to check the difference between CPU speed at 1.2GHz and at 2.5GHz with Turbo mode.

Full speed with Turbo enabled (note 2001 instead of 2000 for CPU frequency)..

# /etc/rc.d/powerd stop
Stopping powerd.
Waiting for PIDS: 1486.

% sysctl dev.cpu.0.freq
dev.cpu.0.freq: 2001

% which pv
pv: aliased to pv -t -r -a -b -W -B 1048576

% dd  /dev/null
1.91GiB 0:00:31 [68.7MiB/s] [68.1MiB/s]
265+0 records in
265+0 records out
2222981120 bytes transferred in 33.566154 secs (70226864 bytes/sec)
^C

CPU limited to 1.2GHz with powerd(8) daemon was able to squeeze about 24MB/s.

# service powerd start
Starting powerd.

% which pv
pv: aliased to pv -t -r -a -b -W -B 1048576

% dd  /dev/null
568MiB 0:00:23 [25.3MiB/s] [24.7MiB/s]
71+0 records in
71+0 records out
595591168 bytes transferred in 23.375588 secs (25479195 bytes/sec
^C

Below I will show you the data from dmesg(8) about the used USB and 2.5 drives.

The dmesg(8) information for the SanDisk Fit USB 2.0 16GB drive.

# grep da0 /var/run/dmesg.boot
da0 at umass-sim1 bus 1 scbus3 target 0 lun 0
da0:  Removable Direct Access SPC-4 SCSI device
da0: Serial Number 4C530002030502100093
da0: 400.000MB/s transfers
da0: 14663MB (30031250 512 byte sectors)
da0: quirks=0x2

… and two Seagate BarraCuda 5TB 2.5 drives.

# grep ada /var/run/dmesg.boot
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0:  ACS-3 ATA SATA 3.x device
ada0: Serial Number WCJ0DRJE
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 4769307MB (9767541168 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1:  ACS-3 ATA SATA 3.x device
ada1: Serial Number WCJ0213S
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 4769307MB (9767541168 512 byte sectors)

The whole /var/run/dmesg.boot content (without disks) is shown below.

# cat /var/run/dmesg.boot
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.2-RELEASE-p7 #0: Tue Dec 18 08:29:33 UTC 2018
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
VT(vga): resolution 640x480
CPU: Intel(R) Celeron(R) CPU J3355 @ 2.00GHz (1996.88-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x506c9  Family=0x6  Model=0x5c  Stepping=9
  Features=0xbfebfbff
  Features2=0x4ff8ebbf
  AMD Features=0x2c100800
  AMD Features2=0x101
  Structured Extended Features=0x2294e283
  XSAVE Features=0xf
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory  = 4294967296 (4096 MB)
avail memory = 3700518912 (3529 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: 
WARNING: L1 data cache covers less APIC IDs than a core
0 < 1
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
ioapic0  irqs 0-119 on motherboard
SMP: AP CPU #1 Launched!
Timecounter "TSC" frequency 1996877678 Hz quality 1000
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff80ff4580, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
vtvga0:  on motherboard
cryptosoft0:  on motherboard
acpi0:  on motherboard
unknown: I/O range not supported
cpu0:  on acpi0
cpu1:  on acpi0
attimer0:  port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0:  port 0x70-0x77 on acpi0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0:  iomem 0xfed00000-0xfed003ff irq 8 on acpi0
Timecounter "HPET" frequency 19200000 Hz quality 950
Event timer "HPET" frequency 19200000 Hz quality 550
Event timer "HPET1" frequency 19200000 Hz quality 440
Event timer "HPET2" frequency 19200000 Hz quality 440
Event timer "HPET3" frequency 19200000 Hz quality 440
Event timer "HPET4" frequency 19200000 Hz quality 440
Event timer "HPET5" frequency 19200000 Hz quality 440
Event timer "HPET6" frequency 19200000 Hz quality 440
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0:  port 0x408-0x40b on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
vgapci0:  port 0xf000-0xf03f mem 0x90000000-0x90ffffff,0x80000000-0x8fffffff irq 19 at device 2.0 on pci0
vgapci0: Boot video device
hdac0:  mem 0x91210000-0x91213fff,0x91000000-0x910fffff irq 25 at device 14.0 on pci0
pci0:  at device 15.0 (no driver attached)
ahci0:  port 0xf090-0xf097,0xf080-0xf083,0xf060-0xf07f mem 0x91214000-0x91215fff,0x91218000-0x912180ff,0x91217000-0x912177ff irq 19 at device 18.0 on pci0
ahci0: AHCI v1.31 with 2 6Gbps ports, Port Multiplier supported
ahcich0:  at channel 0 on ahci0
ahcich1:  at channel 1 on ahci0
pcib1:  irq 22 at device 19.0 on pci0
pci1:  on pcib1
pcib2:  irq 20 at device 19.2 on pci0
pci2:  on pcib2
re0:  port 0xe000-0xe0ff mem 0x91104000-0x91104fff,0x91100000-0x91103fff irq 20 at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: Chip rev. 0x4c000000
re0: MAC rev. 0x00000000
miibus0:  on re0
rgephy0:  PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 70:85:c2:3f:53:41
re0: netmap queues/slots: TX 1/256, RX 1/256
xhci0:  mem 0x91200000-0x9120ffff irq 17 at device 21.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
isab0:  at device 31.0 on pci0
isa0:  on isab0
acpi_button0:  on acpi0
acpi_tz0:  on acpi0
atkbdc0:  at port 0x60,0x64 on isa0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: cannot reserve I/O port range
est0:  on cpu0
est1:  on cpu1
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
hdacc0:  at cad 0 on hdac0
hdaa0:  at nid 1 on hdacc0
ugen0.1:  at usbus0
uhub0:  on usbus0
pcm0:  at nid 21 and 24,26 on hdaa0
pcm1:  at nid 20 and 25 on hdaa0
pcm2:  at nid 27 on hdaa0
hdacc1:  at cad 2 on hdac0
hdaa1:  at nid 1 on hdacc1
pcm3:  at nid 3 on hdaa1
uhub0: 15 ports with 15 removable, self powered
ugen0.2:  at usbus0
uhub1 on uhub0
uhub1:  on usbus0
uhub1: 4 ports with 4 removable, self powered
Trying to mount root from zfs:zroot/ROOT/default []...
random: unblocking device.
re0: link state changed to DOWN

ZFS Pool Configuration

To get higher LZ4 compression ratio I use larger blocksize (1MB) on this ZFS mirror pool. Here is the ZFS pool status.

% zpool status data
  pool: data
 state: ONLINE
  scan: scrub repaired 0 in 44h14m with 0 errors on Mon Feb 11 07:13:42 2019
config:

        NAME                STATE     READ WRITE CKSUM
        data                ONLINE       0     0     0
          mirror-0          ONLINE       0     0     0
            label/WCJ0213S  ONLINE       0     0     0
            label/WCJ0DRJE  ONLINE       0     0     0

errors: No known data errors

I get 4% compression (1.04x) on that ZFS pool. Its about 80% filled with lots of movies and photos so while such compression ratio may not be great it gives a lot of space. For example 4% of 4TB of data is about 160GB of ‘free’ space.

% zfs get compressratio data
NAME                                    PROPERTY       VALUE  SOURCE
data                                    compressratio  1.04x  -

Here is the ZFS pool configuration.

# zpool history
History for 'data':
2018-11-12.01:18:33 zpool create data mirror /dev/label/WCJ0229Z /dev/label/WCJ0DPHF
2018-11-12.01:19:11 zfs set mountpoint=none data
2018-11-12.01:19:16 zfs set compression=lz4 data
2018-11-12.01:19:21 zfs set atime=off data
2018-11-12.01:19:34 zfs set primarycache=metadata data
2018-11-12.01:19:40 zfs set secondarycache=metadata data
2018-11-12.01:19:45 zfs set redundant_metadata=most data
2018-11-12.01:19:51 zfs set recordsize=1m data
(...)

We do not need redundant_metadata as we already have two disks, its useful only on single disks configurations.

Self Solution Cost

As in the earlier post I will again calculate how much energy this server would consume. Currently 1kWh of power costs about $0.20 in Europe/Poland (rounded up). This means that running computer with 1000W power usage for 1 hour would cost you $0.20 on electricity bill. This system uses 10.4W idle and 12.9W when single disk read occurs. For most of the time server will be idle so I assume 11.0W average for the pricing purposes.

That would cost us $0.0022 for 11.0W device running for 1 hour.

Below you will also find calculations for 1 day (24x multiplier), 1 year (another 365.25x multiplier) and 3 years (another 3x multiplier).

   COST  TIME
$0.0022  1 HOUR(S)
$0.0528  1 DAY(S)
$19.285  1 YEAR(S)
$57.856  3 YEAR(S)
$96.426  5 YEAR(S)

Combining that with server cost ($350) we get TCO for our self hosted 5TB storage service.

   COST  TIME
$369.29  1 YEAR(S)
$407.86  3 YEAR(S)
$446.43  5 YEAR(S)

Our total 3 years TCO is $407.86 and 5 years is $446.43. Its for running system non-stop. We can also implement features like Wake On LAN to limit that power usage even more.

Cloud Storage Prices

This time after searching for cheapest cloud based storage I found these services.

  • Amazon Drive
  • Amazon S3 Glacier Storage
  • Backblaze B2 Cloud Storage
  • Google One

Here is its cost summarized for 1 year period for 5TB of data.

PRICE  TIME       SERVICE
 $300  1 YEAR(S)  Amazon Drive
 $310  1 YEAR(S)  Google One
 $240  1 YEAR(S)  Amazon S3 Glacier Storage
 $450  1 YEAR(S)  Backblaze B2 Cloud Storage

For the Backblaze B2 Cloud Storage I assumed average between upload/download price because upload is two times cheaper then download.

Here is its cost summarized for 3 year period for 5TB of data.

PRICE  TIME       SERVICE
 $900  3 YEAR(S)  Amazon Drive
 $930  3 YEAR(S)  Google One
 $720  3 YEAR(S)  Amazon S3 Glacier Storage
$1350  3 YEAR(S)  Backblaze B2 Cloud Storage

Here is its cost summarized for 5 year period for 5TB of data.

PRICE  TIME       SERVICE
$1500  5 YEAR(S)  Amazon Drive
$1550  5 YEAR(S)  Google One
$1200  5 YEAR(S)  Amazon S3 Glacier Storage
$2250  5 YEAR(S)  Backblaze B2 Cloud Storage

Now lets compare costs of our own server to various cloud services.

If we would run our server for just 1 year the price will be similar.

PRICE  TIME       SERVICE
 $369  1 YEAR(S)  Self Build NAS
 $300  1 YEAR(S)  Amazon Drive
 $310  1 YEAR(S)  Google One
 $240  1 YEAR(S)  Amazon S3 Glacier Storage
 $450  1 YEAR(S)  Backblaze B2 Cloud Storage

It gets interesting when we compare 3 years costs. Its two times cheaper to self host our own server then use cloud services. One may argue that clouds are located in many places but even if we would buy two such boxes and put one – for example in our friends place at Jamaica – or other parts of the world.

PRICE  TIME       SERVICE
 $408  3 YEAR(S)  Self Build NAS
 $528  3 YEAR(S)  Self Build NAS (assuming one of the drives failed)
 $900  3 YEAR(S)  Amazon Drive
 $930  3 YEAR(S)  Google One
 $720  3 YEAR(S)  Amazon S3 Glacier Storage
$1350  3 YEAR(S)  Backblaze B2 Cloud Storage

… but with 5 years using cloud service instead of self hosted NAS solution is 3-5 times more expensive … and these were the cheapest cloud services I was able to find. I do not even want to know how much would it cos on Dropbox for example πŸ™‚

PRICE  TIME       SERVICE
 $447  5 YEAR(S)  Self Build NAS
 $567  5 YEAR(S)  Self Build NAS (assuming one of the drives failed)
$1500  5 YEAR(S)  Amazon Drive
$1550  5 YEAR(S)  Google One
$1200  5 YEAR(S)  Amazon S3 Glacier Storage
$2250  5 YEAR(S)  Backblaze B2 Cloud Storage

… and ‘anywhere’ access is not an argument for cloud services because you can get external IP address for you NAS or use Dynamic DNS – for free. You may also wonder why I compare such ‘full featured NAS’ with S3 storage … well with rclone (rsync for cloud storage) you are able to synchronize your files with almost anything πŸ™‚

Not to mention how much more privacy you have with keeping all your data to yourself … but that is priceless.

You can also setup a lot more services on such hardware – like FreeNAS with Bhyve/Jails virtualization … or Nextcloud instance … or Syncthing … while cloud storage is only that – a storage in the cloud.

Summary

Not sure what else could I include in this article. If you have an idea what else could I cover then let me know.

EOF

Β 

Valuable News – 2018/08/25

UNIX

OpenBSD adds kcov(4) kernel code coverage tracing driver.
So far 8 distinct panics have been found and fixed.
https://marc.info/?l=openbsd-cvs&m=153467896308034&w=2

GCC 8.2 now packaged and available in Illumos/OpenIndiana.
https://bsd.network/@sehnsucht/100581557620270760
https://pkg.openindiana.org/hipster/info/0/developer%2Fgcc-8%408.2.0%2C5.11-2018.0.0.0%3A20180815T204704Z

FreeBSD arc4random is now based on ChaCha20 implementation from OpenBSD.
https://twitter.com/lattera/status/1031280553301925888
https://svnweb.freebsd.org/base?view=revision&revision=338059

Valve forked WINE into Proton as compatibility tool for Steam Play.
https://github.com/ValveSoftware/Proton/
https://steamcommunity.com/games/221410/announcements/detail/1696055855739350561

AMD Threadripper 2990WX 32-core/64-thread on DragonFly BSD.
http://apollo.backplane.com/DFlyMisc/threadripper.txt
http://lists.dragonflybsd.org/pipermail/users/2018-August/357858.html

Using 10GE Adapters with PowerVM SEA – Virtual Ethernet Considerations.
http://ibmsystemsmag.com/aix/administrator/virtualization/using-10gbit-ethernet-adapters/

Native ZFS Encryption on FreeBSD CFT on the road to 12.0-RELEASE.
https://lists.freebsd.org/pipermail/freebsd-current/2018-August/070832.html

Backup FreeNAS and TrueNAS to Backblaze B2 Cloud.
https://www.backblaze.com/blog/how-to-setup-freenas-cloud-storage/

Colin Percival heroic (I am not joking here) fight for removing unneeded sleeps during boot on FreeBSD.
https://twitter.com/cperciva/status/1031928231635677184
https://reviews.freebsd.org/D16723

Writing SYSTEMD service files.
https://twitter.com/mulander/status/1031908074733428736
https://obsd.pl/mfm/iptables/

Illumos/Tribblix packages of openjdk9 and openjdk10 available.
https://twitter.com/ptribble/status/1031650238266789893
https://twitter.com/ptribble/status/1031900360271491074
http://pkgs.tribblix.org/openjdk/

Difference between OpenBSD xenodm and regular xdm.
https://undeadly.org/cgi?action=article&sid=20160911231712

X.Org Security Advisory – 2018/08/21.
http://seclists.org/oss-sec/2018/q3/146

FreeBSD removes legacy DRM and DRM2 from its tree.
https://twitter.com/f0andrey/status/1032234624544583680
https://svnweb.freebsd.org/base?view=revision&revision=338172

OmniOS CE (Community Edition) r151026p/r151024ap/r151022bn with CVE-2018-15473 addressed.
https://omniosce.org/article/releases-026p-024ap-022bn.html

Running Mastodon on FreeBSD.
https://ftfl.ca/blog/2017-05-23-mastodon-freebsd.html

Upgrading Mastodon on FreeBSD.
https://ftfl.ca/blog/2017-05-27-mastodon-freebsd-upgrade.html

KDE Plasma 5.x on Pinebook Laptop.
https://twitter.com/SoftpediaLinux/status/1032262240437723137

FreeBSD – Raspberry Pi 3B+ – UART.
https://blackdot.be/2018/08/freebsd-uart-and-raspberry-pi-3-b/

FreeBSD – Raspberry Pi 3B+ – Remote Access Console.
https://blackdot.be/2018/08/remote-access-console-using-raspberry-pi-3b-and-freebsd/

FreeBSD 12.x has LUA loader enabled by default.
https://twitter.com/bsdimp/status/1031638933690441728

In Other BSDs for 2018/08/18.
https://www.dragonflydigest.com/2018/08/18/21609.html

Shared library load order randomization in HardenedBSD for use with Firefox/Chromium/Iridium.
https://twitter.com/lattera/status/1030823681843507202

Researchers Blame ‘Monolithic’ Linux Code Base for Critical Vulnerabilities.
https://threatpost.com/researchers-blame-monolithic-linux-code-base-for-critical-vulnerabilities/136785/

2018/08/23 is the End of Life for NetBSD 6.x tree.
https://www.netbsd.org/changes/#netbsd6eol

Carlos Neira ZCAGE is now able to create BHYVE Branded Zones on Illumos.
https://bsd.network/@sehnsucht/100599247272911030
https://www.npmjs.com/package/zcage
https://asciinema.org/a/QLnjO8J2NVVPQrs3jh0EKEGta

FreeNAS 11.1-U6 Available.
https://twitter.com/FreeBSD_News/status/1032666675194167297
https://www.ixsystems.com/blog/library/freenas-11-1-u6/

FreeBSD vs. DragonFly BSD vs. Linux on AMD Threadripper 2990WX.
https://www.phoronix.com/scan.php?page=article&item=bsd-threadripper-2990wx

Disable SMT/Hyperthreading in all Intel BIOSes – Theo de Raadt.
https://marc.info/?l=openbsd-tech&m=153504937925732&w=2

OpenSSH 7.8 Released.
https://www.openssh.com/releasenotes.html#7.8

TRIM Consolidation on UFS/FFS Filesystems on FreeBSD.
https://lists.freebsd.org/pipermail/freebsd-current/2018-August/070797.html

FreeBSD vt(4) will now cache most recently drawn text to not redraw it.
https://reviews.freebsd.org/D16723

What is New in Solaris 11.4?
https://www.oracle.com/a/ocom/docs/dc/sev100738019-ww-us-on-ce1-ie1a-ev.html

OpenBSD Foundation gets first 2018 Iridium ($100K+) donation.
https://undeadly.org/cgi?action=article;sid=20180824145543

How to Run a More Secure Browser.
https://www.dragonflybsd.org/docs/docs/handbook/RunSecureBrowser/

Hardware

IBM POWER9 E950 and E980 Servers Launched.
https://www.servethehome.com/ibm-power9-e950-and-e980-servers-launched/

Intel Microcode EULA Prohibits Benchmarking!
https://twitter.com/RaptorEng/status/1031919319909892096
https://pastebin.com/raw/J8MXpPdh

GIGABYTE Cavium ThunderX2 1U and 2U Systems.
https://www.anandtech.com/show/13234/gigabyte-starts-sales-of-cavium-thunderx2-to-general-customers

Fujitsu Presents Post-K arm64 A64FXβ„’ CPU CPU Specifications with 48 Computing Cores and 4 Assistant Cores.
http://www.fujitsu.com/global/about/resources/news/press-releases/2018/0822-02.html

A4000TX ATX Motherboard.
http://www.amibay.com/showthread.php?101477-A4000TX-ATX-Amiga-motherboard

IBM POWER9 Scale Up CPUs with Huge IO and Effective 32 Channel DDR4.
https://www.servethehome.com/ibm-power9-hc30/

Life

Why We Sleep by Matthew Walker review – how more sleep can save your life.
https://www.theguardian.com/books/2017/sep/21/why-we-sleep-by-matthew-walker-review
https://youtube.be/pwaWilO_Pig

Bullshit jobs and the yoke of managerial feudalism.
https://www.economist.com/open-future/2018/06/29/bullshit-jobs-and-the-yoke-of-managerial-feudalism

Why Garbagemen Should Earn More Than Bankers.
https://evonomics.com/why-garbage-men-should-earn-more-than-bankers/

Solitude.
https://www.pa-mar.net/Lifestyle/Solitude.html

Akrasia Effect – Why We Dont Follow Through on What We Set Out to Do and What to Do About It.
https://jamesclear.com/akrasia

Other

Move/migrate Oracle and MySQL databases to PostgreSQL.
http://www.ora2pg.com/start.html
https://github.com/darold/ora2pg/releases

LIDL Killed SAP Migration After Spending 500 Million Dollars.
https://it.toolbox.com/blogs/clintonjones/lidl-cans-sap-project-after-spending-half-a-billion-073118

All BlackHat 2018 Attendee Registration Data Hacked and Available via Unauthenticated API.
https://ninja.style/post/bcard/
https://twitter.com/binitamshah/status/1032084847345459204

GOG Launches FCKDRM to Promote DRM-Free Art and Media.
https://torrentfreak.com/gog-launches-fckdrm-to-promote-drm-free-art-and-media-180822/

EOF