Quare FreeBSD?

I really wanted to make this article short … but I failed miserably. At least I tried to organize it well so one may get back to it after ‘some’ reading, because it’s not a short read. I wanted to title it Why FreeBSD? but when you type that into your favorite duck.com search engine there are so many similar articles. I wanted it to have a distinguished and unique name, so I used the Latin word for ‘why’, which is ‘quare’.

What can FreeBSD offer you that other operating systems do not? Of all the operating systems I have used, I find FreeBSD to suck the least. This post is not here to convince you to use or try FreeBSD – that you will have to do yourself. This article will show you why FreeBSD is a valuable or better alternative to other operating systems and is definitely not dying.

This is the Table of Contents for this article.

  • Base System
  • ZFS Boot Environments
  • Rescue
  • Audio
  • Jails
  • FreeBSD Ports Infrastructure
  • Updating/Building from Source
  • Storage
  • Init System
  • Linux Binary Compatibility
  • Simplicity
  • Evolution Instead Rewriting
  • Documentation
  • Community
  • Closing Thoughts

Base System

When you install a Linux system it’s just a bunch of RPM or DEB packages. For example, if you install the CentOS 7.8 Minimal variant you end up with several hundred RPM packages installed. After a week or a month many of these packages will get updates, sometimes making this CentOS system unusable or even unbootable (the recent GRUB BootHole problem for example). FreeBSD, on the contrary, comes with a Base System concept. This means that when you install FreeBSD you install a minimal system as a whole. No packages or subsystems to be separately updated. Just the whole Base System. That means that the /boot /bin /sbin /usr /etc /lib /libexec /rescue directories are untouchable by any packages (with /usr/local as the one exception under /usr). When you decide to install packages (or build them using FreeBSD Ports) they will all fall into the /usr/local prefix. That means /usr/local/etc for configuration, the /usr/local/bin and /usr/local/sbin directories for binaries, the /usr/local/lib and /usr/local/libexec directories for libraries, and so on. The FreeBSD Base System kernel modules are kept along with the kernel in the /boot/kernel directory. To keep things tidy, all kernel modules that are provided by packages go into the /boot/modules dir. Everything has its place and it’s separated.

That is the separation between Base System binaries (in the /bin /sbin /usr/bin /usr/sbin dirs) and Third Party Packages, which are maintained by pkg(8) and located in the /usr/local/bin and /usr/local/sbin dirs. We all know the difference between bin (user) and sbin (root) binaries, but in FreeBSD there is also another, more UFS related separation. When there was only the UFS filesystem in the FreeBSD world, the /bin and /sbin binaries were available at boot after the root (/) filesystem was mounted and yet before the /usr filesystem was mounted – this is a historical (and still useful in UFS setups) distinction dating back to the old UNIX days. In ZFS setups it does not matter as all files are on the ZFS pool anyway.

The FreeBSD Base System separation also helps with another thing – what if some package gets the ‘great’ idea to install a new compiler named cc and override the default system compiler … or to add libraries/includes in a way that makes it super hard to get back to a working system? If some random FreeBSD package added libc.so to the /usr/local/lib dir, you would be covered and still able to run programs as usual, because FreeBSD system binaries are linked to the libraries in the /lib and /usr/lib dirs. This is also why there is a PATH variable on UNIX systems (FreeBSD included): it sets which directories are searched for binaries first. On FreeBSD it is set by default to search the Base System binary dirs first and the Third Party Packages dirs later.
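
The PATH ordering is easy to try out without touching a real system. Below is a minimal sketch, assuming two throwaway directories that stand in for the Base System and /usr/local prefixes; the two fake cc scripts are invented purely for the demonstration.

```shell
#!/bin/sh
# Sketch: the first matching directory in PATH wins.
# The directories and the "cc" stand-ins below are throwaway examples.

demo=$(mktemp -d)
mkdir -p "$demo/base/bin" "$demo/local/bin"

# One "cc" per prefix, each announcing where it lives.
printf '#!/bin/sh\necho "base cc"\n'    > "$demo/base/bin/cc"
printf '#!/bin/sh\necho "package cc"\n' > "$demo/local/bin/cc"
chmod +x "$demo/base/bin/cc" "$demo/local/bin/cc"

# Base System directory first, package prefix second (the default order).
base_first=$(PATH="$demo/base/bin:$demo/local/bin" cc)

# Reversed order, only to show what the default protects against.
local_first=$(PATH="$demo/local/bin:$demo/base/bin" cc)

echo "base first:  $base_first"
echo "local first: $local_first"

rm -rf "$demo"
```

With the Base System directory listed first, a package that ships its own cc cannot shadow the system one unless you reorder PATH yourself.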

You can update (or not) the Base System separately from the installed packages, with the freebsd-update(8) command when using RELEASE or by recompiling with the make buildworld and make installworld commands when using STABLE/CURRENT systems. When it comes to packages, you can update them using the pkg(8) tool, or portmaster when building from the FreeBSD Ports tree under the /usr/ports dir. That means that package updates will not touch your FreeBSD Base System at all. For example, when you mess up the compiled ports and packages (and I have done that at the beginning of my FreeBSD journey) and you want to start over, the only thing you have to do is remove the /usr/local and /boot/modules and /var/db/pkg directories. That’s it. You are reverted to your Base System and can start over. This is just not possible on a Linux system. Even Gentoo, many of whose concepts are based on FreeBSD ideas, does not have the Base System feature. The Base System also has an additional feature. Because its version is separated from the packages, no one stops you from running old-school FreeBSD 9.0 from 2012 and installing the latest Firefox 80 or LibreOffice 7.0 there. You can not install the latest Firefox on Ubuntu from 2012 …

One may be ‘afraid’ that such a Base System, independent from installed packages, would take more space, but nothing could be further from the truth. A freshly installed FreeBSD 12.1 system uses less than 1 GB of disk space and takes less than 75 MB of RAM with sshd(8) running. For comparison, a fresh CentOS 7.8 install with the ‘Minimal’ set chosen takes 1.1 GB of disk space and uses more than 100 MB of RAM with sshd(8) running. Such a CentOS system is really naked and needs more packages to be usable, while FreeBSD with its Base System is far more capable and powerful and comes with the latest version of the LLVM/Clang compiler suite built in, for example.

ZFS Boot Environments

I have talked about this many times, and probably still not often enough, because the Linux world still ignores this blessing. Having ZFS Boot Environments is such a game changer that once you realize how powerful it is you will never want to use a system that does not support it. The idea is that you can snapshot a running system at any moment and then reboot into that moment (or snapshot) if something goes wrong. It’s a perfect solution for upgrades or changes to the system. FreeBSD systems are already well ‘protected’ from problems arising after updating packages, but ZFS Boot Environments take this to a whole new level.

Like in the movie Groundhog Day (1993), with ZFS Boot Environments you have limitless chances to get your shit together. Even Base System updates and changes are protected by them. You can even transport a Boot Environment to another system using the zfs send and zfs recv commands … or propagate it to many systems. You can create Jails containers from it … or install a new version of FreeBSD in a new Boot Environment and reboot into it while still having your older ‘production’ system untouched.
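
A typical ‘save point before a risky change’ could look roughly like the sketch below. It assumes the bectl(8) tool from the Base System and a made-up Boot Environment name (pre-upgrade). The DRY_RUN switch is my own addition so the flow can be read and tried on any machine: by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: guard a package upgrade with a ZFS Boot Environment.
# DRY_RUN=1 (the default here) only prints the commands;
# the BE name is a made-up example.

DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"      # show what would be executed
    else
        "$@"             # execute for real (FreeBSD host, as root)
    fi
}

guarded_upgrade() {
    run bectl create "$1"    # new BE cloned from the running system
    run pkg upgrade -y       # the change we want protection from
}

roll_back() {
    run bectl activate "$1"  # make the saved BE the default for next boot
    run shutdown -r now
}

guarded_upgrade pre-upgrade
```

If the upgrade goes wrong, roll_back pre-upgrade (or simply picking the saved Boot Environment in the loader menu) gets you back to the state from before the change.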

Rescue

When you really mess up, to the point that even the Base System concept or the ZFS Boot Environments feature did not stop you from killing your FreeBSD installation, there is one more level of rescue … the Rescue subsystem.

You have about 150 statically linked binaries at your disposal for the rescue mission of that FreeBSD installation. You probably think now that so many binaries must take a lot of space … nothing could be further from the truth. It’s actually one static binary with hardlinks … and it takes a whopping 11 MB of disk space.

# ls -lh /rescue | head -5
total 1118446
-r-xr-xr-x  146 root  wheel    11M 2020.02.19 21:10 [
-r-xr-xr-x  146 root  wheel    11M 2020.02.19 21:10 bectl
-r-xr-xr-x  146 root  wheel    11M 2020.02.19 21:10 bsdlabel
-r-xr-xr-x  146 root  wheel    11M 2020.02.19 21:10 bunzip2
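
The ‘one file, many names’ layout can be imitated in plain sh to see how it works. This is only an analogy under my own assumptions: the real /rescue is a single crunched static binary, while the sketch below uses a tiny script and two invented tool names (hello and bye).

```shell
#!/bin/sh
# Sketch: one file, several names, behaviour chosen by the invoked name.
# This imitates the layout of /rescue in plain sh; the tool names are invented.

box=$(mktemp -d)

# The single "binary": it looks at the name it was called by.
cat > "$box/multi" <<'EOF'
#!/bin/sh
case "${0##*/}" in
    hello) echo "hello from ${0##*/}" ;;
    bye)   echo "bye from ${0##*/}" ;;
    *)     echo "usage: call me as hello or bye" ;;
esac
EOF
chmod +x "$box/multi"

# Hard links add names, not copies: all three share one inode.
ln "$box/multi" "$box/hello"
ln "$box/multi" "$box/bye"

hello_out=$("$box/hello")
bye_out=$("$box/bye")

# Count the directory entries that share the inode of "multi".
inode=$(ls -i "$box/multi" | awk '{print $1}')
names=$(ls -i "$box" | awk -v i="$inode" '$1 == i' | wc -l | tr -d ' ')

echo "$hello_out"
echo "$bye_out"
echo "names sharing one inode: $names"

rm -rf "$box"
```

The link count in the second column of the ls -lh /rescue output above (146) is the same idea at full scale: 146 names, one file on disk.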

The Rescue subsystem even contains such binaries as bectl(8) for ZFS Boot Environments management or the zfs(8) and zpool(8) commands for the ZFS filesystem. Here is the complete list of these binaries.

# ls /rescue
[           dd               fsck_ffs      init       mdmfs          ping      rtsol        unlink
bectl       devfs            fsck_msdosfs  ipf        mkdir          ping6     savecore     unlzma
bsdlabel    df               fsck_ufs      iscsictl   mknod          pkill     sed          unxz
bunzip2     dhclient         fsdb          iscsid     more           poweroff  setfacl      unzstd
bzcat       dhclient-script  fsirand       kenv       mount          ps        sh           vi
bzip2       disklabel        gbde          kill       mount_cd9660   pwd       shutdown     whoami
camcontrol  dmesg            geom          kldconfig  mount_msdosfs  rcorder   sleep        xz
cat         dump             getfacl       kldload    mount_nfs      rdump     spppcontrol  xzcat
ccdconfig   dumpfs           glabel        kldstat    mount_nullfs   realpath  stty         zcat
chflags     dumpon           gpart         kldunload  mount_udf      reboot    swapon       zdb
chgrp       echo             groups        ldconfig   mount_unionfs  red       sync         zfs
chio        ed               gunzip        less       mt             rescue    sysctl       zpool
chmod       ex               gzcat         link       mv             restore   tail         zstd
chown       expr             gzip          ln         nc             rm        tar          zstdcat
chroot      fastboot         halt          ls         newfs          rmdir     tcsh         zstdmt
clri        fasthalt         head          lzcat      newfs_msdos    route     tee          
cp          fdisk            hostname      lzma       nextboot       routed    test         
csh         fsck             id            md5        nos-tun        rrestore  tunefs       
date        fsck_4.2bsd      ifconfig      mdconfig   pgrep          rtquery   umount   

Audio

Not many people expect FreeBSD to shine in this department, but it shines a lot here, and not since yesterday but for decades. Remember when Linux got rid of the old OSS subsystem with its one channel and came up with the ‘great’ idea to write ALSA? I remember because I used Linux back then. Disaster is a very polite word to describe the Linux audio stack of that time … and then PulseAudio came and the whole Linux audio system got much worse. Back then, one OSS channel next to many ALSA channels meant that ONLY ONE application with an OSS backend could play sound (for example WINE). If another application wanted to ‘make’ sound using OSS while WINE was already running, it stayed silent because that one and only OSS channel was already taken. And remember that ALSA was so bad back then that KDE and GNOME made their own sound daemons, mixing audio in userspace, that were incompatible with each other. That means if you used KDE and GNOME apps back then, you could have sound from GNOME apps but not from KDE apps, or vice versa. One big fucking audio hell on Linux.

Let’s get back to FreeBSD audio then. What did FreeBSD offer? A whopping 256 OSS channels mixed live in the kernel for low latency. Everything audio related just worked out of the box – and still works today. You could have WINE or KDE/GNOME sound backends attached to their OSS channels and also ALSA apps getting their sound device without a problem. Even when you plugged a 5.1 surround system into FreeBSD it worked out of the box without any configuration, and applications were able to use it immediately. That FreeBSD audio supremacy remains today: PulseAudio mixes sound in userspace and, while generally working, adds large latency on Linux compared to the low latency in-kernel mixing on FreeBSD.

Comrade meka suggested that FreeBSD is also the only OS which has virtual_oss, which allows mixing/resampling/compressing in user space and allows one to have Bluetooth headphones and a USB microphone represented as a single sound card.

Jails

The FreeBSD Jails are one of the oldest OS Level Virtualization implementations dating back to 1999. Even the Solaris Zones/Containers came five years later in 2004.

After Docker was introduced on Linux, the term OS Level Virtualization gave way to the term Containers, and now FreeBSD Jails along with Solaris Zones/Containers are called 1st generation containers. But that change in nomenclature does not make FreeBSD Jails less powerful. They are also really brain-dead simple to use. You just need a directory – for example /jail/nextcloud – where you extract the FreeBSD Base System for the desired release version – for example base.txz from 12.1-RELEASE – and create the Jail config in the /etc/jail.conf file as shown below.

# mkdir -p /jail/nextcloud
# fetch -o - http://ftp.freebsd.org/pub/FreeBSD/releases/amd64/12.1-RELEASE/base.txz | tar --unlink -xpJf - -C /jail/nextcloud
# cat /etc/jail.conf
nextcloud {
  host.hostname = nextcloud.local;
  ip4.addr = 10.0.0.100;
  path = /jail/nextcloud;
}

Now you can start your Jail right away.

# service jail onestart nextcloud
Starting jails: nextcloud.

Voila! Your FreeBSD Jail is already running.

# jls
   JID  IP Address      Hostname                      Path
     1  10.0.0.100      nextcloud.local               /jail/nextcloud

You can of course have a trimmed down version of the FreeBSD Base System in the Jail if that is needed. The ZFS filesystem also helps greatly here, because with zfs clone only your ‘base’ Jail takes space, plus the changes you make to the Jails cloned from it. Thanks to another FreeBSD subsystem – the Linux Binary Compatibility – you can also create a Linux Jail – for example one running Devuan.

The FreeBSD Jails are also very lightweight. You can boot and use about 1000 FreeBSD Jails on a single FreeBSD system with 4 GB RAM.

They are also very easy to debug and troubleshoot compared even to plain Docker – not to even mention Kubernetes, which requires a whole team of highly skilled people to maintain.

FreeBSD Jails may be configured/managed using only the Base System utilities such as jls(8)/jexec(8), but you can also select from many third party Jail management frameworks. Of all the available ones I would choose BastilleBSD because of its modern approach and many ready to use templates for all needed use cases.

FreeBSD Ports Infrastructure

This is yet another example of why FreeBSD rocks that much. When you install some version of Ubuntu or CentOS, chances are that you will end up not with the latest versions of packages but with versions that were quite up-to-date when that distribution version was released. It’s especially visible in the CentOS world (and its upstream enterprise source system from Red Hat), where packages are quite up-to-date when the .0 (dot zero) release is published but are VERY outdated by the time the .8 or .9 incarnation of that release is available. Not to even mention that Firefox, for example, is released every month …

As I said before when describing the FreeBSD Base System, the FreeBSD Ports (and the packages built from them, available through pkg(8)) are independent of it. That means that third party software from FreeBSD Ports is almost always up-to-date (or very close to it). You can check the details on the repology.org site. Below you will find a ‘snapshot’ of the repology.org stats from the time of writing this article. The ‘online’ table is very long so I copy/pasted just the systems relevant to the article.

repology

One of the other advantages of FreeBSD Ports is that it offers a really MASSIVE amount of software, counting 40354 ports at the time of writing and still rising. The number of ready to install packages is a little smaller, with more than 32000 available.

I once migrated for a while to OpenSolaris, in 2009 on my Dell Latitude D630 laptop, because I really liked all the Solaris features (including ZFS and ZFS Boot Environments, which were not available on FreeBSD back then), and the OpenSolaris GNOME based desktop was pretty nice, even with the Time Slider feature for ZFS snapshots in the Nautilus file manager. I got a working WiFi connection, sound was working, generally everything on my laptop was supported and working with OpenSolaris … but there was no software. Of course ‘large’ projects like GIMP or OpenOffice were available even in the default pkg(8) repository, but not much else. There were fewer than 4000 packages on OpenSolaris back then, versus about 25000 packages on FreeBSD if I recall correctly.

You can also easily browse available FreeBSD Ports (and its options) on the web by using the https://freshports.org/ page.

The count of FreeBSD Ports is one thing, the features are another. No matter which Linux distribution you are using, you will find software that was compiled and shipped without the one flag that you desperately need. If you find such software on FreeBSD it ‘hurts’ only for a moment, because you can VERY EASILY recompile it with the needed options and replace the ‘default’ package with yours. For example, the FreeBSD project is afraid to provide packages of Lame because of existing MP3 patents, so the multimedia/ffmpeg package is built without MP3 support (with the --disable-libmp3lame flag). That is why I have my own audio/lame and multimedia/ffmpeg packages built with my configure options, and that is very easy to achieve. You go to the /usr/ports/multimedia/ffmpeg dir, type make config and select [x] LAME in the ncurses dialog. Your chosen options are saved as a plain /var/db/ports/multimedia_ffmpeg/options file. If you remove that file (or type make rmconfig), these custom options reset to defaults. Then you type make build deinstall install clean and your port with the new options is built and installed as a package. Nothing more is needed. You can even lock that package against pkg(8) upgrades with the pkg lock -y ffmpeg command so it will not be modified later, but it’s better to rebuild such packages every time you do a pkg upgrade, because of library version bumps and changes. While it’s very easy and fast to create a script with these commands to automate it, you can also use other parts of the FreeBSD Ports infrastructure – enter Poudriere (or Synth) – more on that in the next part.

You also do not have to configure each port that way (which could be a PITA for a large number of ports); you may specify your needed (OPTIONS_SET) or unwanted (OPTIONS_UNSET) options just once, globally, in the /etc/make.conf file. You can also specify which default versions of software you want to use, for example Apache 2.2 instead of 2.4 and PHP 7.0 instead of 7.2. You can find all default versions in the /usr/ports/Mk/bsd.default-versions.mk file. Once you have set up these options you can build/rebuild or update your packages from FreeBSD Ports with the portmaster(8) tool. It works like USE flags on Gentoo Linux. But this is the original. Gentoo took all/most of its ideas from the FreeBSD system and its Ports infrastructure.
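
To make the global knobs more concrete, here is a minimal sketch of what such /etc/make.conf lines could look like. It writes them to a scratch file instead of the real config so it is safe to run anywhere. The option names (LAME, DOCS, EXAMPLES) and the versions are only examples taken from or modeled on the text above; whether a given version is still present in your Ports tree is something to check in bsd.default-versions.mk.

```shell
#!/bin/sh
# Sketch: global Ports knobs, written to a scratch file for review.
# On a real system these lines would go into /etc/make.conf.
# Option names and versions below are examples, not recommendations.

conf=$(mktemp)

cat > "$conf" <<'EOF'
# Enable or disable options in every port that offers them.
OPTIONS_SET=LAME
OPTIONS_UNSET=DOCS EXAMPLES

# Preferred default versions (names as in Mk/bsd.default-versions.mk).
DEFAULT_VERSIONS+=apache=2.2 php=7.0
EOF

# Show the result and count the non-comment, non-empty lines.
cat "$conf"
knobs=$(grep -c '^[A-Z]' "$conf")
echo "knobs written: $knobs"

rm -f "$conf"
```

Options set this way apply to every port that knows them, so one line replaces a visit to make config in each port directory.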

Poudriere is a build framework that uses FreeBSD Ports and FreeBSD Jails to build requested packages in a clean, reproducible way. With it you can create a whole new binary package repository for the pkg(8) command to use. I mentioned Synth because, while Poudriere is often used to produce a whole package repository, Synth is usually used just to rebuild the several packages that do not fit your needs.

There is one important thing about FreeBSD Ports that is often misunderstood by newcomers. What is the difference between the Ports and the packages that are fetched and installed by the pkg(8) tool? It’s quite simple. A package is just a built and installed port. Nothing more or less. When you use binary packages with the pkg(8) command you are using packages that someone (the FreeBSD project in this case) built for you from the FreeBSD Ports at some point in time. While FreeBSD strives to keep the built packages as up-to-date as possible, it’s the nature of FreeBSD Ports that they are always more up-to-date than the built packages. That is why you may build and install a new version of the needed packages yourself using FreeBSD Ports. One may think of such usage when it comes to security holes. When some locally executed command (like file(1) for example) has a security hole, it’s not critical to update it as fast as possible because that hole may be harmless for you, but when a new version of Firefox fixes a very important security hole, it’s better to update from FreeBSD Ports right away, because waiting 2 days for the package to be built (along with other packages) can be too long.

Updating/Building from Source

While the FreeBSD Ports infrastructure is for third party software, the FreeBSD Base System (or its parts) can also be easily and conveniently built from source. The FreeBSD kernel config is also very small and simple. While the Linux kernel config contains thousands of options – 4432 for example in the default CentOS 8.2 install – the FreeBSD GENERIC config has about 17 times fewer – only 260 options. But that does not exhaust the topic. You can start with the MINIMAL FreeBSD kernel config, which has only 75 options specified.

Linux # grep -c '^CONFIG' /boot/config-$( uname -r )
4432

FreeBSD # grep -c -E '^(device|options)' /usr/src/sys/amd64/conf/GENERIC
260

FreeBSD # grep -c -E '^(device|options)' /usr/src/sys/amd64/conf/MINIMAL
75

… and it’s not only about the smaller number of options. Can you tell me how many steps (and which ones) are required to rebuild CentOS or Ubuntu, for example, without Bluetooth support?

On the contrary, it’s very simple (and fast) on the FreeBSD side. While the /etc/make.conf file is used to enable/disable Ports options, the /etc/src.conf file is used to enable/disable FreeBSD Base System options while building it from source. To build FreeBSD without Bluetooth support, just add WITHOUT_BLUETOOTH=yes to the /etc/src.conf file and type these commands to build it:

# beadm create safe
# cd /usr/src
# make buildworld kernel
# reboot
# cd /usr/src
# make installworld
# mergemaster -iU
# reboot

Voila! You now have FreeBSD without Bluetooth support … and if any of the steps failed, or because of your lack of experience/expertise your FreeBSD system does not boot or is broken, you can use the tools from /rescue to try to fix it (or at least figure out what is broken), and when you do not want to cope with this, just select the safe ZFS Boot Environment at the FreeBSD loader(8) to boot into the system from before you started building the modified version of FreeBSD. Yes, you are bulletproof here. With 294 WITHOUT_X options and 125 WITH_X options you can really tune the FreeBSD Base System to your needs.

# zgrep -c WITHOUT_ /usr/share/man/man5/src.conf.5.gz
294

# zgrep -c WITH_ /usr/share/man/man5/src.conf.5.gz
125

The big downside of updating FreeBSD by source is that you can not use the freebsd-update tools to do it … but nothing stops you from creating your own FreeBSD Update Server so you will be able to use freebsd-update by adding updates using a CURRENT or STABLE system instead of RELEASE. That process is described in the Build Your Own FreeBSD Update Server article of official FreeBSD documentation.

Storage

Storage is one of the areas where FreeBSD really shines. Lots of people adore FreeBSD for its well integrated ZFS filesystem, and rightly so. ZFS has always been a first class citizen in FreeBSD. Lately OpenZFS 2.0 has also been integrated from the upstream joint FreeBSD and Linux repository. More and more FreeBSD features and solutions use ZFS features.

Most of the people who like the integrated ZFS in FreeBSD do not know about the FreeBSD GEOM modular disk transformation framework, which provides various storage related features and utilities like software RAID0/RAID1/RAID10/RAID3/RAID5 configurations or transparent encryption of underlying devices with GELI/GBDE (like LUKS on Linux). It also allows transparent filesystem journaling for ANY filesystem with GJOURNAL (yes, also for FAT32 or exFAT) and allows one to export block devices over the network with GEOM GATE devices (like NFS for block devices).

FreeBSD also has its own FUSE implementation, which allows all the FUSE based filesystems to work natively on FreeBSD. While lots of Linux folks know DRBD, probably very few of them know that FreeBSD comes with its own DRBD-like solution called HAST – which does exactly the same thing. While ZFS has a lot of features and possibilities, FreeBSD still maintains and develops the fast, small memory footprint UFS filesystem, which today is used either with Soft Updates (SU) or Journaled Soft Updates (SUJ) depending on the use case. For example, 10 TB of data on a UFS filesystem with Journaled Soft Updates (SUJ) takes about 1 minute under fsck(8). These storage solutions are available from the FreeBSD Base System alone. The FreeBSD Ports offer much more, with distributed filesystem solutions such as CEPH, LeoFS, LizardFS or Minio for Amazon S3 compatible storage.

Init System

FreeBSD offers a really simple yet very powerful init system. It has a system wide config in the /etc/rc.conf file, where you can enable/disable needed services with service_enable=YES and service_enable=NO stanzas. You do not even need to launch vi(1) to add them – just type sysrc service_enable=YES and they are added to the /etc/rc.conf file. There are also default values and services that are enabled, and you will find them – along with many comments – in the /etc/defaults/rc.conf file. Each FreeBSD service file has PROVIDE/REQUIRE stanzas which are then used to automatically order the services at startup. For example, it’s pointless to start the sshd(8) daemon without network. Services that can be run in parallel are started in parallel to save time. To start or stop a service you type the service sshd start or service sshd stop command. But when a service is not enabled in the /etc/rc.conf file, you need to use onestart and onestop instead. The Base System separation remains here as well: FreeBSD Base System services are located in the /etc/rc.d directory, while third party applications from ports/packages are kept under the /usr/local prefix, which means the /usr/local/etc/rc.d dir.

When using systemd(1) you never know how the services are going to start, because it will be different each time. Zero determinism. On FreeBSD you know exactly which services will start when, because they are always ordered the same way according to the PROVIDE/REQUIRE stanzas. FreeBSD also offers a tool that will tell you the exact order – rcorder(8) – which can be used for all services, or for Base System services and third party services separately. There is also the service -r command, which shows what the order was at boot time.

# rcorder /etc/rc.d/* | head
/etc/rc.d/growfs
/etc/rc.d/sysctl
/etc/rc.d/hostid
/etc/rc.d/zvol
/etc/rc.d/dumpon
/etc/rc.d/ddb
/etc/rc.d/geli
/etc/rc.d/gbde
/etc/rc.d/ccd
/etc/rc.d/swap

# rcorder /usr/local/etc/rc.d/* | tail
/usr/local/etc/rc.d/hald
/usr/local/etc/rc.d/git_daemon
/usr/local/etc/rc.d/fscd
/usr/local/etc/rc.d/cupsd
/usr/local/etc/rc.d/cups_browsed
/usr/local/etc/rc.d/clamav-clamd
/usr/local/etc/rc.d/clamav-milter
/usr/local/etc/rc.d/clamav-freshclam
/usr/local/etc/rc.d/avahi-dnsconfd
/usr/local/etc/rc.d/aria2

# rcorder /etc/rc.d/* /usr/local/etc/rc.d/* 2> /dev/null | grep -C 3 sshd
/etc/rc.d/ubthidhci
/etc/rc.d/syscons
/etc/rc.d/swaplate
/etc/rc.d/sshd
/etc/rc.d/cron
/etc/rc.d/jail
/etc/rc.d/localpkg
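
How an order falls out of PROVIDE/REQUIRE lines can be imitated with the standard tsort(1) tool. This is only a sketch of the idea, not how rcorder(8) is implemented: the three toy ‘scripts’ and their names are made up, and real rc.d scripts carry more keywords than these two.

```shell
#!/bin/sh
# Sketch: derive a start order from PROVIDE/REQUIRE comments with tsort(1).
# rcorder(8) does the real work on FreeBSD; the three scripts are toy stand-ins.

rcdir=$(mktemp -d)

printf '# PROVIDE: netif\n'                          > "$rcdir/netif"
printf '# PROVIDE: routing\n# REQUIRE: netif\n'      > "$rcdir/routing"
printf '# PROVIDE: sshd\n# REQUIRE: routing netif\n' > "$rcdir/sshd"

# Emit "dependency dependent" pairs; tsort prints dependencies first.
pairs() {
    for f in "$rcdir"/*; do
        provide=$(sed -n 's/^# PROVIDE: //p' "$f")
        # A "self self" pair keeps scripts without requirements in the output.
        echo "$provide $provide"
        for req in $(sed -n 's/^# REQUIRE: //p' "$f"); do
            echo "$req $provide"
        done
    done
}

order=$(pairs | tsort | paste -s -d ' ' -)
echo "start order: $order"

rm -rf "$rcdir"
```

Because the order is computed from declared dependencies alone, the same set of scripts always yields the same constraints, which is the determinism described above.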

Adding a new service to FreeBSD is also very easy, as the template for a new service is very small and simple.

#!/bin/sh

. /etc/rc.subr

name=dummy
rcvar=dummy_enable

start_cmd="${name}_start"
stop_cmd=":"

load_rc_config $name
: ${dummy_enable:=no}
: ${dummy_msg="Nothing started."}

dummy_start()
{
	echo "$dummy_msg"
}

run_rc_command "$1"
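
The two ‘: ${…}’ lines in the template set defaults, and they are not quite the same. The sketch below is plain POSIX sh and shows the difference between ‘:=’ and ‘=’; the starting values are my own assumptions chosen to make the difference visible.

```shell
#!/bin/sh
# Sketch: how the ": ${var:=default}" lines in an rc.d script behave.
# The variable names mirror the dummy template; values are illustrative.

# Pretend rc.conf set only the message, and set it to an empty string.
dummy_msg=""
unset dummy_enable

# ":=" assigns when the variable is unset OR empty.
: ${dummy_enable:=no}

# "=" (no colon) assigns only when the variable is unset,
# so an explicitly empty value from rc.conf is kept as is.
: ${dummy_msg="Nothing started."}

echo "dummy_enable=[$dummy_enable]"
echo "dummy_msg=[$dummy_msg]"
```

So a service stays disabled unless rc.conf says otherwise, while an admin can still blank out the message on purpose.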

If it’s not simple enough for you, there is a dedicated FreeBSD article about writing them – Practical rc.d Scripting in BSD.

Linux Binary Compatibility

While Linux can not be FreeBSD – FreeBSD can be Linux – and it’s not some slow emulation – it’s an implementation of the Linux system calls. There was a time when enterprises ran Linux-only applications (not available on FreeBSD back then) using the Linux Binary Compatibility on FreeBSD because it was faster than running them natively on Linux – see FreeBSD Used to Generate Spectacular Special Effects – an official FreeBSD Press Release about FreeBSD being used to generate the special effects for one of the best movies of all time – The Matrix (1999).

Today the LINUX_COMPAT is also natively fast and allows one to run Linux applications – even Linux games in X11 with hardware acceleration for graphics. Think of it as WINE but for Linux applications. It lives under the /compat/linux directory. It even implements the Linux /proc virtual filesystem, which can be mounted at the /compat/linux/proc dir, but that is not mandatory. For any software that does not come with source code and works on Linux, the Linux Binary Compatibility saves the day. Take the f.lux project for example. Before I got to know Redshift I used the f.lux Linux binary under LINUX_COMPAT to suppress blue spectrum light from my FreeBSD screen. The Linux Binary Compatibility subsystem can also be used to run Linux based FreeBSD Jails – with Devuan for example.

Simplicity

FreeBSD is simple but not coarse/ornery. For example, like Linux, FreeBSD also supports the /proc virtual filesystem, but on FreeBSD it’s optional and not used by default, while Linux could not live without it. But while Linux has the mandatory /proc, it also has another virtual filesystem residing under /sys … but why do Linux people need two different virtual filesystems with similar purposes? Why could they not create everything under /proc, as it already existed? That is a big enigma for my sanity.

But /sys is not the end of that madness. It’s just the beginning.

What about these?

  • securityfs
  • devpts
  • cgroup
  • pstore
  • bpf
  • configfs
  • selinuxfs
  • systemd-1
  • mqueue
  • debugfs
  • hugetlbfs

Take a look at the FreeBSD mount(8) output after the default install on ZFS.

FreeBSD # mount
zroot/ROOT/12.1 on / (zfs, local, noatime, nfsv4acls)
devfs on /dev (devfs, local, multilabel)
zroot/tmp on /tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot/var/mail on /var/mail (zfs, local, nfsv4acls)
zroot/usr/home on /usr/home (zfs, local, noatime, nfsv4acls)
zroot/var/crash on /var/crash (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/log on /var/log (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/audit on /var/audit (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/tmp on /var/tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot/usr/src on /usr/src (zfs, local, noatime, nfsv4acls)
zroot/usr/ports on /usr/ports (zfs, local, noatime, nosuid, nfsv4acls)

Several ZFS datasets and one virtual devfs filesystem for the /dev directory. With an install on UFS it would be similar, with several UFS partitions mounted instead of ZFS datasets.

Take a look at the CentOS 8.2 installation with just one physical root (/) XFS filesystem.

[root@centos8 ~]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=919388k,nr_inodes=229847,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime,seclabel)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,rdma)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/sda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17309)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel,pagesize=2M)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,seclabel,size=187088k,mode=700)

Fuck me. It's really hard to even find any REAL filesystem there … fortunately we can ask mount(8) to display only the XFS filesystems.

[root@centos8 ~]# mount -t xfs
/dev/sda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
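
To put a number on that noise, the listing can be grouped by filesystem type. This is only a sketch: count_fs_types is a made-up helper name, and it assumes the Linux 'DEVICE on DIR type FSTYPE (OPTIONS)' layout shown above with no spaces in the mount points.

```shell
# Summarize Linux-style mount(8) output read from stdin.
# Prints "<count> <fstype>" per type, most frequent first.
# Assumes fields: DEVICE on DIR type FSTYPE (OPTIONS).
count_fs_types() {
    awk '$2 == "on" && $4 == "type" { n[$5]++ }
         END { for (t in n) print n[t], t }' | sort -k1,1nr -k2,2
}
```

It would be used as mount | count_fs_types. Fed with the CentOS 8.2 listing above, cgroup leads the list while xfs shows up exactly once.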

Let's move on to networking. Let's assume that you want a standard enterprise networking setup on a physical server: two interfaces aggregated into a highly available bond0 interface (lagg0 on FreeBSD), with a VLAN tag on top and an IP address on that VLAN. The CentOS 7.x/8.x installer (Anaconda) will welcome you with this mess.

[root@centos7 ~]# ls -1 /etc/sysconfig/network-scripts/ifcfg-*
ifcfg-Bond_connection_1
ifcfg-eno49
ifcfg-eno49-1
ifcfg-eno50
ifcfg-eno50-1
ifcfg-VLAN_connection_1

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-Bond_connection_1
DEVICE=bond0
BONDING_OPTS="miimon=1 updelay=0 downdelay=0 mode=active-backup"
TYPE=Bond
BONDING_MASTER=yes
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_PRIVACY=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME="Bond connection 1"
UUID=ca85417f-8852-43bf-96ee-5bd3f0f83648
ONBOOT=yes

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno49
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=eno49
UUID=2f60f50b-38ad-492a-b90a-ba736acf6792
DEVICE=eno49
ONBOOT=no

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno49-1
HWADDR=xx:xx:xx:xx:xx:xx
TYPE=Ethernet
NAME=eno49
UUID=342b8494-126d-4f3a-b749-694c8c922aa1
DEVICE=eno49
ONBOOT=yes
MASTER=bond0
SLAVE=yes

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno50
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=eno50
UUID=4fd36e24-1c6d-4a65-a316-7a14e9a92965
DEVICE=eno50
ONBOOT=no

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno50-1
HWADDR=xx:xx:xx:xx:xx:xx
TYPE=Ethernet
NAME=eno50
UUID=a429b697-73c2-404d-9379-472cb3c35e06
DEVICE=eno50
ONBOOT=yes
MASTER=bond0
SLAVE=yes

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-VLAN_connection_1
VLAN=yes
TYPE=Vlan
PHYSDEV=ca85417f-8852-43bf-96ee-5bd3f0f83648
VLAN_ID=601
REORDER_HDR=yes
GVRP=no
MVRP=no
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=10.20.30.40
PREFIX=24
GATEWAY=10.20.30.1
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_PRIVACY=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME="VLAN connection 1"
UUID=90f7a9bb-1443-4adf-a3eb-86a03b23ecfb
ONBOOT=yes

For the record – I have chosen a ‘STATIC’ IPv4 address but the installer configured these interfaces to use both DHCP and that STATIC address. That could be a bug but let's get to the point.

After manual fixing with vi(1) (and an hour later) this is how it is supposed to look.

[root@centos7 ~]# cat /etc/sysconfig/network
GATEWAY=10.20.30.1
NOZEROCONF=yes

[root@centos7 ~]# ls -1 /etc/sysconfig/network-scripts/ifcfg-*
ifcfg-bond0
ifcfg-bond0.601
ifcfg-eno49
ifcfg-eno50

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="miimon=1 updelay=0 downdelay=0 mode=active-backup"
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=none
IPV4_FAILURE_FATAL=no
IPV6INIT=no
ONBOOT=yes

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0.601
VLAN=yes
TYPE=Vlan
VLAN_ID=601
DEVICE=bond0.601
REORDER_HDR=yes
GVRP=no
MVRP=no
BOOTPROTO=none
IPADDR=10.20.30.40
PREFIX=24
IPV4_FAILURE_FATAL=no
IPV6INIT=no
ONBOOT=yes

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno49
BOOTPROTO=none
IPV4_FAILURE_FATAL=no
IPV6INIT=no
TYPE=Ethernet
NAME=eno49
DEVICE=eno49
ONBOOT=yes
MASTER=bond0
SLAVE=yes

[root@centos7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno50
BOOTPROTO=none
IPV4_FAILURE_FATAL=no
IPV6INIT=no
TYPE=Ethernet
NAME=eno50
DEVICE=eno50
ONBOOT=yes
MASTER=bond0
SLAVE=yes

Better … but it still takes A LOT OF SPACE and several files to cover that quite simple setup. Not to mention how complicated and error prone it is. The same configuration on FreeBSD takes just 7 lines in the single /etc/rc.conf file as shown below.

ifconfig_fxp0="up"
ifconfig_fxp1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport fxp0 laggport fxp1"
vlans_lagg0="601"
ifconfig_lagg0_601="inet 10.20.30.40/24"
defaultrouter="10.20.30.1"
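
Because everything lives in one flat file, such a setup is also easy to generate. Below is a minimal sketch of that idea; lagg_rc_conf is a hypothetical helper (not a FreeBSD tool) that only prints the same seven rc.conf lines for the interface names, VLAN ID and addresses passed to it.

```shell
# Print rc.conf lines for a failover lagg0 with one tagged VLAN on top.
# Hypothetical helper, it only prints text and changes nothing.
# usage: lagg_rc_conf NIC1 NIC2 VLAN_ID ADDR/PREFIX GATEWAY
lagg_rc_conf() {
    nic1=$1 nic2=$2 vlan=$3 addr=$4 gw=$5
    # printf reuses the format for each argument: one line per NIC
    printf 'ifconfig_%s="up"\n' "$nic1" "$nic2"
    printf 'cloned_interfaces="lagg0"\n'
    printf 'ifconfig_lagg0="laggproto failover laggport %s laggport %s"\n' "$nic1" "$nic2"
    printf 'vlans_lagg0="%s"\n' "$vlan"
    printf 'ifconfig_lagg0_%s="inet %s"\n' "$vlan" "$addr"
    printf 'defaultrouter="%s"\n' "$gw"
}
```

Calling lagg_rc_conf fxp0 fxp1 601 10.20.30.40/24 10.20.30.1 prints the configuration shown above.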

What about the boot process? FreeBSD boots from root on ZFS with just a small 512 KB non-mountable boot partition. No separate /boot device is needed. Linux on the other hand always needs that separate /boot partition filled with GRUB modules, no matter if root is on ZFS or LVM. That is why an implementation of ZFS Boot Environments is quite complicated on Linux: even if you have root on ZFS there is still an unprotected /boot filesystem that cannot be snapshotted with ZFS and has to be protected in the old classic way, which kills the idea of ZFS Boot Environments on Linux.

FreeBSD is a really simple and well thought out operating system. But also a very underestimated one.

Evolution Instead Rewriting

How many Linux tools or subsystems have been abandoned or superseded by new ones? Why was the ifconfig(8) command not updated with new options instead of introducing a new ip(8) command? Same with netstat(8) being replaced by ss(8). Same with arp(8)/iwconfig/route(8) and many more. What about the whole init system? The Linux world has been taken over by systemd(1) whether you like it or not. Even distributions that had grown their own mature init systems, like Ubuntu with its Upstart, have moved to systemd(1) altogether. The distributions that do not use it are very few and considered niche today.

In FreeBSD land, on the contrary, such things happen only if there is no other way to implement new things. It's the last thing wanted in FreeBSD. FreeBSD evolves and is developed with stability and backward compatibility in mind. Userland tools are grown and updated with new options instead of being rewritten over and over again. Not to mention how many new bugs are introduced by replacing one tool with another.

More on the Evolution Instead Rewriting topic:

Documentation

Having a system that can do almost anything but not knowing how to do it makes that system pretty useless (or at least a pretty big PITA to use). FreeBSD offers second to none documentation that is actively maintained and updated. Along with its legendary FreeBSD Handbook and FreeBSD FAQ the FreeBSD project also offers official FreeBSD Articles about various FreeBSD topics. The Man Pages are also very detailed and contain many examples. There is also the FreeBSD Wiki for work in progress documentation and ideas related to FreeBSD development, and if you have any problems or questions related to FreeBSD there are the official FreeBSD Forums and oldschool Mailing Lists available.

These were only the official project knowledge sources but there are also lots of FreeBSD books. Here are the best and most up-to-date ones.

  • Absolute FreeBSD – Complete Guide to FreeBSD – 3rd Edition (2019)
  • Beginning Modern Unix (2018)
  • Book of PF – 3rd Edition (2015)
  • Design and Implementation of FreeBSD 11 Operating System – 2nd Edition (2015)
  • FreeBSD Device Drivers (2012)
  • FreeBSD Mastery – ZFS (2015)
  • FreeBSD Mastery – Advanced ZFS (2016)
  • FreeBSD Mastery – Storage Essentials (2014)
  • FreeBSD Mastery – Specialty Filesystems (2015)
  • FreeBSD Mastery – Jails (2019)

There are also two magazines that are dedicated to BSD and FreeBSD systems. Both are free and cover lots of interesting topics regarding FreeBSD.

With all this knowledge and support it's really hard not to achieve what you need/want with a FreeBSD system.

Community

Last but not least, and I would say it's even more important than good documentation (which FreeBSD has, and it is awesome). People that use FreeBSD do so consciously and are often experienced not only in FreeBSD land but also in topics related to other UNIX systems. Often they took the long road of first using Linux systems before finally settling in FreeBSD land, or they still do Linux administration for a living while resting with the far more reasonable and sensible FreeBSD. I have always found the FreeBSD Community helpful and friendly – especially towards newcomers. Even when you try to ‘force’ FreeBSD people to ‘fight’ in an unjust/doubtful discussion they will reply with dignity and technical arguments instead of yelling at you.

The FreeBSD project even made several articles and Handbook chapters especially for Linux newcomers (sometimes called systemd(1) refugees).

Closing Thoughts

I tried really hard not to make it a Linux rant but some may feel it that way – if so please remember that this was not my intention. FreeBSD, like Linux and like any other operating system, has its ups and downs. I hope that I showed you the most interesting FreeBSD parts. I may add new sections here without warning in the future 🙂

EOF

FreeBSD Cluster with Pacemaker and Corosync

I always missed ‘proper’ cluster software for FreeBSD systems. Recently I got to run several Pacemaker/Corosync based clusters on Linux systems. I wondered how to build similar high availability solutions on FreeBSD and I was really shocked when I figured out that both Pacemaker and Corosync are available in the FreeBSD Ports and packages as net/pacemaker2 and net/corosync2 respectively.

In this article I will check how well a Pacemaker and Corosync cluster works on FreeBSD.

There are many definitions of a cluster. The one that I like the most is that a cluster is a system that is still redundant after losing one of its nodes (it is still a cluster). This means that 3 nodes is the minimum for a cluster by that definition. Two node clusters are quite problematic because they are the most exposed to the split brain problem. That is why additional devices or systems are often added to two node clusters to make sure that split brain does not happen. For example one can add a third node without any resources or services, just in a ‘witness’ role. Another way is to add a shared disk resource that serves the same purpose; often it's a raw volume used with the SCSI-3 Persistent Reservation mechanism.

Lab Setup

As usual it will be entirely VirtualBox based and it will consist of 3 hosts. To avoid creating 3 identical FreeBSD installations I used the 12.1-RELEASE virtual machine image available directly from the FreeBSD Project:

There are several formats available – qcow2/raw/vhd/vmdk – but as I will be using VirtualBox I used the VMDK one.

Here is the list of the machines for the Pacemaker/Corosync cluster:

  • 10.0.10.111 node1
  • 10.0.10.112 node2
  • 10.0.10.113 node3

Each VirtualBox virtual machine for FreeBSD is the default one (as suggested in the VirtualBox wizard) with 512 MB RAM and NAT Network as shown on the image below.

machine

Here is the configuration of the NAT Network on VirtualBox.

nat-network-01

nat-network-02

Before we try to connect to our FreeBSD machines we need to make a minimal network configuration inside each VM. Each FreeBSD machine will have a minimal /etc/rc.conf file like the one shown here for the node1 host.

root@node1:~ # cat /etc/rc.conf
hostname=node1
ifconfig_em0="inet 10.0.10.111/24 up"
defaultrouter=10.0.10.1
sshd_enable=YES

For setup purposes we will need to allow root login on these FreeBSD machines with the PermitRootLogin yes option in the /etc/ssh/sshd_config file. You will also need to restart the sshd(8) service after the change.

root@node1:~ # grep PermitRootLogin /etc/ssh/sshd_config
PermitRootLogin yes

root@node1:~ # service sshd restart

By using NAT Network with Port Forwarding the FreeBSD machines will be accessible on the localhost ports. For example the node1 machine will be available on port 2211, the node2 machine will be available on port 2212 and so on. This is shown in the sockstat utility output below.

nat-network-03-sockstat

nat-network-04-ssh

To connect to such machine from the VirtualBox host system you will need this command:

vboxhost % ssh -l root localhost -p 2211
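
Most of the steps in this article have to be repeated on every node, so a small loop on the VirtualBox host can save some typing. This is just a sketch: on_all_nodes is a made-up name, and port 2213 for node3 is my assumption based on the ‘and so on’ port scheme described above.

```shell
# Run the same command on every VM through the VirtualBox port forwards.
# With DRY_RUN=1 the ssh commands are only printed, not executed.
# Ports 2211 and 2212 come from the text; 2213 (node3) is an assumption.
NODE_PORTS="2211 2212 2213"

on_all_nodes() {
    for port in $NODE_PORTS; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'ssh -l root -p %s localhost %s\n' "$port" "$*"
        else
            ssh -l root -p "$port" localhost "$@"
        fi
    done
}
```

For example on_all_nodes uname -r would ask each VM for its release, while setting DRY_RUN=1 first only shows what would be run.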

Packages

As we now have ssh(1) connectivity we need to add the needed packages. To make our VMs resolve DNS queries we need to add one last thing. We will also switch from the ‘quarterly’ to the ‘latest’ branch of the pkg(8) packages.

root@node1:~ # echo 'nameserver 1.1.1.1' > /etc/resolv.conf
root@node1:~ # sed -i '' s/quarterly/latest/g /etc/pkg/FreeBSD.conf

Remember to repeat the two commands above on the node2 and node3 systems.
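
To verify which branch a node ended up on after that sed(1) edit, the repository file can be parsed back. A rough sketch follows; pkg_branch is a hypothetical helper and it assumes the file contains a line of the form url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly", which is how I remember the stock file looking.

```shell
# Print the branch name (last URL path component) found on the
# "url:" line of a pkg(8) repository file read from stdin.
# Assumes a line like:  url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest",
pkg_branch() {
    sed -n 's|.*url:.*/\([a-z]*\)".*|\1|p'
}
```

It would be used as pkg_branch < /etc/pkg/FreeBSD.conf on each node.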

Now we will add Pacemaker and Corosync packages.

root@node1:~ # pkg install pacemaker2 corosync2 crmsh

root@node2:~ # pkg install pacemaker2 corosync2 crmsh

root@node3:~ # pkg install pacemaker2 corosync2 crmsh

These are messages both from pacemaker2 and corosync2 that we need to address.

Message from pacemaker2-2.0.4:

--
For correct operation, maximum socket buffer size must be tuned
by performing the following command as root :

# sysctl kern.ipc.maxsockbuf=18874368

To preserve this setting across reboots, append the following
to /etc/sysctl.conf :

kern.ipc.maxsockbuf=18874368

======================================================================

Message from corosync2-2.4.5_1:

--
For correct operation, maximum socket buffer size must be tuned
by performing the following command as root :

# sysctl kern.ipc.maxsockbuf=18874368

To preserve this setting across reboots, append the following
to /etc/sysctl.conf :

kern.ipc.maxsockbuf=18874368

We need to change the kern.ipc.maxsockbuf parameter. Let's do it then.

root@node1:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node1:~ # service sysctl restart

root@node2:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node2:~ # service sysctl restart

root@node3:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node3:~ # service sysctl restart
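
One caveat with the echo ... >> approach is that running it twice leaves a duplicated line in /etc/sysctl.conf. A minimal sketch of an idempotent variant is below; ensure_line is a made-up helper name, not something shipped with FreeBSD.

```shell
# Append LINE to FILE only if FILE does not already contain it verbatim.
# usage: ensure_line LINE FILE
ensure_line() {
    line=$1 file=$2
    # -x: whole line must match, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

With it the step above becomes ensure_line 'kern.ipc.maxsockbuf=18874368' /etc/sysctl.conf and can be repeated safely.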

Let's check what binaries come with these packages.

root@node1:~ # pkg info -l pacemaker2 | grep bin
        /usr/local/sbin/attrd_updater
        /usr/local/sbin/cibadmin
        /usr/local/sbin/crm_attribute
        /usr/local/sbin/crm_diff
        /usr/local/sbin/crm_error
        /usr/local/sbin/crm_failcount
        /usr/local/sbin/crm_master
        /usr/local/sbin/crm_mon
        /usr/local/sbin/crm_node
        /usr/local/sbin/crm_report
        /usr/local/sbin/crm_resource
        /usr/local/sbin/crm_rule
        /usr/local/sbin/crm_shadow
        /usr/local/sbin/crm_simulate
        /usr/local/sbin/crm_standby
        /usr/local/sbin/crm_ticket
        /usr/local/sbin/crm_verify
        /usr/local/sbin/crmadmin
        /usr/local/sbin/fence_legacy
        /usr/local/sbin/iso8601
        /usr/local/sbin/pacemaker-remoted
        /usr/local/sbin/pacemaker_remoted
        /usr/local/sbin/pacemakerd
        /usr/local/sbin/stonith_admin

root@node1:~ # pkg info -l corosync2 | grep bin
        /usr/local/bin/corosync-blackbox
        /usr/local/sbin/corosync
        /usr/local/sbin/corosync-cfgtool
        /usr/local/sbin/corosync-cmapctl
        /usr/local/sbin/corosync-cpgtool
        /usr/local/sbin/corosync-keygen
        /usr/local/sbin/corosync-notifyd
        /usr/local/sbin/corosync-quorumtool

root@node1:~ # pkg info -l crmsh | grep bin
        /usr/local/bin/crm

Cluster Initialization

Now we will initialize our FreeBSD cluster.

First we need to make sure that the names of the nodes are resolvable on every node, here through the /etc/hosts file.

root@node1:~ # tail -3 /etc/hosts

10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3

root@node2:~ # tail -3 /etc/hosts

10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3

root@node3:~ # tail -3 /etc/hosts

10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3
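
Instead of eyeballing tail(1) output, the same check can be scripted. A small sketch, assuming the plain ‘ADDRESS NAME ...’ hosts(5) layout used above; hosts_lookup is a hypothetical helper name.

```shell
# Print the address that a hosts(5)-style FILE maps NAME to.
# Exit status is non-zero when NAME is not listed.
# usage: hosts_lookup FILE NAME
hosts_lookup() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; found = 1; exit } }
        END { exit !found }
    ' "$1"
}
```

For example hosts_lookup /etc/hosts node2 should print 10.0.10.112 on each of the three machines.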


Now we will generate the Corosync key.

root@node1:~ # corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /usr/local/etc/corosync/authkey.

root@node1:~ # echo $?
0

root@node1:~ # ls -l /usr/local/etc/corosync/authkey
-r--------  1 root  wheel  128 Sep  2 20:37 /usr/local/etc/corosync/authkey
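
The key is a secret shared by all nodes, so its size and permissions are worth checking again after it gets copied around. The sketch below simply tests for what the listing above shows (128 bytes, mode -r--------); authkey_ok is a made-up name and the expected values are taken from that listing, not from the Corosync documentation.

```shell
# Succeed only if FILE is exactly 128 bytes and its mode is -r--------.
# Both expectations are taken from the ls -l listing above.
authkey_ok() {
    [ -f "$1" ] || return 1
    size=$(( $(wc -c < "$1") ))        # arithmetic strips wc padding
    mode=$(ls -l "$1" | cut -c1-10)    # first 10 chars: type + permissions
    [ "$size" -eq 128 ] && [ "$mode" = "-r--------" ]
}
```

Usage would be authkey_ok /usr/local/etc/corosync/authkey || echo 'check the key'.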

Now the Corosync configuration file. Surely some examples were provided by the package maintainer.

root@node1:~ # pkg info -l corosync2 | grep example
        /usr/local/etc/corosync/corosync.conf.example
        /usr/local/etc/corosync/corosync.conf.example.udpu

We will take the second one as a base for our config.

root@node1:~ # cp /usr/local/etc/corosync/corosync.conf.example.udpu /usr/local/etc/corosync/corosync.conf

root@node1:~ # vi /usr/local/etc/corosync/corosync.conf
               /* LOTS OF EDITS HERE */

root@node1:~ # cat /usr/local/etc/corosync/corosync.conf

totem {
  version: 2
  crypto_cipher: aes256
  crypto_hash: sha256
  transport: udpu

  interface {
    ringnumber: 0
    bindnetaddr: 10.0.10.0
    mcastport: 5405
    ttl: 1
  }
}

logging {
  fileline: off
  to_logfile: yes
  to_syslog: no
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on

  logger_subsys {
    subsys: QUORUM
    debug: off
  }
}

nodelist {

  node {
    ring0_addr: 10.0.10.111
    nodeid: 1
  }

  node {
    ring0_addr: 10.0.10.112
    nodeid: 2
  }

  node {
    ring0_addr: 10.0.10.113
    nodeid: 3
  }

}

quorum {
  provider: corosync_votequorum
  expected_votes: 2
}
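
After ‘lots of edits’ it is easy to leave a typo in the nodelist, so it does not hurt to pull the addresses back out of the file and compare them with the node list from the Lab Setup. A minimal sketch; ring0_addrs is a hypothetical helper name.

```shell
# List every ring0_addr value from a corosync.conf read on stdin.
ring0_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }'
}
```

Running ring0_addrs < /usr/local/etc/corosync/corosync.conf on the file above should list the three 10.0.10.x addresses, one per line.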

Now we need to propagate both Corosync key and config across the nodes in the cluster.

We could use a tool created exactly for that, like the net/csync2 cluster synchronization tool, but plain old net/rsync will serve as well.

root@node1:~ # pkg install -y rsync

root@node1:~ # rsync -av /usr/local/etc/corosync/ node2:/usr/local/etc/corosync/
The authenticity of host 'node2 (10.0.10.112)' can't be established.
ECDSA key fingerprint is SHA256:/ZDmln7GKi6n0kbad73TIrajPjGfQqJJX+ReSf3NMvc.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2' (ECDSA) to the list of known hosts.
Password for root@node2:
sending incremental file list
./
authkey
corosync.conf
service.d/
uidgid.d/

sent 1,100 bytes  received 69 bytes  259.78 bytes/sec
total size is 4,398  speedup is 3.76

root@node1:~ # rsync -av /usr/local/etc/corosync/ node3:/usr/local/etc/corosync/
The authenticity of host 'node3 (10.0.10.113)' can't be established.
ECDSA key fingerprint is SHA256:/ZDmln7GKi6n0kbad73TIrajPjGfQqJJX+ReSf3NMvc.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3' (ECDSA) to the list of known hosts.
Password for root@node3:
sending incremental file list
./
authkey
corosync.conf
service.d/
uidgid.d/

sent 1,100 bytes  received 69 bytes  259.78 bytes/sec
total size is 4,398  speedup is 3.76

Now let's check that they are the same.

root@node1:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf

root@node2:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf

root@node3:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf

Same.
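
With more files or more nodes comparing checksums by eye gets tedious. Here is a sketch of letting awk(1) do the comparison, assuming the ‘CRC SIZE PATH’ output format of cksum(1) shown above; cksums_agree is a made-up helper name.

```shell
# Read cksum(1) lines ("CRC SIZE PATH") collected from several nodes and
# fail if any PATH appears with more than one CRC/SIZE pair.
cksums_agree() {
    awk '
        { sig = $1 " " $2 }
        ($3 in seen) && seen[$3] != sig { print "mismatch: " $3; bad = 1 }
        { seen[$3] = sig }
        END { exit bad }
    '
}
```

One way to feed it is to concatenate the cksum output gathered from node1, node2 and node3 (for example over ssh) and pipe it into cksums_agree; it stays silent and returns 0 when everything matches.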

We can now add corosync_enable=YES and pacemaker_enable=YES to the /etc/rc.conf file.

root@node1:~ # sysrc corosync_enable=YES
corosync_enable:  -> YES

root@node1:~ # sysrc pacemaker_enable=YES
pacemaker_enable:  -> YES

root@node2:~ # sysrc corosync_enable=YES
corosync_enable:  -> YES

root@node2:~ # sysrc pacemaker_enable=YES
pacemaker_enable:  -> YES

root@node3:~ # sysrc corosync_enable=YES
corosync_enable:  -> YES

root@node3:~ # sysrc pacemaker_enable=YES
pacemaker_enable:  -> YES

Let's start these services then.

root@node1:~ # service corosync start
Starting corosync.
Sep 02 20:55:35 notice  [MAIN  ] Corosync Cluster Engine ('2.4.5'): started and ready to provide service.
Sep 02 20:55:35 info    [MAIN  ] Corosync built-in features:
Sep 02 20:55:35 warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Sep 02 20:55:35 warning [MAIN  ] Please migrate config file to nodelist.

root@node1:~ # ps aux | grep corosync
root  1695   0.0  7.9 38340 38516  -  S    20:55    0:00.40 /usr/local/sbin/corosync
root  1699   0.0  0.1   524   336  0  R+   20:57    0:00.00 grep corosync

Do the same on the node2 and node3 systems.

Pacemaker is not running yet so crm status will fail.

root@node1:~ # crm status
Could not connect to the CIB: Socket is not connected
crm_mon: Error: cluster is not available on this node
ERROR: status: crm_mon (rc=102): 

We will start it now.

root@node1:~ # service pacemaker start
Starting pacemaker.

root@node2:~ # service pacemaker start
Starting pacemaker.

root@node3:~ # service pacemaker start
Starting pacemaker.

You need to give it a little time to start because if you execute the crm status command right away you will get the 0 nodes configured message as shown below.

root@node1:~ # crm status
Cluster Summary:
  * Stack: unknown
  * Current DC: NONE
  * Last updated: Wed Sep  2 20:58:51 2020
  * Last change:  
  * 0 nodes configured
  * 0 resource instances configured


Full List of Resources:
  * No resources

… but after a while everything is detected and works as desired.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 21:02:49 2020
  * Last change:  Wed Sep  2 20:59:00 2020 by hacluster via crmd on node2
  * 3 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * No resources
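
If this startup is ever scripted, that ‘give it a little time’ part is better expressed as a retry loop than as a fixed sleep. A generic sketch follows; wait_until and cluster_ready are hypothetical names, and the ‘3 nodes configured’ string is simply what crm status printed above.

```shell
# Rerun CMD until it succeeds, at most TRIES times, DELAY seconds apart.
# usage: wait_until TRIES DELAY CMD [ARGS...]
wait_until() {
    tries=$1 delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# True once crm status reports all three nodes (string from the output above).
cluster_ready() {
    crm status 2>/dev/null | grep -q '3 nodes configured'
}
```

Something like wait_until 30 2 cluster_ready would then poll for up to about a minute before giving up.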

The Pacemaker runs properly.

root@node1:~ # ps aux | grep pacemaker
root      1716   0.0  0.5 10844   2396  -  Is   20:58     0:00.00 daemon: /usr/local/sbin/pacemakerd[1717] (daemon)
root      1717   0.0  5.2 49264  25284  -  S    20:58     0:00.27 /usr/local/sbin/pacemakerd
hacluster 1718   0.0  6.1 48736  29708  -  Ss   20:58     0:00.75 /usr/local/libexec/pacemaker/pacemaker-based
root      1719   0.0  4.5 40628  21984  -  Ss   20:58     0:00.28 /usr/local/libexec/pacemaker/pacemaker-fenced
root      1720   0.0  2.8 25204  13688  -  Ss   20:58     0:00.20 /usr/local/libexec/pacemaker/pacemaker-execd
hacluster 1721   0.0  3.9 38148  19100  -  Ss   20:58     0:00.25 /usr/local/libexec/pacemaker/pacemaker-attrd
hacluster 1722   0.0  2.9 25460  13864  -  Ss   20:58     0:00.17 /usr/local/libexec/pacemaker/pacemaker-schedulerd
hacluster 1723   0.0  5.4 49304  26300  -  Ss   20:58     0:00.41 /usr/local/libexec/pacemaker/pacemaker-controld
root      1889   0.0  0.6 11348   2728  0  S+   21:56     0:00.00 grep pacemaker

We can check how Corosync sees its members.

root@node1:~ # corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.10.111) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.10.112) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.0.10.113) 
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
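
For monitoring purposes the same output can be reduced to a single number. A small sketch, assuming the key layout shown above; joined_members is a made-up helper name.

```shell
# Count members whose status is "joined" in corosync-cmapctl output (stdin).
joined_members() {
    grep -c 'members\.[0-9]*\.status (str) = joined'
}
```

Used as corosync-cmapctl | joined_members it should report 3 for the healthy cluster above.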

… or the quorum information.

root@node1:~ # corosync-quorumtool
Quorum information
------------------
Date:             Wed Sep  2 21:00:38 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1/12
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 10.0.10.111 (local)
         2          1 10.0.10.112
         3          1 10.0.10.113
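
The Quorum: 2 value for 3 total votes matches the usual strict majority rule, floor(votes / 2) + 1. The arithmetic is sketched below under the assumption that no special votequorum options (such as a two node mode) are in play; majority is just an illustrative name.

```shell
# Votes needed for a strict majority of TOTAL votes: floor(TOTAL / 2) + 1.
majority() {
    echo $(( $1 / 2 + 1 ))
}
```

It also shows why two node clusters are awkward: majority 2 is 2, so losing either node loses quorum, while majority 3 is still 2 and one node can fail.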

The Corosync log file is filled with the following information.

root@node1:~ # cat /var/log/cluster/corosync.log
Sep 02 20:55:35 [1694] node1 corosync notice  [MAIN  ] Corosync Cluster Engine ('2.4.5'): started and ready to provide service.
Sep 02 20:55:35 [1694] node1 corosync info    [MAIN  ] Corosync built-in features:
Sep 02 20:55:35 [1694] node1 corosync warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Sep 02 20:55:35 [1694] node1 corosync warning [MAIN  ] Please migrate config file to nodelist.
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha256
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] The network interface [10.0.10.111] is now up.
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: cmap
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync configuration service [1]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: cfg
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: cpg
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Sep 02 20:55:35 [1694] node1 corosync notice  [QUORUM] Using quorum provider corosync_votequorum
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: votequorum
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: quorum
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] adding new UDPU member {10.0.10.111}
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] adding new UDPU member {10.0.10.112}
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] adding new UDPU member {10.0.10.113}
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] A new membership (10.0.10.111:4) was formed. Members joined: 1
Sep 02 20:55:35 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:55:35 [1694] node1 corosync notice  [QUORUM] Members[1]: 1
Sep 02 20:55:35 [1694] node1 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Sep 02 20:58:14 [1694] node1 corosync notice  [TOTEM ] A new membership (10.0.10.111:8) was formed. Members joined: 2
Sep 02 20:58:14 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:14 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:14 [1694] node1 corosync notice  [QUORUM] This node is within the primary component and will provide service.
Sep 02 20:58:14 [1694] node1 corosync notice  [QUORUM] Members[2]: 1 2
Sep 02 20:58:14 [1694] node1 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Sep 02 20:58:19 [1694] node1 corosync notice  [TOTEM ] A new membership (10.0.10.111:12) was formed. Members joined: 3
Sep 02 20:58:19 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync notice  [QUORUM] Members[3]: 1 2 3
Sep 02 20:58:19 [1694] node1 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.

Here is the configuration.

root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.4-2deceaa3ae \
        cluster-infrastructure=corosync

As we will not be configuring the STONITH mechanism we will disable it.

root@node1:~ # crm configure property stonith-enabled=false

New configuration with STONITH disabled.

root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.4-2deceaa3ae \
        cluster-infrastructure=corosync \
        stonith-enabled=false

The STONITH configuration is out of scope of this article but a properly configured STONITH looks like this.

stonith

First Service

We will now configure our first highly available service – a classic – a floating IP address 🙂

root@node1:~ # crm configure primitive IP ocf:heartbeat:IPaddr2 params ip=10.0.10.200 cidr_netmask="24" op monitor interval="30s"

Let's check how it behaves.

root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
primitive IP IPaddr2 \
        params ip=10.0.10.200 cidr_netmask=24 \
        op monitor interval=30s
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.4-2deceaa3ae \
        cluster-infrastructure=corosync \
        stonith-enabled=false

Looks good – let's check the cluster status.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:03:35 2020
  * Last change:  Wed Sep  2 22:02:53 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::heartbeat:IPaddr2):        Stopped

Failed Resource Actions:
  * IP_monitor_0 on node3 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:53Z', queued=0ms, exec=132ms
  * IP_monitor_0 on node2 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:54Z', queued=0ms, exec=120ms
  * IP_monitor_0 on node1 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:53Z', queued=0ms, exec=110ms

Crap. Linuxism. The ip(8) command is expected to be present in the system. This is FreeBSD, and like any UNIX system it comes with the ifconfig(8) command instead.

We will have to figure out something else. For now we will delete our useless IP service.

root@node1:~ # crm configure delete IP

Status after deletion.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:04:34 2020
  * Last change:  Wed Sep  2 22:04:31 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * No resources

Custom Resource

Let’s check which resources are provided by the stock Pacemaker installation.

root@node1:~ # ls -l /usr/local/lib/ocf/resource.d/pacemaker
total 144
-r-xr-xr-x  1 root  wheel   7484 Aug 29 01:22 ClusterMon
-r-xr-xr-x  1 root  wheel   9432 Aug 29 01:22 Dummy
-r-xr-xr-x  1 root  wheel   5256 Aug 29 01:22 HealthCPU
-r-xr-xr-x  1 root  wheel   5342 Aug 29 01:22 HealthIOWait
-r-xr-xr-x  1 root  wheel   9450 Aug 29 01:22 HealthSMART
-r-xr-xr-x  1 root  wheel   6186 Aug 29 01:22 Stateful
-r-xr-xr-x  1 root  wheel  11370 Aug 29 01:22 SysInfo
-r-xr-xr-x  1 root  wheel   5856 Aug 29 01:22 SystemHealth
-r-xr-xr-x  1 root  wheel   7382 Aug 29 01:22 attribute
-r-xr-xr-x  1 root  wheel   7854 Aug 29 01:22 controld
-r-xr-xr-x  1 root  wheel  16134 Aug 29 01:22 ifspeed
-r-xr-xr-x  1 root  wheel  11040 Aug 29 01:22 o2cb
-r-xr-xr-x  1 root  wheel  11696 Aug 29 01:22 ping
-r-xr-xr-x  1 root  wheel   6356 Aug 29 01:22 pingd
-r-xr-xr-x  1 root  wheel   3702 Aug 29 01:22 remote

Not many … we will try to modify the Dummy service into an IP changer on FreeBSD.

root@node1:~ # cp /usr/local/lib/ocf/resource.d/pacemaker/Dummy /usr/local/lib/ocf/resource.d/pacemaker/ifconfig

root@node1:~ # vi /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
               /* LOTS OF TYPING */

Because of the WordPress blogging system limitations I am forced to post this ifconfig resource as an image … but fear not – the text version is also available here – ifconfig.odt – for download.

Also the first version did not go that well …

root@node1:~ # setenv OCF_ROOT /usr/local/lib/ocf
root@node1:~ # ocf-tester -n resourcename /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
Beginning tests for /usr/local/lib/ocf/resource.d/pacemaker/ifconfig...
* rc=3: Your agent has too restrictive permissions: should be 755
-:1: parser error : Start tag expected, '<' not found
usage: /usr/local/lib/ocf/resource.d/pacemaker/ifconfig {start|stop|monitor}
^
* rc=1: Your agent produces meta-data which does not conform to ra-api-1.dtd
* rc=3: Your agent does not support the meta-data action
* rc=3: Your agent does not support the validate-all action
* rc=0: Monitoring a stopped resource should return 7
* rc=0: The initial probe for a stopped resource should return 7 or 5 even if all binaries are missing
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* Your agent does not support the reload action (optional)
Tests failed: /usr/local/lib/ocf/resource.d/pacemaker/ifconfig failed 9 tests

But after adding 755 mode to it and making several (hundred) changes it became usable.

root@node1:~ # vi /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
             /* LOTS OF NERVOUS TYPING */
root@node1:~ # chmod 755 /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
root@node1:~ # setenv OCF_ROOT /usr/local/lib/ocf
root@node1:~ # ocf-tester -n resourcename /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
Beginning tests for /usr/local/lib/ocf/resource.d/pacemaker/ifconfig...
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
/usr/local/lib/ocf/resource.d/pacemaker/ifconfig passed all tests

Looks usable.
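One of the failures in the first ocf-tester run is about the meta-data action: the agent is expected to print an XML description that conforms to ra-api-1.dtd. As a rough sketch only (this is not the actual ifconfig resource from this article; the descriptions, timeouts and the single ip parameter are assumptions made for illustration), such an action could look like this:

```shell
#!/bin/sh
# Sketch of an OCF meta-data action. All texts, timeouts and the "ip"
# parameter are illustrative assumptions, not the article's real agent.
meta_data() {
    cat <<'END'
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="ifconfig" version="0.1">
<version>1.0</version>
<longdesc lang="en">Floating IP address managed with ifconfig(8).</longdesc>
<shortdesc lang="en">Floating IP via ifconfig(8)</shortdesc>
<parameters>
<parameter name="ip" unique="1" required="0">
<longdesc lang="en">Address to add as an alias.</longdesc>
<shortdesc lang="en">IP address</shortdesc>
<content type="string" default="10.0.10.200" />
</parameter>
</parameters>
<actions>
<action name="start" timeout="20s" />
<action name="stop" timeout="20s" />
<action name="monitor" timeout="20s" interval="30s" depth="0" />
<action name="meta-data" timeout="5s" />
<action name="validate-all" timeout="20s" />
</actions>
</resource-agent>
END
}
```

The agent would call meta_data when it is invoked with the meta-data argument and exit with 0 afterwards.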

The ifconfig resource. It’s pretty limited and has a hardcoded IP address for now.

ifconfig
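Since the resource itself is only shown as an image, here is a minimal sketch of how such an agent could be structured. It is not the script from the image: the function names and the OCF_RESKEY_ip, OCF_RESKEY_iface and OCF_RESKEY_netmask parameters are assumptions, while the default address, interface and netmask are the ones visible in the ifconfig(8) outputs of this article. The return codes are the standard OCF ones.

```shell
#!/bin/sh
# Sketch of a floating IP agent built on ifconfig(8) aliases.
# Parameter names are assumptions; defaults come from the article's outputs.
: "${OCF_RESKEY_ip:=10.0.10.200}"
: "${OCF_RESKEY_iface:=em0}"
: "${OCF_RESKEY_netmask:=255.255.255.0}"

# Standard OCF exit codes used by this sketch.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

ip_monitor() {
    # The resource counts as running when the address is listed on the interface.
    if ifconfig "$OCF_RESKEY_iface" | grep -qF "inet $OCF_RESKEY_ip "; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

ip_start() {
    # Starting an already started resource must succeed.
    ip_monitor && return $OCF_SUCCESS
    ifconfig "$OCF_RESKEY_iface" inet "$OCF_RESKEY_ip" \
        netmask "$OCF_RESKEY_netmask" alias || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

ip_stop() {
    # Stopping an already stopped resource must succeed as well.
    ip_monitor || return $OCF_SUCCESS
    ifconfig "$OCF_RESKEY_iface" inet "$OCF_RESKEY_ip" -alias || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

agent_main() {
    case "$1" in
        start)   ip_start ;;
        stop)    ip_stop ;;
        monitor) ip_monitor ;;
        *)       echo "usage: $0 {start|stop|monitor}" >&2
                 return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# Dispatch only when an action was given on the command line.
if [ $# -gt 0 ]; then
    agent_main "$@"
    exit $?
fi
```

Returning 7 from monitor for a stopped resource is exactly what ocf-tester complained about in the first run, so that part is worth getting right before anything else.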

Let’s try to add the new IP resource to our FreeBSD cluster.

Tests

root@node1:~ # crm configure primitive IP ocf:pacemaker:ifconfig op monitor interval="30"

Added.

Let’s see what the status command shows now.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:44:52 2020
  * Last change:  Wed Sep  2 22:44:44 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node1

Failed Resource Actions:
  * IP_monitor_0 on node3 'not installed' (5): call=24, status='Not installed', exitreason='', last-rc-change='2020-09-02 22:42:52Z', queued=0ms, exec=5ms
  * IP_monitor_0 on node2 'not installed' (5): call=24, status='Not installed', exitreason='', last-rc-change='2020-09-02 22:42:53Z', queued=0ms, exec=2ms

Crap. I forgot to copy this new ifconfig resource to the other nodes. Let’s fix that now.

root@node1:~ # rsync -av /usr/local/lib/ocf/resource.d/pacemaker/ node2:/usr/local/lib/ocf/resource.d/pacemaker/
Password for root@node2:
sending incremental file list
./
ifconfig

sent 3,798 bytes  received 38 bytes  1,534.40 bytes/sec
total size is 128,003  speedup is 33.37

root@node1:~ # rsync -av /usr/local/lib/ocf/resource.d/pacemaker/ node3:/usr/local/lib/ocf/resource.d/pacemaker/
Password for root@node3:
sending incremental file list
./
ifconfig

sent 3,798 bytes  received 38 bytes  1,534.40 bytes/sec
total size is 128,003  speedup is 33.37

Let’s stop, delete and re-add our precious resource now.

root@node1:~ # crm resource stop IP
root@node1:~ # crm configure delete IP
root@node1:~ # crm configure primitive IP ocf:pacemaker:ifconfig op monitor interval="30"

Fingers crossed.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:45:46 2020
  * Last change:  Wed Sep  2 22:45:43 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node1

Looks like it is running properly.

Let’s verify that it’s really up where it should be.

root@node1:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:2a:78:60
        inet 10.0.10.111 netmask 0xffffff00 broadcast 10.0.10.255
        inet 10.0.10.200 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

root@node2:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:80:50:05
        inet 10.0.10.112 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

root@node3:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:74:5e:b9
        inet 10.0.10.113 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

Seems to be working.

Now let’s try to move it to another node in the cluster.

root@node1:~ # crm resource move IP node3
INFO: Move constraint created for IP to node3

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:47:31 2020
  * Last change:  Wed Sep  2 22:47:28 2020 by root via crm_resource on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node3

It switched properly to the node3 system.

root@node3:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:74:5e:b9
        inet 10.0.10.113 netmask 0xffffff00 broadcast 10.0.10.255
        inet 10.0.10.200 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

root@node1:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:2a:78:60
        inet 10.0.10.111 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

Now we will power off the node3 system to check that the IP is really highly available.

root@node2:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:49:57 2020
  * Last change:  Wed Sep  2 22:47:29 2020 by root via crm_resource on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node3

root@node3:~ # poweroff

root@node2:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:50:16 2020
  * Last change:  Wed Sep  2 22:47:29 2020 by root via crm_resource on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 ]
  * OFFLINE: [ node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node1

Seems that failover went well.

The crm command also colors various sections of its output.

failover

Good to know that a Pacemaker and Corosync cluster runs well on FreeBSD.

Some work is needed to write the needed resource files, but with some time and determination one can surely turn FreeBSD into a very capable highly available cluster.

EOF

FreeBSD Desktop – Part 21 – Configuration – Compton

In this article of the FreeBSD Desktop series I will talk about Compton setup – one that does not break, displays everything properly and does not consume 100% of your CPU time, as unfortunately Compton is a real bitch when it comes to proper setup.

Compton is an X11 compositor.

It provides the following features on an X11 desktop:

  • transparent windows/menus/titlebars/borders
  • shadows and colored shadows
  • fading effects
  • background blurring

You may want to check other articles in the FreeBSD Desktop series on the FreeBSD Desktop – Global Page where you will find links to all episodes of the series along with a table of contents for each episode.

Here is how Compton looks in action.

compton

To install Compton on FreeBSD just use the default packages as shown below.

# pkg install compton

X11 Configuration

This is the graphics card configuration I have for X11:

% cat /usr/local/etc/X11/xorg.conf.d/card.conf
Section "Device"
  Identifier "Card0"
  Driver "modesetting"
  Option "DPMS"
  Option "AccelMethod" "glamor"
EndSection

… and the heart of this article – the Compton config file:

% cat ~/.config/compton.conf
backend = "glx";
shadow = true;
no-dock-shadow = true;
clear-shadow = true;
shadow-radius = 12;
shadow-offset-x = -15;
shadow-offset-y = -15;
shadow-opacity = 0.7;
shadow-exclude = [
    "! name~=''",
    "name = 'Notification'",
    "name = 'Plank'",
    "name = 'Docky'",
    "name = 'Kupfer'",
    "name = 'xfce4-notifyd'",
    "name *= 'VLC'",
    "name *= 'compton'",
    "name *= 'Chromium'",
    "name *= 'Chrome'",
    "name *= 'Firefox'",
    "class_g = 'Conky'",
    "class_g = 'dzen'",
    "class_g = 'dzen2'",
    "class_g = 'Kupfer'",
    "class_g = 'Synapse'",
    "class_g ?= 'Notify-osd'",
    "class_g ?= 'Cairo-dock'",
    "class_g ?= 'Xfce4-notifyd'",
    "class_g ?= 'Xfce4-power-manager'"
];
shadow-ignore-shaped = false;
menu-opacity = 1;
inactive-opacity = 0.9;
active-opacity = 1;
frame-opacity = 0.9;
inactive-opacity-override = false;
alpha-step = 0.06;
blur-background-fixed = false;
blur-background-exclude = [
    "window_type = 'dock'",
    "window_type = 'desktop'"
];
fading = true;
fade-delta = 4;
fade-in-step = 0.03;
fade-out-step = 0.03;
fade-exclude = [ ];
mark-wmwin-focused = true;
mark-ovredir-focused = true;
use-ewmh-active-win = true;
detect-rounded-corners = true;
detect-client-opacity = true;
refresh-rate = 0;
vsync = "opengl-swc";
dbe = false;
paint-on-overlay = true;
sw-opti = false;
unredir-if-possible = true;
focus-exclude = [ ];
detect-transient = true;
detect-client-leader = true;
wintypes:
{
    tooltip =
    {
        fade = true;
        shadow = false;
        opacity = 0.85;
        focus = true;
    };
};

While the above config works very well, I will also add the same Compton configuration file with comments.

% cat ~/.config/compton.conf
#################################
#
# Backend
#
#################################

# Backend to use: "xrender" or "glx".
# GLX backend is typically much faster but depends on a sane driver.
backend = "glx";

#################################
#
# GLX Backend
#
#################################

# GLX backend: Copy unmodified regions from front buffer instead of redrawing them all.
# Tests with nvidia-drivers show 10% decrease in performance when whole screen
# is modified but 20% increase when only 1/4 is modified.
# Tests on nouveau show terrible slowdown.
# Useful with --glx-swap-method as well.
# glx-copy-from-front = false;

# GLX backend: Use MESA_copy_sub_buffer to do partial screen update.
# Tests on nouveau shows 200% performance boost when only 1/4 of screen is updated.
# May break VSync and is not available on some drivers.
# Overrides --glx-copy-from-front.
# glx-use-copysubbuffermesa = true;

# GLX backend: Avoid rebinding pixmap on window damage.
# Probably could improve performance on rapid window content changes
# but is known to break things on some drivers (LLVMpipe).
# Recommended if it works.
# glx-no-rebind-pixmap = true;

# GLX backend: GLX buffer swap method we assume.
# Could be:
# - undefined (0)
# - copy (1)
# - exchange (2)
# - buffer-age (-1)
# The undefined is slowest and safest (default value).
# Copy is fastest but may fail on some drivers.
# buffer-age means auto-detect using GLX_EXT_buffer_age supported by some drivers.
# Useless with --glx-use-copysubbuffermesa.
# Partially breaks --resize-damage.
# Defaults to undefined.
# glx-swap-method = "undefined";

#################################
#
# Shadows
#
#################################

# Enable client-side shadows on windows.
shadow = true;

# Do not draw shadows on DND windows.
# no-dnd-shadow = true;

# Avoid drawing shadows on dock/panel windows.
no-dock-shadow = true;

# Zero part of shadow's mask behind window. Fix some weirdness with ARGB windows.
clear-shadow = true;

# The blur radius for shadows. (default 12)
shadow-radius = 12;

# The left offset for shadows. (default -15)
shadow-offset-x = -15;

# The top offset for shadows. (default -15)
shadow-offset-y = -15;

# The translucency for shadows. (default .75)
shadow-opacity = 0.7;

# Set if you want different colour shadows
# shadow-red = 0.0;
# shadow-green = 0.0;
# shadow-blue = 0.0;

# The shadow exclude options are helpful if you have shadows enabled.
# Due to way compton draws its shadows certain applications will have
# visual glitches (most applications are fine - only apps that do weird
# things with xshapes or argb are affected).
# The "! name~=''" part excludes shadows on any "Unknown" windows.
# This prevents visual glitch with XFWM alt-tab switcher.
shadow-exclude = [
    "! name~=''",
    "name = 'Notification'",
    "name = 'Plank'",
    "name = 'Docky'",
    "name = 'Kupfer'",
    "name = 'xfce4-notifyd'",
    "name *= 'VLC'",
    "name *= 'compton'",
    "name *= 'Chromium'",
    "name *= 'Chrome'",
    "name *= 'Firefox'",
    "class_g = 'Conky'",
    "class_g = 'dzen'",
    "class_g = 'dzen2'",
    "class_g = 'Kupfer'",
    "class_g = 'Synapse'",
    "class_g ?= 'Notify-osd'",
    "class_g ?= 'Cairo-dock'",
    "class_g ?= 'Xfce4-notifyd'",
    "class_g ?= 'Xfce4-power-manager'"
];

# Avoid drawing shadow on all shaped windows (see also: --detect-rounded-corners)
shadow-ignore-shaped = false;

#################################
#
# Opacity
#
#################################

# Opacity for menu items.
menu-opacity = 1;

# Opacity for inactive windows.
inactive-opacity = 0.9;

# Opacity for active windows.
active-opacity = 1;

# Opacity for active frame of windows.
frame-opacity = 0.9;

# Let inactive-opacity override the _NET_WM_OPACITY values of windows.
inactive-opacity-override = false;

# Alpha step.
alpha-step = 0.06;

# Dim inactive windows. (0.0 - 1.0)
# inactive-dim = 0.2;

# Do not let dimness adjust based on window opacity.
# inactive-dim-fixed = true;

# Blur background of transparent windows. Bad performance with X Render backend.
# GLX backend is preferred.
# blur-background = true;

# Blur background of opaque windows with transparent frames as well.
# blur-background-frame = true;

# Do not let blur radius adjust based on window opacity.
blur-background-fixed = false;

# Blur exclude list.
blur-background-exclude = [
    "window_type = 'dock'",
    "window_type = 'desktop'"
];

#################################
#
# Fading
#
#################################

# Fade windows during opacity changes.
fading = true;

# The time between steps in fade in milliseconds (default 10).
fade-delta = 4;

# Opacity change between steps while fading in (default 0.028).
fade-in-step = 0.03;

# Opacity change between steps while fading out (default 0.03).
fade-out-step = 0.03;

# Do not fade windows in/out when opening/closing.
# no-fading-openclose = true;

# Specify a list of conditions of windows that should not be faded.
fade-exclude = [ ];

#################################
#
# Other
#
#################################

# Try to detect WM windows and mark them as active.
mark-wmwin-focused = true;

# Mark all non-WM but override-redirect windows active (e.g. menus).
mark-ovredir-focused = true;

# Use EWMH _NET_WM_ACTIVE_WINDOW to determine which window is focused instead of
# using FocusIn/Out events. Usually more reliable but depends on EWMH-compliant WM.
use-ewmh-active-win = true;

# Detect rounded corners and treat them as rectangular when --shadow-ignore-shaped is on.
detect-rounded-corners = true;

# Detect _NET_WM_OPACITY on client windows useful for window managers not passing
# _NET_WM_OPACITY of client windows to frame windows. This prevents opacity ignore
# for some apps. Without this enabled xfce4-notifyd is 100% opacity no matter what.
detect-client-opacity = true;

# Specify refresh rate. With 0 compton will detect this with X RandR extension.
refresh-rate = 0;

# Set VSync method. VSync methods currently available:
# - none: No VSync
# - drm: VSync with DRM_IOCTL_WAIT_VBLANK. May only work on some drivers.
# - opengl: VSync with SGI_video_sync OpenGL extension. Only on some drivers.
# - opengl-oml: VSync with OML_sync_control OpenGL extension. Only on some drivers.
# - opengl-swc: VSync with SGI_swap_control OpenGL extension. Only on some drivers.
#               Works with GLX backend. Known to be most effective on many drivers.
#               Does not control paint timing - only buffer swap is affected.
#               Does not have effect of --sw-opti unlike other methods. Experimental.
# - opengl-mswc: Try to VSync with MESA_swap_control OpenGL extension.
#                Basically same as opengl-swc above except extension we use.
vsync = "opengl-swc";

# Enable DBE painting mode - use with VSync to (hopefully) eliminate tearing.
dbe = false;

# Painting on X Composite overlay window. Recommended.
paint-on-overlay = true;

# Limit repaint at most once every 1 / refresh_rate second to boost performance.
# This should not be used with --vsync drm/opengl/opengl-oml as they essentially do
# the job of --sw-opti unless you wish to have lower refresh rate than actual value.
sw-opti = false;

# Unredirect all windows if full-screen window is detected to maximize performance
# for full-screen windows - like games. Known to cause flickering when
# redirecting/unredirecting windows. Paint-on-overlay may flicker less.
unredir-if-possible = true;

# Specify list of conditions of windows that should always be considered focused.
focus-exclude = [ ];

# Use WM_TRANSIENT_FOR to group windows in same group focused at same time.
detect-transient = true;

# Use WM_CLIENT_LEADER to group windows in same group focused at same time.
# WM_TRANSIENT_FOR has higher priority if --detect-transient is enabled too.
detect-client-leader = true;

#################################
#
# Window Type Settings
#
#################################

wintypes:
{
    tooltip =
    {
        # fade: Fade particular type of windows.
        fade = true;
        # shadow: Give those windows shadow
        shadow = false;
        # opacity: Default opacity for type of windows.
        opacity = 0.85;
        # focus: Whether to always consider windows of this type focused.
        focus = true;
    };
};
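The article does not show how Compton gets started with this file, so here is a small sketch of one way to do it, for example from ~/.xinitrc. The start_compositor function name is made up for this example; the --config and -b (daemonize) options are regular compton(1) options.

```shell
#!/bin/sh
# Sketch: start compton with the config above unless it is already running.
# The function name is hypothetical; the config path is the one from this article.
start_compositor() {
    conf="${1:-$HOME/.config/compton.conf}"

    # Refuse to start without a readable config file.
    if [ ! -r "$conf" ]; then
        echo "missing config: $conf" >&2
        return 1
    fi

    # Do not start a second instance on top of a running one.
    if pgrep -x compton >/dev/null 2>&1; then
        return 0
    fi

    # -b forks compton to the background after initialization.
    compton --config "$conf" -b
}
```

Calling start_compositor without arguments uses ~/.config/compton.conf.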


Not sure what else I could add here, so this is the end of this article 🙂

EOF

FreeBSD Enterprise Storage at PBUG

Yesterday I was honored to give a talk about FreeBSD Enterprise Storage at the Polish BSD User Group meeting.

You are invited to download the PDF Slides – https://is.gd/bsdstg – available here.

bsdstg

The PBUG (Polish BSD User Group) meetings are very special. In “The Matrix” movie (which was rendered on FreeBSD systems by the way – FreeBSD Used to Generate Spectacular Special Effects – details available here) it’s not possible to describe what the Matrix really is; one has to feel it. Enter it. I can tell you the same about the PBUG meetings. It’s kinda like the “Hangover” movie. What happens at the PBUG meeting stays at the PBUG meeting 🙂

If you have the opportunity and time, join the next Polish BSD User Group meeting. You will not regret it :>

UPDATE 1 – Shorter Unified Version

The original – https://is.gd/bsdstg – presentation is 187 pages long and is suited for live presentation, but it is not the best for later ‘offline’ viewing.

I have created a unified version – https://is.gd/bsdstguni – with only 42 pages.

EOF

Run broot on FreeBSD

The broot file manager is quite a fresh and nice approach to filtering/searching/viewing/manipulating files and directories … and whatever else you call messing with files 🙂

The broot tool is not yet available on FreeBSD (as a package or port).

This guide will show you how to build and install it on your FreeBSD system.

Here is how it looks in action.

Filter for jails.

broot-filter-jails.jpg

Filter for zfs.

broot-filter-zfs.jpg

It has a ‘size mode’, similar to the ncdu(1) tool, when started with the -s option.

broot-filter-size.jpg

You can also check the Feature Showcase section on their GitHub page – https://github.com/Canop/broot – available here.

Build

There are three steps to make it happen.

1. You need to install the rust package.

# pkg install rust

Then you need to run (as a regular user) the cargo install broot command.

% cargo install broot

It will fail here:

broot-fail.jpg

You will need to apply the patch below:

% diff -u \
  /home/vermaden/.cargo/registry/src/github.com-1ecc6299db9ec823/crossterm-0.14.1/src/terminal/sys/unix.rs.ORG \
  /home/vermaden/.cargo/registry/src/github.com-1ecc6299db9ec823/crossterm-0.14.1/src/terminal/sys/unix.rs
--- /home/vermaden/.cargo/registry/src/github.com-1ecc6299db9ec823/crossterm-0.14.1/src/terminal/sys/unix.rs.ORG  2020-01-10 23:41:29.825912000 +0100
+++ /home/vermaden/.cargo/registry/src/github.com-1ecc6299db9ec823/crossterm-0.14.1/src/terminal/sys/unix.rs      2020-01-10 23:41:07.703471000 +0100
@@ -33,7 +33,7 @@
         ws_ypixel: 0,
     };
 
-    if let Ok(true) = wrap_with_result(unsafe { ioctl(STDOUT_FILENO, TIOCGWINSZ, &mut size) }) {
+    if let Ok(true) = wrap_with_result(unsafe { ioctl(STDOUT_FILENO, TIOCGWINSZ.into(), &mut size) }) {
         Ok((size.ws_col, size.ws_row))
     } else {
         tput_size().ok_or_else(|| std::io::Error::last_os_error().into())
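The same one-line change can be scripted instead of edited by hand. The helper below is only a sketch: the patch_crossterm name is made up for this example, and it assumes that the line in unix.rs matches the diff above exactly. It keeps an untouched .ORG copy next to the file, the same way the diff above was made.

```shell
#!/bin/sh
# Sketch: apply the TIOCGWINSZ.into() change from the diff above with sed(1).
# patch_crossterm is a hypothetical helper name.
patch_crossterm() {
    src="$1"

    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi

    # Keep the original once; later runs always start from that copy.
    [ -f "$src.ORG" ] || cp -p "$src" "$src.ORG"

    # '&' must be escaped in the replacement, otherwise sed inserts the match.
    sed 's/ioctl(STDOUT_FILENO, TIOCGWINSZ, &mut size)/ioctl(STDOUT_FILENO, TIOCGWINSZ.into(), \&mut size)/' \
        "$src.ORG" > "$src"
}
```

It takes the path to the crossterm unix.rs file shown in the diff as its only argument, and running it twice gives the same result as running it once.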

Then run the cargo install broot command again. It will now compile properly.

% cargo install broot
    Updating crates.io index
  Downloaded broot v0.11.6
  Downloaded 1 crate (1.6 MB) in 2.89s
  Installing broot v0.11.6
   Compiling libc v0.2.66
   Compiling cfg-if v0.1.10
   Compiling lazy_static v1.4.0
   Compiling autocfg v0.1.7
   Compiling semver-parser v0.7.0
   Compiling autocfg v1.0.0
   Compiling proc-macro2 v1.0.7
   Compiling log v0.4.8
   Compiling scopeguard v1.0.0
   Compiling unicode-xid v0.2.0
   Compiling bitflags v1.2.1
   Compiling syn v1.0.13
   Compiling memchr v2.2.1
   Compiling arc-swap v0.4.4
   Compiling slab v0.4.2
   Compiling smallvec v1.1.0
   Compiling serde v1.0.104
   Compiling unicode-width v0.1.7
   Compiling regex-syntax v0.6.13
   Compiling ansi_term v0.11.0
   Compiling strsim v0.8.0
   Compiling vec_map v0.8.1
   Compiling id-arena v2.2.1
   Compiling custom_error v1.7.1
   Compiling glob v0.3.0
   Compiling open v1.3.2
   Compiling umask v0.1.8
   Compiling thread_local v1.0.0
   Compiling minimad v0.6.3
   Compiling lazy-regex v0.1.2
   Compiling semver v0.9.0
   Compiling lock_api v0.3.3
   Compiling crossbeam-utils v0.7.0
   Compiling crossbeam-epoch v0.8.0
   Compiling num-traits v0.2.11
   Compiling num-integer v0.1.42
   Compiling textwrap v0.11.0
   Compiling rustc_version v0.2.3
   Compiling memoffset v0.5.3
   Compiling iovec v0.1.4
   Compiling net2 v0.2.33
   Compiling dirs-sys v0.3.4
   Compiling parking_lot_core v0.7.0
   Compiling signal-hook-registry v1.2.0
   Compiling time v0.1.42
   Compiling atty v0.2.14
   Compiling users v0.9.1
   Compiling quote v1.0.2
   Compiling aho-corasick v0.7.6
   Compiling mio v0.6.21
   Compiling dirs v2.0.2
   Compiling directories v2.0.2
   Compiling parking_lot v0.10.0
   Compiling clap v2.33.0
   Compiling crossbeam-queue v0.2.1
   Compiling crossbeam-channel v0.4.0
   Compiling toml v0.5.5
   Compiling term v0.6.1
   Compiling regex v1.3.3
   Compiling signal-hook v0.1.12
   Compiling chrono v0.4.10
   Compiling crossterm v0.14.1
   Compiling simplelog v0.7.4
   Compiling crossbeam-deque v0.7.2
   Compiling thiserror-impl v1.0.9
   Compiling crossbeam v0.7.3
   Compiling thiserror v1.0.9
   Compiling termimad v0.8.9
   Compiling broot v0.11.6
    Finished release [optimized] target(s) in 4m 56s
  Installing /home/vermaden/.cargo/bin/broot
   Installed package `broot v0.11.6` (executable `broot`)
warning: be sure to add `/home/vermaden/.cargo/bin` to your PATH to be able to run the installed binaries

% echo $?
0

Install

Now go to the ~/.cargo/bin directory and copy the broot binary to some place that is set in your ${PATH} variable.

Then start a new terminal (with the updated ${PATH} variable) and type the broot command.

% cp ~/.cargo/bin/broot ~/scripts
% rehash
% broot

You will be asked if automatic setup of the br function should take place. I agreed with the y answer.

broot-first-run.jpg

Here are things generated by this process.

% find ~/.config/broot
/home/vermaden/.config/broot
/home/vermaden/.config/broot/conf.toml
/home/vermaden/.config/broot/launcher
/home/vermaden/.config/broot/launcher/installed-v1
/home/vermaden/.config/broot/launcher/bash
/home/vermaden/.config/broot/launcher/bash/br

% find ~/.local/share/broot
/home/vermaden/.local/share/broot
/home/vermaden/.local/share/broot/launcher
/home/vermaden/.local/share/broot/launcher/fish
/home/vermaden/.local/share/broot/launcher/fish/1.fish
/home/vermaden/.local/share/broot/launcher/bash
/home/vermaden/.local/share/broot/launcher/bash/1

As I use the ZSH shell it also updates my ~/.zshrc file.

% tail -3 ~/.zshrc

source /home/vermaden/.config/broot/launcher/bash/br

Finished. You now have broot installed and ready to use.

broot-filter-bhyve.jpg

UPDATE 1 – Now No Patches Are Needed

Thanks to the broot author no patches are needed anymore.

It builds and works out of the box.

broot-update-fixed

UPDATE 2 – It’s in Ports/Packages Now

The broot file manager is now available via the usual FreeBSD Ports and packages, which makes this guide pointless 🙂

It’s available as the misc/broot port.

EOF

FreeBSD Desktop – Part 20 – Configuration – Unlock Your Laptop with Phone

I really do not like the smart card ecosystem – probably because it would be a big PITA to set up such a subsystem on FreeBSD to lock/unlock my laptop with a smart card – not to mention whether it would even be possible, given the probable lack of drivers for a laptop’s built-in smart card reader. I mention it because you can lock and unlock your laptop with such a smart card very quickly.

Some people use fingerprint readers (for fast workstation/laptop unlocking) – but it’s the same scenario as with the smart card – the time needed to set it up properly. Not to mention that it is not that fast anyway, as I often see my colleagues swiping a finger over the fingerprint reader over and over again so it finally works the 7th time …

… but you can also lock and unlock your UNIX laptop with your phone – by just attaching it to your device – this is where FreeBSD’s devd(8) subsystem comes in handy.

Today I will show you how to lock/unlock your laptop with your phone.

You may want to check other articles in the FreeBSD Desktop series on the FreeBSD Desktop – Global Page where you will find links to all episodes of the series along with a table of contents for each episode.

Keep in mind that in order to make it work you need to attach the phone to the laptop using a cable that supports data transfer – it will not work with cables that only provide power for charging your phone.

Device Detection

First we need to detect what device will be your locker/unlocker.

Stop the devd(8) daemon.

# service devd stop
Stopping devd.
Waiting for PIDS: 71455.

Now start it in ‘foreground’ for debug purposes and then attach your phone. The command below with grep(1) will help you to find the needed information.

# devd -d 2>&1 | grep --line-buffered 'Processing event' | grep --line-buffered DEVICE
Processing event '!system=USB subsystem=DEVICE type=ATTACH ugen=ugen2.3 cdev=ugen2.3 vendor=0x04e8 product=0x6860 devclass=0x00 devsubclass=0x00 sernum="31000e243eb5a12e" release=0x0400 mode=host port=2 parent=ugen2.2'

The needed information is in the vendor, product and sernum fields.

Do not stop this process yet.

Now you know which device will be your locker/unlocker and what event the devd(8) daemon gets when you attach your phone.

Things to note here are:

vendor=0x04e8
product=0x6860
sernum=31000e243eb5a12e

This data is more than enough to identify the phone that will unlock your workstation.

Now detach your phone from the computer. You will see a DETACH event similar to the one below.
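
If you prefer not to pick the fields out by eye, here is a minimal sketch that extracts them from an event line. The devd_field helper name is my own assumption (it is not part of FreeBSD), and it assumes that field values contain no spaces, which holds for the event shown above.

```sh
#!/bin/sh
# Print the value of one key=value field from a devd(8) event line.
# devd_field is a hypothetical helper name, not a FreeBSD tool.
devd_field() {
    key=$1
    event=$2
    printf '%s\n' "$event" |
        sed 's/^!//' |            # drop the leading '!' of notify events
        tr ' ' '\n' |             # one key=value pair per line
        sed -n "s/^${key}=//p" |  # keep only the requested key
        tr -d "\"'"               # strip quotes around the value
}

# Event line copied from the devd -d output above.
EVENT='!system=USB subsystem=DEVICE type=ATTACH ugen=ugen2.3 cdev=ugen2.3 vendor=0x04e8 product=0x6860 devclass=0x00 devsubclass=0x00 sernum="31000e243eb5a12e" release=0x0400 mode=host port=2 parent=ugen2.2'

for field in vendor product sernum; do
    printf '%s=%s\n' "$field" "$(devd_field "$field" "$EVENT")"
done
```

The loop prints the same three vendor/product/sernum lines that are listed above.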

Processing event '!system=USB subsystem=DEVICE type=DETACH ugen=ugen2.3 cdev=ugen2.3 vendor=0x04e8 product=0x6860 devclass=0x00 devsubclass=0x00 sernum="31000e243eb5a12e" release=0x0400 mode=host port=2 parent=ugen2.2'

Now you know the event that will be spawned when you detach your phone.

Stop the foreground devd(8) daemon and start the service traditionally.

# devd -d 2>&1 | grep --line-buffered 'Processing event' | grep --line-buffered DEVICE
Processing event '!system=USB subsystem=DEVICE type=ATTACH ugen=ugen2.3 cdev=ugen2.3 vendor=0x04e8 product=0x6860 devclass=0x00 devsubclass=0x00 sernum="31000e243eb5a12e" release=0x0400 mode=host port=2 parent=ugen2.2'
Processing event '!system=USB subsystem=DEVICE type=DETACH ugen=ugen2.3 cdev=ugen2.3 vendor=0x04e8 product=0x6860 devclass=0x00 devsubclass=0x00 sernum="31000e243eb5a12e" release=0x0400 mode=host port=2 parent=ugen2.2'
^C
# service devd start
Starting devd.

Commands for Events

Now, what action or command should be executed when you attach or detach your phone? That depends on which screen locker you are using on your X11 setup.

I for example use the mate-screensaver for this purpose.

In my case the ATTACH action kills the current mate-screensaver process, which unlocks the screen, and then starts it again so it is ready for the next lock. Below is the command that I will run for the ATTACH event.

pkill -9 mate-screensaver && su -l vermaden -c 'env DISPLAY=:0 mate-screensaver' &

The DETACH action tells mate-screensaver to lock the screen. Here is the command used for that purpose.

su -l vermaden -c 'env DISPLAY=:0 mate-screensaver-command --lock' &

Implementation

Here is what the devd(8) config file for my phone looks like.

# cat /usr/local/etc/devd/phonelock.conf

# PHONE ATTACH - UNLOCK
notify 100 {
    match "system" "USB";
    match "subsystem" "DEVICE";
    match "type" "ATTACH";
    match "vendor" "0x04e8";
    match "product" "0x6860";
    match "sernum" "31000e243eb5a12e";
    action "pkill -9 mate-screensaver && su -l vermaden -c 'env DISPLAY=:0 mate-screensaver' &";
};

# PHONE DETACH - LOCK
notify 100 {
    match "system" "USB";
    match "subsystem" "DEVICE";
    match "type" "DETACH";
    match "vendor" "0x04e8";
    match "product" "0x6860";
    match "sernum" "31000e243eb5a12e";
    action "su -l vermaden -c 'env DISPLAY=:0 mate-screensaver-command --lock' &";
};

Now restart the devd(8) daemon so it will read new configuration files.

# service devd restart
Stopping devd.
Waiting for PIDS: 1458.
Starting devd.

Voila! Now you can lock and unlock your screen just by attaching or detaching your phone. I do not have any fancy video showing how it behaves, but trust me that it now takes less than a second to lock or unlock the laptop. Be sure to keep an additional eye on your phone now, as it can unlock access to all your files 🙂

You can of course use any USB device or even network actions – any event that is supported by the devd(8) daemon.

You can of course combine such a lock/unlock config for your phone with, for example, a power down action triggered when you detach some other USB device.

I forgot to mention that this method does not disable the ‘classic’ password authentication. It just adds automatic screen lock/unlock when you attach or detach your phone, and you can still log in (unlock) using just the password on the mate-screensaver lock screen.

UPDATE 1 – Better devd Sniffing – Better Unlock Method

As oh5nxo from Reddit suggested, there is no need to stop devd and start it in ‘debug’ mode. It is easier to just attach to its ‘pipe’ with the nc(1) tool.

# nc -U /var/run/devd.pipe
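
The pipe carries every event, not only USB ones, so a small filter helps. Below is a minimal sketch. The usb_device_events function name is my own, and the first sample line (the INTERFACE one) is made up just to show that it gets filtered out.

```sh
#!/bin/sh
# Keep only USB DEVICE attach/detach notifications from a devd event stream.
usb_device_events() {
    grep -E '^!system=USB subsystem=DEVICE type=(ATTACH|DETACH) '
}

# Demo on canned input; on FreeBSD (as root) you would run instead:
#   nc -U /var/run/devd.pipe | usb_device_events
printf '%s\n' \
    '!system=USB subsystem=INTERFACE type=ATTACH ugen=ugen2.3' \
    '!system=USB subsystem=DEVICE type=ATTACH ugen=ugen2.3 vendor=0x04e8 product=0x6860' |
    usb_device_events
```

Only the DEVICE line is printed.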

There is also no need to kill(1) the mate-screensaver process. It is more elegant to just send the mate-screensaver-command --unlock command.

Below is the updated /usr/local/etc/devd/phonelock.conf config file for the devd(8) daemon.

# cat /usr/local/etc/devd/phonelock.conf

# PHONE ATTACH - UNLOCK
notify 100 {
    match "system" "USB";
    match "subsystem" "DEVICE";
    match "type" "ATTACH";
    match "vendor" "0x04e8";
    match "product" "0x6860";
    match "sernum" "31000e243eb5a12e";
    action "su -l vermaden -c 'env DISPLAY=:0 mate-screensaver-command --unlock' &";
};

# PHONE DETACH - LOCK
notify 100 {
    match "system" "USB";
    match "subsystem" "DEVICE";
    match "type" "DETACH";
    match "vendor" "0x04e8";
    match "product" "0x6860";
    match "sernum" "31000e243eb5a12e";
    action "su -l vermaden -c 'env DISPLAY=:0 mate-screensaver-command --lock' &";
};
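
If you would rather keep the user name and display in one place instead of repeating them in both action lines, the two commands can be wrapped in a small helper. This is only a sketch: the phonelock and phonelock_cmd names and the LOCK_USER, LOCK_DISPLAY and DRY_RUN variables are my own assumptions, and only the su(1) and mate-screensaver-command invocations come from the config above.

```sh
#!/bin/sh
# Hypothetical helper: maps a devd event type to the lock/unlock command.
LOCK_USER=${LOCK_USER:-vermaden}
LOCK_DISPLAY=${LOCK_DISPLAY:-:0}

# Print the command for ATTACH (unlock) or DETACH (lock).
phonelock_cmd() {
    case $1 in
        ATTACH) flag=--unlock ;;
        DETACH) flag=--lock ;;
        *) echo "usage: phonelock ATTACH|DETACH" >&2; return 64 ;;
    esac
    printf "su -l %s -c 'env DISPLAY=%s mate-screensaver-command %s'\n" \
        "$LOCK_USER" "$LOCK_DISPLAY" "$flag"
}

# Run the command in the background, or only print it when DRY_RUN is set.
phonelock() {
    cmd=$(phonelock_cmd "$1") || return
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$cmd"
    else
        sh -c "$cmd" &
    fi
}

DRY_RUN=1
phonelock ATTACH
phonelock DETACH
```

With DRY_RUN set the helper only prints the two commands, which is handy for checking them before pointing a devd(8) action at such a script.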

EOF

Nextcloud 17 on FreeBSD 12.1

Not so long ago, almost 2 years ago, I wrote about setting up Nextcloud 13 on FreeBSD.

Today Nextcloud is at version 17 and the configuration that worked two years ago requires some tweaks.

nextcloud-logo.png

This guide will not repeat information that is available in the earlier Nextcloud 13 on FreeBSD article, like the settings needed to run Nextcloud inside a FreeBSD Jail. Please refer to that earlier article for these settings.

Today we will use these as backends for Nextcloud 17.

  • PostgreSQL 12
  • PHP 7.3
  • Nginx 1.14 (with php-fpm)
  • Memcached 1.5.19

As the Nextcloud FreeBSD package comes with MySQL support and without PostgreSQL support, we will need to build it from source using FreeBSD Ports.

Settings

Let’s fetch the latest FreeBSD Ports tree.

# rm -r /var/db/portsnap
# mkdir /var/db/portsnap
# portsnap auto

Now we need to configure needed options in the /etc/make.conf file.

# cat /etc/make.conf
WRKDIRPREFIX=${PORTSDIR}/obj
DEFAULT_VERSIONS+= php=7.3
DEFAULT_VERSIONS+= pgsql=12
OPTIONS_UNSET+=    MYSQL
OPTIONS_SET+=      PGSQL
OPTIONS_SET+=      IMAGICK
OPTIONS_SET+=      PCNTL
OPTIONS_SET+=      SMB
OPTIONS_SET+=      REDIS


Packages and Ports

First we will add some basic tools and things like PostgreSQL using FreeBSD packages, to save some time instead of compiling them.

# pkg install \
    sudo \
    portmaster \
    beadm \
    lsblk \
    postgresql12-client \
    postgresql12-server \
    nginx \
    memcached \
    php73-pecl-memcached


Now we will compile Nextcloud and its dependencies using FreeBSD Ports – but with portmaster.

# env BATCH=yes portmaster \
    databases/php73-pdo_pgsql \
    databases/php73-pgsql \
    www/nextcloud 

PostgreSQL

We will now configure the FreeBSD Login Class for the PostgreSQL database in the /etc/login.conf file.

# cat >> /etc/login.conf << 'EOF'

postgres:\
        :lang=en_US.UTF-8:\
        :setenv=LC_COLLATE=C:\
        :tc=default:

EOF

# cap_mkdb /etc/login.conf

… and the PostgreSQL settings in the main FreeBSD configuration file /etc/rc.conf.

# grep postgresql /etc/rc.conf
postgresql_enable=YES
postgresql_class=postgres
postgresql_data=/var/db/postgres/data12

Let’s initialize the PostgreSQL database.

# /usr/local/etc/rc.d/postgresql initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  C
  CTYPE:    en_US.UTF-8
  MESSAGES: en_US.UTF-8
  MONETARY: en_US.UTF-8
  NUMERIC:  en_US.UTF-8
  TIME:     en_US.UTF-8
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/db/postgres/data12 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Warsaw
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/local/bin/pg_ctl -D /var/db/postgres/data12 -l logfile start


As the PostgreSQL database uses 8k blocks, let’s set a matching recordsize in ZFS. We could of course create a dedicated dataset for this purpose if needed.

# zfs set recordsize=8k zroot/ROOT/default
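
For the dedicated dataset variant, here is a minimal sketch that only prints the commands so you can review them first. The pgdata dataset name and the pg_dataset_plan helper are my own assumptions, and such a dataset would have to be created before running initdb so that the data directory is still empty.

```sh
#!/bin/sh
# Print (not run) the commands for a dedicated 8k recordsize dataset.
# Arguments: ZFS pool name, PostgreSQL data directory.
pg_dataset_plan() {
    pool=$1
    datadir=$2
    printf 'zfs create -o recordsize=8k -o mountpoint=%s %s/pgdata\n' "$datadir" "$pool"
    printf 'chown postgres:postgres %s\n' "$datadir"
    printf 'zfs get -H -o value recordsize %s/pgdata\n' "$pool"
}

pg_dataset_plan zroot /var/db/postgres/data12
```

After reviewing the output you can pipe it to sh(1) as root. This way only the database files use the small record size and the rest of the root dataset keeps its default.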

Now, let’s start the PostgreSQL database.

# /usr/local/etc/rc.d/postgresql start
2019-12-31 11:47:04.918 CET [36089] LOG:  starting PostgreSQL 12.1 on amd64-portbld-freebsd12.0, compiled by FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1), 64-bit
2019-12-31 11:47:04.918 CET [36089] LOG:  listening on IPv6 address "::1", port 5432
2019-12-31 11:47:04.918 CET [36089] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2019-12-31 11:47:04.919 CET [36089] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-12-31 11:47:04.928 CET [36089] LOG:  ending log output to stderr
2019-12-31 11:47:04.928 CET [36089] HINT:  Future log output will go to log destination "syslog".

We will now create PostgreSQL database for our Nextcloud instance.

# psql -hlocalhost -Upostgres
psql (12.1)
Type "help" for help.

postgres=# CREATE USER nextcloud WITH PASSWORD 'NEXTCLOUD_DB_PASSWORD';
CREATE ROLE
postgres=# CREATE DATABASE nextcloud TEMPLATE template0 ENCODING 'UNICODE';
CREATE DATABASE
postgres=# ALTER DATABASE nextcloud OWNER TO nextcloud;
ALTER DATABASE
postgres=# \q

Remember to use something more sophisticated in place of NEXTCLOUD_DB_PASSWORD.
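
One way to get such a password is to read it from /dev/urandom. This is just a sketch with a gen_password helper name of my own. It prints 48 hexadecimal characters, which need no quoting or escaping inside the SQL statement.

```sh
#!/bin/sh
# Generate a random password: 24 random bytes printed as 48 hex characters.
gen_password() {
    dd if=/dev/urandom bs=24 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

gen_password
```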

PostgreSQL Cleanup and Indexing Script

Let’s automate some PostgreSQL housekeeping.

# mkdir -p /var/db/postgres/bin
# chown postgres /var/db/postgres/bin
# vi /var/db/postgres/bin/vacuum.sh

#! /bin/sh

/usr/local/bin/vacuumdb -az 1> /dev/null 2> /dev/null
/usr/local/bin/reindexdb -a 1> /dev/null 2> /dev/null
/usr/local/bin/reindexdb -s 1> /dev/null 2> /dev/null
:wq

# cat /var/db/postgres/bin/vacuum.sh
#! /bin/sh

/usr/local/bin/vacuumdb -az 1> /dev/null 2> /dev/null
/usr/local/bin/reindexdb -a 1> /dev/null 2> /dev/null
/usr/local/bin/reindexdb -s 1> /dev/null 2> /dev/null

# chown postgres /var/db/postgres/bin/vacuum.sh
# chmod +x /var/db/postgres/bin/vacuum.sh

# su - postgres -c 'crontab -e'
0 0 * * * /var/db/postgres/bin/vacuum.sh
:wq
/tmp/crontab.JMg5BfT5HV: 2 lines, 42 characters.
crontab: installing new crontab

# su - postgres -c 'crontab -l'
0 0 * * * /var/db/postgres/bin/vacuum.sh

# su - postgres -c '/var/db/postgres/bin/vacuum.sh'

Nginx

Now it’s time for the Nginx web server.

# chown -R www:www /var/log/nginx

# ls -l /var/log/nginx
total 3
-rw-r-----  1 www  www   64 2019.12.31 00:00 access.log
-rw-r-----  1 www  www  133 2019.12.31 00:00 access.log.0.bz2
-rw-r-----  1 www  www   64 2019.12.31 00:00 error.log
-rw-r-----  1 www  www  133 2019.12.31 00:00 error.log.0.bz2

… and its main nginx.conf configuration file.

# cat /usr/local/etc/nginx/nginx.conf
user www;
worker_processes 4;
worker_rlimit_nofile 51200;
error_log /var/log/nginx/error.log;

events {
  worker_connections 1024;
}

http {
  include mime.types;
  default_type application/octet-stream;
  log_format main '$remote_addr - $remote_user [$time_local] "$request" ';
  access_log /var/log/nginx/access.log main;
  sendfile on;
  keepalive_timeout 65;

  upstream php-handler {
    server 127.0.0.1:9000;
  }

  server {
    # ENFORCE HTTPS
    listen 80;
    server_name nextcloud.domain.com;
    return 301 https://$server_name$request_uri;
  }

  server {
    listen 443 ssl http2;
    server_name nextcloud.domain.com;
    ssl_certificate /usr/local/etc/nginx/ssl/ssl-bundle.crt;
    ssl_certificate_key /usr/local/etc/nginx/ssl/server.key;

    # HEADERS SECURITY RELATED
    add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload;";
    add_header Referrer-Policy "no-referrer";

    # HEADERS
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header X-Robots-Tag none;
    add_header X-Download-Options noopen;
    add_header X-Permitted-Cross-Domain-Policies none;

    # PATH TO THE ROOT OF YOUR INSTALLATION
    root /usr/local/www/nextcloud/;

    location = /robots.txt {
      allow all;
      log_not_found off;
      access_log off;
    }

    location = /.well-known/carddav {
      return 301 $scheme://$host/remote.php/dav;
    }

    location = /.well-known/caldav {
      return 301 $scheme://$host/remote.php/dav;
    }

    # BUFFERS TIMEOUTS UPLOAD SIZES
    client_max_body_size 16400M;
    client_body_buffer_size 1048576k;
    send_timeout 3000;

    # ENABLE GZIP BUT DO NOT REMOVE ETag HEADERS
    gzip on;
    gzip_vary on;
    gzip_comp_level 4;
    gzip_min_length 256;
    gzip_proxied expired no-cache no-store private no_last_modified no_etag auth;
    gzip_types application/atom+xml application/javascript application/json application/ld+json application/manifest+json application/rss+xml application/vnd.geo+json application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json application/xhtml+xml application/xml font/opentype image/bmp image/svg+xml image/x-icon text/cache-manifest text/css text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/x-cross-domain-policy;

    location / {
      rewrite ^ /index.php$request_uri;
    }

    location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)/ {
      deny all;
    }

    location ~ ^/(?:\.|autotest|occ|issue|indie|db_|console) {
      deny all;
    }

    location ~ ^\/(?:index|remote|public|cron|core\/ajax\/update|status|ocs\/v[12]|updater\/.+|oc[ms]-provider\/.+)\.php(?:$|\/) {
      fastcgi_split_path_info ^(.+\.php)(/.*)$;
      include fastcgi_params;
      fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
      fastcgi_param PATH_INFO $fastcgi_path_info;
      fastcgi_param HTTPS on;
      fastcgi_param modHeadersAvailable true;
      fastcgi_param front_controller_active true;
      fastcgi_pass php-handler;
      fastcgi_intercept_errors on;
      fastcgi_request_buffering off;
      fastcgi_keep_conn off;
      fastcgi_buffers 16 256K;
      fastcgi_buffer_size 256k;
      fastcgi_busy_buffers_size 256k;
      fastcgi_temp_file_write_size 256k;
      fastcgi_send_timeout 3000s;
      fastcgi_read_timeout 3000s;
      fastcgi_connect_timeout 3000s;
    }

    location ~ ^\/(?:updater|oc[ms]-provider)(?:$|\/) {
      try_files $uri/ =404;
      index index.php;
    }

    # ADDING THE CACHE CONTROL HEADER FOR JS AND CSS FILES
    # MAKE SURE IT IS BELOW PHP BLOCK
    location ~ \.(?:css|js|woff2?|svg|gif)$ {
      try_files $uri /index.php$uri$is_args$args;
      add_header Cache-Control "public, max-age=15778463";
      # HEADERS SECURITY RELATED
      # IT IS INTENDED TO HAVE THOSE DUPLICATED TO ONES ABOVE
      add_header Strict-Transport-Security "max-age=15768000; includeSubDomains; preload;";
      # HEADERS
      add_header X-Content-Type-Options nosniff;
      add_header X-XSS-Protection "1; mode=block";
      add_header X-Robots-Tag none;
      add_header X-Download-Options noopen;
      add_header X-Permitted-Cross-Domain-Policies none;
      # OPTIONAL: DONT LOG ACCESS TO ASSETS
      access_log off;
    }

    location ~ \.(?:png|html|ttf|ico|jpg|jpeg)$ {
      try_files $uri /index.php$uri$is_args$args;
      # OPTIONAL: DONT LOG ACCESS TO OTHER ASSETS
      access_log off;
    }
  }
}

OpenSSL HTTPS Certificates

We will generate the certificates needed for the Nextcloud HTTPS service.

# mkdir -p /usr/local/etc/nginx/ssl

# cd /usr/local/etc/nginx/ssl

# openssl genrsa -des3 -out server.key 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
............+++++
....+++++
e is 65537 (0x010001)
Enter pass phrase for server.key: SERVER_KEY_PASSWORD
Verifying - Enter pass phrase for server.key: SERVER_KEY_PASSWORD

As usual, use something more sensible than the SERVER_KEY_PASSWORD string here 🙂

# openssl req -new -key server.key -out server.csr
Enter pass phrase for server.key:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:PL
State or Province Name (full name) [Some-State]:lodzkie
Locality Name (eg, city) []:Lodz
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Vermaden Enterprises Ltd.
Organizational Unit Name (eg, section) []:IT Department
Common Name (e.g. server FQDN or YOUR name) []:nextcloud.domain.com
Email Address []:vermaden@interia.pl

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:


# cp server.key server.key.orig

# openssl rsa -in server.key.orig -out server.key
Enter pass phrase for server.key.orig: SERVER_KEY_PASSWORD
writing RSA key

# ls -l /usr/local/etc/nginx/ssl
total 7
-rw-r--r--  1 root  wheel  1151 2019.12.31 12:39 server.csr
-rw-------  1 root  wheel  1679 2019.12.31 12:41 server.key
-rw-------  1 root  wheel  1751 2019.12.31 12:40 server.key.orig

# openssl x509 -req -days 7000 -in server.csr -signkey server.key -out server.crt
Signature ok
subject=C = PL, ST = lodzkie, L = Lodz, O = Vermaden Enterprises Ltd., OU = IT Department, CN = nextcloud.domain.com, emailAddress = vermaden@interia.pl
Getting Private key

# ln -s /usr/local/etc/nginx/ssl/server.crt /usr/local/etc/nginx/ssl/ssl-bundle.crt
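
Before pointing Nginx at these files it does not hurt to check that the certificate really belongs to the key. Below is a minimal sketch with a cert_matches_key helper name of my own. It compares the RSA modulus that openssl(1) reports for both files and assumes the key is no longer passphrase protected, as is the case for server.key above.

```sh
#!/bin/sh
# Return 0 when the certificate ($1) and the RSA private key ($2) share a modulus.
cert_matches_key() {
    cert_mod=$(openssl x509 -noout -modulus -in "$1") || return 2
    key_mod=$(openssl rsa -noout -modulus -in "$2") || return 2
    [ "$cert_mod" = "$key_mod" ]
}

# Run the check only when both files exist in the current directory.
if [ -f server.crt ] && [ -f server.key ]; then
    if cert_matches_key server.crt server.key; then
        echo "server.crt matches server.key"
    else
        echo "server.crt does NOT match server.key"
    fi
fi
```

Run it from the /usr/local/etc/nginx/ssl directory.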

PHP

Here is the PHP configuration I use, which allows uploads of files up to 16 GB to Nextcloud.

# grep '^[^;]' /usr/local/etc/php.ini
[PHP]
max_input_time=3600
engine = On
short_open_tag = On
precision = 14
output_buffering = OFF
zlib.output_compression = Off
implicit_flush = Off
unserialize_callback_func =
serialize_precision = 17
disable_functions =
disable_classes =
zend.enable_gc = On
expose_php = On
max_execution_time = 3600
max_input_time = 30000
memory_limit = 1024M
error_reporting = E_ALL & ~E_DEPRECATED & ~E_STRICT
display_errors = Off
display_startup_errors = Off
log_errors = On
log_errors_max_len = 1024
ignore_repeated_errors = Off
ignore_repeated_source = Off
report_memleaks = On
track_errors = Off
html_errors = On
error_log = /var/log/php.log
variables_order = "GPCS"
request_order = "GP"
register_argc_argv = Off
auto_globals_jit = On
post_max_size = 16400M
auto_prepend_file =
auto_append_file =
default_mimetype = "text/html"
default_charset = "UTF-8"
doc_root =
user_dir =
enable_dl = Off
file_uploads = On
upload_max_filesize = 16400M
max_file_uploads = 64
allow_url_fopen = On
allow_url_include = Off
default_socket_timeout = 300
[CLI Server]
cli_server.color = On
[Date]
date.timezone = Europe/Warsaw
[filter]
[iconv]
[intl]
[sqlite3]
[Pcre]
[Pdo]
[Pdo_mysql]
pdo_mysql.cache_size = 2000
pdo_mysql.default_socket=
[Phar]
[mail function]
SMTP = localhost
smtp_port = 25
mail.add_x_header = On
[SQL]
sql.safe_mode = Off
[ODBC]
odbc.allow_persistent = On
odbc.check_persistent = On
odbc.max_persistent = -1
odbc.max_links = -1
odbc.defaultlrl = 4096
odbc.defaultbinmode = 1
[Interbase]
ibase.allow_persistent = 1
ibase.max_persistent = -1
ibase.max_links = -1
ibase.timestampformat = "%Y-%m-%d %H:%M:%S"
ibase.dateformat = "%Y-%m-%d"
ibase.timeformat = "%H:%M:%S"
[MySQLi]
mysqli.max_persistent = -1
mysqli.allow_persistent = On
mysqli.max_links = -1
mysqli.cache_size = 2000
mysqli.default_port = 3306
mysqli.default_socket =
mysqli.default_host =
mysqli.default_user =
mysqli.default_pw =
mysqli.reconnect = Off
[mysqlnd]
mysqlnd.collect_statistics = On
mysqlnd.collect_memory_statistics = Off
[OCI8]
[PostgreSQL]
pgsql.allow_persistent = On
pgsql.auto_reset_persistent = Off
pgsql.max_persistent = -1
pgsql.max_links = -1
pgsql.ignore_notice = 0
pgsql.log_notice = 0
[bcmath]
bcmath.scale = 0
[browscap]
[Session]
session.save_handler = files
session.save_path = "/tmp"
session.use_strict_mode = 0
session.use_cookies = 1
session.use_only_cookies = 1
session.name = PHPSESSID
session.auto_start = 0
session.cookie_lifetime = 0
session.cookie_path = /
session.cookie_domain =
session.cookie_httponly =
session.serialize_handler = php
session.gc_probability = 1
session.gc_divisor = 1000
session.gc_maxlifetime = 1440
session.referer_check =
session.cache_limiter = nocache
session.cache_expire = 180
session.use_trans_sid = 0
session.hash_function = 0
session.hash_bits_per_character = 5
url_rewriter.tags = "a=href,area=href,frame=src,input=src,form=fakeentry"
[Assertion]
zend.assertions = -1
[COM]
[mbstring]
[gd]
[exif]
[Tidy]
tidy.clean_output = Off
[soap]
soap.wsdl_cache_enabled=1
soap.wsdl_cache_dir="/tmp"
soap.wsdl_cache_ttl=86400
soap.wsdl_cache_limit = 5
[sysvshm]
[ldap]
ldap.max_links = -1
[mcrypt]
[dba]
[opcache]
opcache.enable=1
opcache.enable_cli=1
opcache.interned_strings_buffer=8
opcache.max_accelerated_files=10000
opcache.memory_consumption=128
opcache.save_comments=1
opcache.revalidate_freq=1
[curl]
[openssl] 
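
The 16400M value appears in three places: upload_max_filesize and post_max_size here, and client_max_body_size in nginx.conf. Uploads are capped by the smallest of them. The sketch below (the to_bytes helper is my own) converts the shorthand sizes to bytes, shows how much headroom 16400M leaves over a full 16 GiB, and checks that the limits do not contradict each other.

```sh
#!/bin/sh
# Convert a shorthand size such as 16400M, 1048576k or 16G to bytes.
to_bytes() {
    num=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

upload=$(to_bytes 16400M)   # upload_max_filesize from php.ini
post=$(to_bytes 16400M)     # post_max_size from php.ini
nginx=$(to_bytes 16400M)    # client_max_body_size from nginx.conf
gib16=$(to_bytes 16G)

echo "headroom over 16 GiB: $(( (upload - gib16) / 1024 / 1024 )) MiB"
if [ "$upload" -le "$post" ] && [ "$post" -le "$nginx" ]; then
    echo "limits are consistent"
fi
```

So 16400M is 16 GiB plus 16 MiB of margin for the request overhead.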

PHP PostgreSQL Database Settings

The settings below are needed to make PHP work with the PostgreSQL database.

# cat /usr/local/etc/php/ext-20-pgsql.ini
extension=pgsql.so

# cat >> /usr/local/etc/php/ext-20-pgsql.ini << EOF

[PostgresSQL]
pgsql.allow_persistent = On
pgsql.auto_reset_persistent = Off
pgsql.max_persistent = -1
pgsql.max_links = -1
pgsql.ignore_notice = 0
pgsql.log_notice = 0
EOF

# cat /usr/local/etc/php/ext-20-pgsql.ini
extension=pgsql.so

[PostgresSQL]
pgsql.allow_persistent = On
pgsql.auto_reset_persistent = Off
pgsql.max_persistent = -1
pgsql.max_links = -1
pgsql.ignore_notice = 0
pgsql.log_notice = 0


… and the second one.

# cat /usr/local/etc/php/ext-30-pdo_pgsql.ini
extension=pdo_pgsql.so

# cat >> /usr/local/etc/php/ext-30-pdo_pgsql.ini << EOF

[PostgresSQL]
pgsql.allow_persistent = On
pgsql.auto_reset_persistent = Off
pgsql.max_persistent = -1
pgsql.max_links = -1
pgsql.ignore_notice = 0
pgsql.log_notice = 0
EOF

# cat /usr/local/etc/php/ext-30-pdo_pgsql.ini
extension=pdo_pgsql.so

[PostgresSQL]
pgsql.allow_persistent = On
pgsql.auto_reset_persistent = Off
pgsql.max_persistent = -1
pgsql.max_links = -1
pgsql.ignore_notice = 0
pgsql.log_notice = 0

PHP FPM

Now the PHP FPM daemon.

# grep '^[^;]' /usr/local/etc/php-fpm.conf
[global]
pid = run/php-fpm.pid
error_log = log/php-fpm.log
syslog.facility = daemon
include=/usr/local/etc/php-fpm.d/*.conf

# touch /var/log/php-fpm.log

# chown www:www /var/log/php-fpm.log

# grep '^[^;]' /usr/local/etc/php-fpm.d/www.conf
[www]
user = www
group = www
listen = 127.0.0.1:9000
listen.backlog = -1
listen.owner = www
listen.group = www
listen.mode = 0660
listen.allowed_clients = 127.0.0.1
pm = static
pm.max_children = 8
pm.start_servers = 4
pm.min_spare_servers = 4
pm.max_spare_servers = 32
pm.process_idle_timeout = 1000s;
pm.max_requests = 500
request_terminate_timeout = 0
rlimit_files = 51200
env[HOSTNAME] = $HOSTNAME
env[PATH] = /usr/local/bin:/usr/bin:/bin
env[TMP] = /tmp
env[TMPDIR] = /tmp
env[TEMP] = /tmp
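
With pm = static only pm.max_children matters, as the start_servers and spare_servers values are used only by the dynamic process manager. That makes a rough memory estimate easy: every worker may grow up to the PHP memory_limit. Below is a small sketch of that arithmetic with a helper name of my own. Treat the result as an upper bound for script memory only, not an exact figure.

```sh
#!/bin/sh
# Rough upper bound in MiB: workers multiplied by PHP memory_limit in MiB.
fpm_worst_case_mib() {
    echo $(( $1 * $2 ))
}

children=8    # pm.max_children from www.conf
limit=1024    # memory_limit from php.ini, in MiB
echo "worst case: $(fpm_worst_case_mib "$children" "$limit") MiB"
```

That is 8 GiB if all eight workers hit the limit at the same time, so size the machine or the jail accordingly.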

Start Backend Services

We will now start all ‘backend’ services needed for Nextcloud.

# service postgresql start
2020-01-02 13:18:05.970 CET [52233] LOG:  starting PostgreSQL 12.1 on amd64-portbld-freebsd12.0, compiled by FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1), 64-bit
2020-01-02 13:18:05.974 CET [52233] LOG:  listening on IPv6 address "::1", port 5432
2020-01-02 13:18:05.974 CET [52233] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2020-01-02 13:18:05.975 CET [52233] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2020-01-02 13:18:06.024 CET [52233] LOG:  ending log output to stderr
2020-01-02 13:18:06.024 CET [52233] HINT:  Future log output will go to log destination "syslog".

# service postgresql status
pg_ctl: server is running (PID: 36089)
/usr/local/bin/postgres "-D" "/var/db/postgres/data12"

# service php-fpm start
Performing sanity check on php-fpm configuration:
[02-Jan-2020 13:16:50] NOTICE: configuration file /usr/local/etc/php-fpm.conf test is successful

Starting php_fpm.

# service php-fpm status
php_fpm is running as pid 52193.

# service memcached start
Starting memcached.

# service memcached status
memcached is running as pid 52273.

# service nginx start
Performing sanity check on nginx configuration:
nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok
nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful
Starting nginx.
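
Keep in mind that service(8) starts a daemon only when it is enabled in /etc/rc.conf (otherwise you need the onestart command), and only the postgresql entries were shown earlier. Below is a minimal sketch with an rc_missing helper name of my own. It prints the variables that are not set to YES in the given file.

```sh
#!/bin/sh
# Print every listed rc.conf variable that is not set to YES in file $1.
rc_missing() {
    conf=$1
    shift
    for var in "$@"; do
        grep -Eq "^${var}=\"?[Yy][Ee][Ss]\"?[[:space:]]*\$" "$conf" || echo "$var"
    done
}

if [ -f /etc/rc.conf ]; then
    rc_missing /etc/rc.conf postgresql_enable php_fpm_enable memcached_enable nginx_enable
fi
```

No output means all four services will also start at boot.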

Nextcloud Configuration

I created a link named /data to the Nextcloud data directory located at /usr/local/www/nextcloud/data.

# ln -s /usr/local/www/nextcloud/data /data

Then we use Firefox or another web browser to finish the Nextcloud configuration.

Type https://1.2.3.4 in the browser where 1.2.3.4 is your Nextcloud instance IP address.

I am sorry but the following image is in the Polish language. I forgot to change it to English … but I assume you will know what to put in these fields from the context.

nextcloud-setup.png

After we finish the setup we go straight to the Nextcloud Overview page at https://1.2.3.4/settings/admin/serverinfo to see what else needs to be taken care of.

nextcloud-setup-overview.png

Two issues need to be addressed. One is about the Nginx configuration, the other is about PostgreSQL. Let’s fix them.

We will add the needed header to the Nginx configuration file.

# diff -u /usr/local/etc/nginx/nginx.conf.OLD /usr/local/etc/nginx/nginx.conf
--- /usr/local/etc/nginx/nginx.conf.OLD  2020-01-02 14:21:58.359398000 +0100
+++ /usr/local/etc/nginx/nginx.conf      2020-01-02 14:21:42.823426000 +0100
@@ -46,6 +46,7 @@
     add_header X-Robots-Tag none;
     add_header X-Download-Options noopen;
     add_header X-Permitted-Cross-Domain-Policies none;
+    add_header X-Frame-Options "SAMEORIGIN";

     # PATH TO THE ROOT OF YOUR INSTALLATION
     root /usr/local/www/nextcloud/;

# service nginx reload
Performing sanity check on nginx configuration:
nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok
nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful
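
To check from the command line which security headers the server really sends, you can feed the response headers to a small filter. This is a sketch with a missing_headers helper name of my own. It compares names case-insensitively because HTTP/2 responses use lowercase header names.

```sh
#!/bin/sh
# Read HTTP response headers on stdin; print each expected header that is absent.
missing_headers() {
    headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
    for name in "$@"; do
        lname=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        printf '%s\n' "$headers" | grep -q "^${lname}:" || echo "$name"
    done
}

# Demo on a canned response; against a live server you would use for example:
#   curl -skI https://nextcloud.domain.com/ | missing_headers X-Frame-Options
printf 'HTTP/2 200\r\nx-content-type-options: nosniff\r\nx-robots-tag: none\r\n\r\n' |
    missing_headers X-Content-Type-Options X-Robots-Tag X-Frame-Options
```

The demo prints X-Frame-Options, which is exactly the header the Nextcloud overview page complained about before the change.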

… and run the PostgreSQL column conversion.

# sudo -u www /usr/local/bin/php /usr/local/www/nextcloud/occ db:convert-filecache-bigint
Following columns will be updated:

* mounts.storage_id
* mounts.root_id
* mounts.mount_id

This can take up to hours, depending on the number of files in your instance!
Continue with the conversion (y/n)? [n] y

Voila! Both of our problems are gone now.

nextcloud-setup-overview-fixed.png

Trusted Domains

When you access Nextcloud using a different domain you will get a warning about that.

To add a new Trusted Domain to the Nextcloud config do the following.

Here is how it looks before changes.

# grep -A 3 trusted /usr/local/www/nextcloud/config/config.php
  'trusted_domains' =>
  array (
    0 => '1.2.3.4',
  ),

We will now add nextcloud.domain.com domain.

# vi /usr/local/www/nextcloud/config/config.php

# grep -A 4 trusted /usr/local/www/nextcloud/config/config.php
  'trusted_domains' =>
  array (
    0 => '1.2.3.4',
    1 => 'nextcloud.domain.com',
  ),

You can of course add more with successive numbers.

# grep -A 5 trusted /usr/local/www/nextcloud/config/config.php
  'trusted_domains' =>
  array (
    0 => '1.2.3.4',
    1 => 'nextcloud.domain.com',
    2 => 'cloud.domain.com',
  ),
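
The same entries can also be set without editing config.php by hand, using the occ config:system:set command. Below is a minimal sketch that only prints the commands. The trusted_domain_cmds helper is my own, and the occ path and the www user are the ones used earlier in this guide.

```sh
#!/bin/sh
# Print occ commands that add trusted domains starting at the given index.
OCC="sudo -u www /usr/local/bin/php /usr/local/www/nextcloud/occ"

trusted_domain_cmds() {
    index=$1
    shift
    for domain in "$@"; do
        printf '%s config:system:set trusted_domains %s --value=%s\n' \
            "$OCC" "$index" "$domain"
        index=$((index + 1))
    done
}

trusted_domain_cmds 1 nextcloud.domain.com cloud.domain.com
```

Review the two printed commands and then run them as root, or pipe the output to sh(1).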

This is the end of this guide. Feel free to share your thoughts 🙂

Log Rotation with Newsyslog

Newsyslog is part of FreeBSD’s base system. We will add the Nextcloud and backend daemon log files to the Newsyslog configuration so they will be rotated.

 
# cat >> /etc/newsyslog.conf << EOF
/data/nextcloud.log                          www:www     640  7     *    @T00  JC
/usr/local/www/nextcloud/data/nextcloud.log  www:www     640  7     *    @T00  JC
/var/log/php-fpm.log                         www:www     640  7     *    @T00  JC
/var/log/nginx/error.log                     www:www     640  7     *    @T00  JC
/var/log/nginx/access.log                    www:www     640  7     *    @T00  JC
EOF

Now you will not run out of free space as the logs grow over time.
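
If the columns in these entries look cryptic: they are the log file, owner:group, mode, number of archives to keep, size trigger (* means none), rotation time (@T00 means every day at midnight) and flags (J compresses with bzip2, C creates the log file if it is missing). The sketch below just labels the fields of one entry. The explain_newsyslog helper name is my own.

```sh
#!/bin/sh
# Label the columns of newsyslog.conf entries read from stdin.
explain_newsyslog() {
    awk '{ printf "file=%s owner=%s mode=%s keep=%s size=%s when=%s flags=%s\n",
           $1, $2, $3, $4, $5, $6, $7 }'
}

echo '/var/log/nginx/error.log  www:www  640  7  *  @T00  JC' | explain_newsyslog
```

So each log above is rotated daily at midnight and seven compressed archives are kept.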

EOF

FreeBSD Desktop – Part 19 – Configuration – Plank – Skippy-XD

Long time no see :). In this article of the FreeBSD Desktop series we will add Plank and Skippy-XD to the existing setup.

I will share with you my Plank configuration along with a theme that fits the rest of the setup. Plank is an open implementation of the ideas that were brought to life by the Mac OS X (macOS) Dock. We will also add the Skippy-XD tool that implements the Mac OS X (macOS) Expose ideas.

One may ask why use Plank while we already have Tint2 for similar purposes? While both support autohide, I prefer to see Tint2 all the time to get a basic/fast idea of what is launched on which desktop, and to have Plank hidden, as it does not hurt and sometimes helps.

Here are both the Mac OS X (macOS) Dock and Expose in action.

macosx-dock-expose.jpg

You may want to check other articles in the FreeBSD Desktop series on the FreeBSD Desktop – Global Page where you will find links to all episodes of the series along with table of contents for each episode’s contents.

To install both Plank and Skippy-XD on FreeBSD just use the default packages as shown below.

# pkg install skippy-xd plank

Plank

Here is how the Plank dock blends with the rest of the setup.

shot-res-small

The Plank dock comes with a graphical preferences window, but you need to launch it from the command line with the plank --preferences command.

plank-prefs.jpg

Here is the Plank theme I use, which is kept in the ~/.local/share/plank/themes/vermaden/dock.theme file.


% grep '^[^#]' ~/.local/share/plank/themes/vermaden/dock.theme

[PlankTheme]
TopRoundness=0
BottomRoundness=0
LineWidth=0
OuterStrokeColor=0;;0;;0;;255
FillStartColor=40;;40;;40;;255
FillEndColor=40;;40;;40;;255
InnerStrokeColor=40;;40;;40;;255

[PlankDockTheme]
HorizPadding=0
TopPadding=1
BottomPadding=2
ItemPadding=2.5
IndicatorSize=10
IconShadowSize=0
UrgentBounceHeight=0
LaunchBounceHeight=0
FadeOpacity=0
ClickTime=300
UrgentBounceTime=600
LaunchBounceTime=600
ActiveTime=300
SlideTime=300
FadeTime=250
HideTime=150
GlowSize=24
GlowTime=10000
GlowPulseTime=2000
UrgentHueShift=150
ItemMoveTime=150
CascadeHide=true

[PlankDrawingDockTheme]
HorizPadding=0
ItemPadding=2.5
CascadeHide=true

And here are my Plank dock settings, which are kept in the ~/.config/plank/dock1/settings file.


% grep '^[^#]' ~/.config/plank/dock1/settings

[PlankDockPreferences]
CurrentWorkspaceOnly=false
IconSize=32
HideMode=0
UnhideDelay=0
HideDelay=0
Monitor=DP-1
DockItems=caja.dockitem;;leafpad.dockitem;;firefox.dockitem;;geany.dockitem;;thunderbird.dockitem;;galculator.dockitem;;deadbeef.dockitem;;transmission-gtk.dockitem;;pidgin.dockitem
Position=3
Offset=20
Theme=vermaden
Alignment=3
ItemsAlignment=3
LockItems=false
PressureReveal=false
PinnedOnly=false
AutoPinning=true
ShowDockItem=true
ZoomEnabled=false
ZoomPercent=150


Skippy-XD

You may wonder why there is XD in the Skippy name. It is because Skippy started as a pure software solution, which unfortunately was quite slow, especially in the times when Skippy was introduced about a decade ago. Then the Skippy developers rewrote it to use the then new XDAMAGE extension for X11. After this change Skippy started to work almost instantly, which was marked in its name, and it remains Skippy-XD to this date.

This is what Skippy-XD looks like.

skippy-xd.jpg

Skippy-XD does not need/support themes. It just has a configuration file located at ~/.config/skippy-xd/skippy-xd.rc.


% grep '^[^#]' ~/.config/skippy-xd/skippy-xd.rc

[general]
distance = 50
useNetWMFullscreen = true
ignoreSkipTaskbar = true
updateFreq = 30.0
lazyTrans = true
pipePath = /tmp/skippy-xd-fifo
movePointerOnStart = true
movePointerOnSelect = true
movePointerOnRaise = true
switchDesktopOnActivate = true
useNameWindowPixmap = false
forceNameWindowPixmap = false
includeFrame = true
allowUpscale = true
showAllDesktops = true
showUnmapped = true
preferredIconSize = 32
clientDisplayModes = thumbnail icon filled none
iconFillSpec = orig mid mid #666666
fillSpec = orig mid mid #FFFFFF
background =

[xinerama]
showAll = true

[normal]
tint = black
tintOpacity = 0
opacity = 200

[highlight]
tint = #202020
tintOpacity = 64
opacity = 255

[tooltip]
show = true
followsMouse = true
offsetX = 20
offsetY = 20
align = left
border = #111111
background = #333333
opacity = 128
text = #eedddd
textShadow = none
font = ubuntu-10:weight=normal

[bindings]
miwMouse1 = focus
miwMouse2 = close-ewmh
miwMouse3 = iconify

One of the nice features of Skippy-XD is that you can configure it per desktop or globally per all currently existing virtual desktops. I also prefer to display window thumbnails only from the windows that exist on the current desktop. You can of course change that behavior with the Skippy-XD config file.

EOF

List Block Devices on FreeBSD lsblk(8) Style

When I have to work on Linux systems I usually miss many nice FreeBSD tools such as these, to name a few:

  • sockstat
  • gstat
  • top -b -o res
  • top -m io -o total
  • usbconfig
  • rcorder
  • beadm/bectl
  • idprio/rtprio

… but sometimes, although it rarely happens, Linux has some very useful tool that is not available on FreeBSD. An example of such a tool is lsblk(8), which does one thing and does it quite well: it lists block devices and their contents. It has some problems, like showing two partitions for a disk that is entirely used by a ZFS pool instead of reporting that ZFS is there, but we all know how unloved the CDDL licensed ZFS is in some circles of that GPL world.

Example lsblk(8) output from a Linux system:

$ lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT
sr0                           11:0    1  1024M  0 rom
sda                            8:0    0 931.5G  0 disk
|-sda1                         8:1    0   500M  0 part   /boot
`-sda2                         8:2    0   931G  0 part
  |-vg_local-lv_root (dm-0)  253:0    0    50G  0 lvm    /
  |-vg_local-lv_swap (dm-1)  253:1    0  17.7G  0 lvm    [SWAP]
  `-vg_local-lv_home (dm-2)  253:2    0   1.8T  0 lvm    /home
sdc                            8:32   0 232.9G  0 disk
`-sdc1                         8:33   0 232.9G  0 part
  `-md1                        9:1    0 232.9G  0 raid10 /data
sdd                            8:48   0 232.9G  0 disk
`-sdd1                         8:49   0 232.9G  0 part
  `-md1                        9:1    0 232.9G  0 raid10 /data

What does FreeBSD offer in this department? The camcontrol(8) and geom(8) commands are available. You can also use the gpart(8) command to list partitions. Below you will find the output of these commands from my single disk laptop. Please note that because of WordPress limitations I need to change all > < characters to ] [ ones in the command outputs.

# camcontrol devlist
[Samsung SSD 860 EVO mSATA 1TB RVT41B6Q]  at scbus1 target 0 lun 0 (ada0,pass0)

% geom disk list
Geom name: ada0
Providers:
1. Name: ada0
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e2
   descr: Samsung SSD 860 EVO mSATA 1TB
   lunid: 5002538e402b4ddd
   ident: S41PNB0K303632D
   rotationrate: 0
   fwsectors: 63
   fwheads: 1

# gpart show
=>        40  1953525088  ada0  GPT  (932G)
          40      409600     1  efi  (200M)
      409640        1024     2  freebsd-boot  (512K)
      410664         984        - free -  (492K)
      411648  1953112064     3  freebsd-zfs  (931G)
  1953523712        1416        - free -  (708K)

They provide the needed information in an acceptable manner, but only on systems with a small number of disks. What if you would like to display a summary of the contents of all system drives? This is where lsblk.sh comes in handy. While lsblk(8) has many interesting features like the --perms/--scsi/--inverse modes, I focused on providing only the basic feature: listing the system block devices and their contents. As I have long and pleasing experience with writing shell scripts such as sysutils/beadm or sysutils/automount, I thought that writing lsblk.sh may be a good idea. I actually ‘open-sourced’, or should I say shared, that project/idea in 2016 in the thread lsblk(8) Command for FreeBSD on the FreeBSD Forums, but lack of time really slowed the development pace of that ‘side project’. I finally got back to it to finish it.

The lsblk.sh is generally a small and simple shell script which takes less than 400 SLOC.

lsblk

Here is an example output of the lsblk.sh command from my single disk laptop.

% lsblk.sh
DEVICE         MAJ:MIN  SIZE TYPE                      LABEL MOUNT
ada0             0:5b  932G GPT                           - -
  ada0p1         0:64  200M efi                    efiboot0 [UNMOUNTED]
  ada0p2         0:65  512K freebsd-boot           gptboot0 -
  [FREE]         -:-   492K -                             - -
  ada0p3         0:66  931G freebsd-zfs                zfs0 [ZFS]
  [FREE]         -:-   708K -                             - -


Same output in graphical window.

lolcat

Below you will find an example lsblk.sh output from server with two system SSD drives (da0/da1) and two HDD data drives (da2/da3).

# lsblk.sh
DEVICE         MAJ:MIN SIZE TYPE                      LABEL MOUNT
da0              0:be  224G GPT                           - -
  da0p1          0:15a 200M efi                    efiboot0 [UNMOUNTED]
  da0p2          0:15b 512K freebsd-boot           gptboot0 -
  [FREE]         -:-   492K -                             - -
  da0p3          0:15c 2.0G freebsd-swap              swap0 [UNMOUNTED]
  da0p4          0:15d 221G freebsd-zfs                zfs0 [ZFS]
  [FREE]         -:-   580K -                             - -
da1              0:bf  224G GPT                           - -
  da1p1          0:16a 200M efi                    efiboot1 [UNMOUNTED]
  da1p2          0:16b 512K freebsd-boot           gptboot1 -
  [FREE]         -:-   492K -                             - -
  da1p3          0:16c 2.0G freebsd-swap              swap1 [UNMOUNTED]
  da1p4          0:16d 221G freebsd-zfs                zfs1 [ZFS]
  [FREE]         -:-   580K -                             - -
da2              0:c0   11T GPT                           - -
  da2p1          0:16e  11T freebsd-zfs                   - [ZFS]
  [FREE]         -:-   1.0G -                             - -
da3              0:c1   11T GPT                           - -
  da3p1          0:16f  11T freebsd-zfs                   - [ZFS]
  [FREE]         -:-   1.0G -                             - -

Below you will find other examples from other systems I have tested lsblk.sh on.

lsblk.examples

While lsblk.sh is not the fastest script on Earth (because of all the needed parsing) it does its job quite well. If you would like to install it in your system just type the command below:

# fetch -o /usr/local/bin/lsblk https://raw.githubusercontent.com/vermaden/scripts/master/lsblk.sh
# chmod +x /usr/local/bin/lsblk
# hash -r || rehash
# lsblk

If I get the time, which other original Linux lsblk(8) subcommand/option/argument is worth adding to the lsblk.sh script? 🙂

Regards.

UPDATE 1 – Added USAGE/HELP Information

Just added some usage information that can be displayed by specifying one of these as argument:

  • h
  • -h
  • --h
  • help
  • -help
  • --help

IMHO writing a man page for such a simple utility is needless. I think I will create a dedicated man page when the lsblk.sh tool grows in size and options to be comparable with the Linux lsblk(8) equivalent. Here is how it looks.

# lsblk.sh --help
usage:

  BASIC USAGE INFORMATION
  =======================
  # lsblk.sh [DISK]

example(s):

  LIST ALL BLOCK DEVICES IN SYSTEM
  --------------------------------
  # lsblk.sh
  DEVICE         MAJ:MIN SIZE TYPE                      LABEL MOUNT
  ada0             0:5b  932G GPT                           - -
    ada0p1         0:64  200M efi                    efiboot0 [UNMOUNTED]
    ada0p2         0:65  512K freebsd-boot           gptboot0 -
    [FREE]         -:-   492K -                             - -
    ada0p3         0:66  931G freebsd-zfs                zfs0 [ZFS]

  LIST ONLY da1 BLOCK DEVICE
  --------------------------
  # lsblk.sh da1
  DEVICE         MAJ:MIN SIZE TYPE                      LABEL MOUNT
  da1              0:80  2.0G MBR                           - -
    da1s1          0:80  2.0G freebsd                       - -
      da1s1a       0:81  1.0G freebsd-ufs                root /
      da1s1b       0:82  1.0G freebsd-swap               swap SWAP

hint(s):

  DISPLAY ALL DISKS IN SYSTEM
  ---------------------------
  # sysctl kern.disks
  kern.disks: ada0 da0 da1

Regards.

UPDATE 2 – Code Reorganization and 75% Rewrite

… at least this is what git(1) tries to tell me after commit message.

% git commit (...)
[master 12fd4aa] Rework entire flow. Split code into functions. Add many useful comments. In other words its 2.0 version.
 1 file changed, 494 insertions(+), 505 deletions(-)
 rewrite lsblk.sh (75%)

After several productive hours new incarnation of lsblk.sh is now available.

It has a similar line count but it is now about 21% smaller in bytes (15472 versus 19721) … while doing more and with better accuracy. A great example of why “less is more.”

% wc scripts/lsblk.sh.OLD
     491    2201   19721 scripts/lsblk.sh.OLD

% wc scripts/lsblk.sh
     494    1871   15472 scripts/lsblk.sh

Things that do not have a simple solution are described below.

One of them is the ‘double’ label for FAT filesystems. We have both the /dev/gpt/efiboot0 GPT label and the FAT label named EFISYS. We have to choose something here. As not all FAT filesystems have a label, I have chosen the GPT label.

% glabel status | grep ada0p1
  gpt/efiboot0     N/A  ada0p1
msdosfs/EFISYS     N/A  ada0p1
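The choice described above (prefer the GPT label, fall back to whatever else is there) can be sketched with a single awk(1) query. This is only an illustration and not the code from lsblk.sh: the pick_label helper and the GLABEL_SAMPLE variable are made-up names, and the sample text is simply the glabel(8) output shown above.

```sh
# Sample `glabel status` lines copied from the output above.
GLABEL_SAMPLE='  gpt/efiboot0     N/A  ada0p1
msdosfs/EFISYS     N/A  ada0p1'

# pick_label is a hypothetical helper: for the given provider prefer
# a gpt/ label and fall back to the first label of any other class.
pick_label() {
  echo "${GLABEL_SAMPLE}" | awk -v dev="${1}" '
    $3 == dev && $1 ~ /^gpt\// { gpt = $1 }
    $3 == dev && first == ""    { first = $1 }
    END {
      if (gpt != "")        print gpt
      else if (first != "") print first
      else                  print "-"
    }'
}

pick_label ada0p1
```

For a provider without any label the helper prints a single dash, which matches how the LABEL column looks in the listings.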

I was also not able to cover FUSE mounts. When you mount – for example – the /dev/da0 device as NTFS (with ntfs-3g) or exFAT (with mount.exfat) there is no visible difference in mount(8) output.

% mount -t fusefs
/dev/fuse on /mnt/ntfs (fusefs)
/dev/fuse on /mnt/exfat (fusefs)

When I mount such a filesystem with my daemon (like sysutils/automount) I keep track of which device has been mounted to which directory in the /var/run/automount.state file. Then when I get the detach event for the /dev/da0 device I know what to u(n)mount … but when I only have the /dev/fuse device it is just not possible.

… or maybe YOU know any way of extracting information from /dev/fuse (or generally from FUSE) what device is mounted where?

Now a little presentation after the update.

Here are various non-ZFS filesystems mounted.

% mount -t nozfs
devfs on /dev (devfs, local, multilabel)
linprocfs on /compat/linux/proc (linprocfs, local)
tmpfs on /compat/linux/dev/shm (tmpfs, local)
/dev/label/ASD on /mnt/tmp (msdosfs, local)
/dev/fuse on /mnt/ntfs (fusefs)
/dev/md0s1f on /mnt/ufs.other (ufs, local)
/dev/gpt/OTHER on /mnt/fat.other (msdosfs, local)
/dev/md0s1a on /mnt/ufs (ufs, local)

… and here is how lsblk.sh displays them now.

% lsblk.sh
DEVICE         MAJ:MIN SIZE TYPE                      LABEL MOUNT
ada0             0:56  932G GPT                           - -
  ada0p1         0:64  200M efi                gpt/efiboot0 -
  ada0p2         0:65  512K freebsd-boot       gpt/gptboot0 -
  [FREE]         -:-   492K -                             - -
  ada0p3         0:66  931G freebsd-zfs                   - [ZFS]
  [FREE]         -:-   708K -                             - -
md0              0:28f 1.0G MBR                           - -
  md0s1          0:294 512M freebsd                       - -
    md0s1a       0:29a 100M freebsd-ufs                root /mnt/ufs
    md0s1b       0:29b  32M freebsd-swap         label/swap SWAP
    md0s1e       0:29c  64M freebsd-ufs                   - -
    md0s1f       0:29d 316M freebsd-ufs                   - /mnt/ufs.other
  md0s2          0:296 256M ntfs                          - -
  md0s3          0:297 256M fat32               msdosfs/ONE -
md1              0:2a4 1.0G msdosfs                   LARGE 
md2              0:298 2.0G GPT                           - -
  md2p1          0:29f 2.0G ms-basic-data         gpt/OTHER /mnt/fat.other

I used some file-based memory devices for this. By default lsblk.sh now also displays the contents of memory disks.

% mdconfig.sh -l
md0     vnode    1024M  /home/vermaden/FILE     
md2     vnode    2048M  /home/vermaden/FILE.GPT 
md1     vnode    1024M  /home/vermaden/FILER    

Here is how it looks in the xterm(1) terminal.

lsblk.2.0

Regards.

UPDATE 3 – Added geli(8) Support

I thought that adding geli(8) support may be useful. The latest lsblk.sh now avoids code duplication for MOUNT and LABEL detection (moved into a single unified function). I also added more comments for code readability and some minor fixes … and it is smaller again 🙂

% wc lsblk.sh.1.0
     491    2201   19721 lsblk.sh.1.0

% wc lsblk.sh.2.0
     493    1861   15415 lsblk.sh.2.0

% wc lsblk.sh
     488    1820   15332 lsblk.sh

About 40% of the code was changed this time according to git commit (191 insertions and 196 deletions).

# git commit (...)
[master ec9985a] Add geli(8) support. Avoid code duplication and move MOUNT/LABEL detection into function. More comments. Minor fixes.
 1 file changed, 191 insertions(+), 196 deletions(-)

I also forgot to mention that lsblk.sh, thanks to smart optimizations (like not doing things twice and aggregating grep(1) | awk(1) pipes into single awk(1) queries), now runs 3 times faster than the initial version 🙂
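To show what aggregating a grep(1) | awk(1) pipe into a single awk(1) query means in practice, here is a small sketch. It is not taken from lsblk.sh; the MOUNT_SAMPLE variable is a made-up name holding a few mount(8) lines from the listing earlier in this article.

```sh
# A few mount(8) lines copied from the listing earlier in the article.
MOUNT_SAMPLE='/dev/md0s1f on /mnt/ufs.other (ufs, local)
/dev/gpt/OTHER on /mnt/fat.other (msdosfs, local)
/dev/md0s1a on /mnt/ufs (ufs, local)'

# Two processes: grep(1) selects the line, awk(1) prints the third column.
TWO=$( echo "${MOUNT_SAMPLE}" | grep '^/dev/md0s1a ' | awk '{print $3}' )

# One process: awk(1) does both the matching and the printing.
ONE=$( echo "${MOUNT_SAMPLE}" | awk '$1 == "/dev/md0s1a" {print $3}' )

echo "${TWO} ${ONE}"
```

Both variants give the same mount point. The second one starts one process less per lookup, which adds up when the lookup is repeated for every partition.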

New output with geli(8) support below.

lsblk.2.1.geli.png

Regards.

UPDATE 4 – Added fuse(8) Support

As I wrote in the UPDATE 2 keeping track of what is mounted and where under fuse(8) is very hard as all mounted devices magically become /dev/fuse after mount is done.

After a little research I found that this information (what really is mounted where by using the fuse(8) interface under FreeBSD) is available after mounting the procfs filesystem under /proc. You just need to cat the cmdline entry for all PIDs of ntfs-3g. It is not perfect, but at least the information is available.

# mount -t procfs proc /proc

# ps ax | grep ntfs-3g
45995  -  Is      0:00.00 ntfs-3g /dev/md1s2 /mnt/ntfs
59607  -  Is      0:00.00 ntfs-3g /dev/md3 /mnt/ntfs.another
83323  -  Is      0:00.00 ntfs-3g /dev/md3 /mnt/ntfs.another

# pgrep ntfs-3g
59607
83323
45995

% pgrep ntfs-3g | while read I; do cat /proc/$I/cmdline; echo; done
ntfs-3g/dev/md3/mnt/ntfs.another
ntfs-3g/dev/md3/mnt/ntfs.another
ntfs-3g/dev/md1s2/mnt/ntfs
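The arguments in the output above look glued together, presumably because the cmdline entry separates them with NUL bytes that the terminal does not show. A minimal sketch of splitting them back, under that assumption: fake_cmdline is a made-up stand-in for cat /proc/PID/cmdline so the example does not need a mounted /proc.

```sh
# fake_cmdline is a hypothetical stand-in for `cat /proc/PID/cmdline`;
# it assumes the arguments are separated (and terminated) by NUL bytes.
fake_cmdline() {
  printf 'ntfs-3g\0/dev/md1s2\0/mnt/ntfs\0'
}

# Turn every NUL byte into a newline: one argument per line.
fake_cmdline | tr '\0' '\n'

# The device is then the second argument and the mount point the third.
FUSE_DEV=$( fake_cmdline | tr '\0' '\n' | sed -n 2p )
FUSE_DIR=$( fake_cmdline | tr '\0' '\n' | sed -n 3p )
echo "${FUSE_DEV} is mounted on ${FUSE_DIR}"
```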

This was the code prototype that worked for fuse(8) mountpoints detection.

    # USE procfs ONLY WHEN IT IS MOUNTED
    if [ -e /proc/0/status ]
    then
      # COLLECT COMMAND LINES OF ALL ntfs-3g PROCESSES
      FUSE_MOUNTS=$(
        while read PID
        do
          cat /proc/${PID}/cmdline
          echo
        done << ________EOF
          $( pgrep ntfs-3g )
________EOF
      )
      # REMOVE DUPLICATES AND STRIP THE PROCESS NAME
      FUSE_MOUNTS=$( echo "${FUSE_MOUNTS}" | sort -u )
      FUSE_MOUNTS=$( echo "${FUSE_MOUNTS}" | sed 's|ntfs-3g||g' )
      # WHAT IS LEFT AFTER THE DEVICE NAME IS THE MOUNT POINT
      FUSE_CHECKS=$( echo "${FUSE_MOUNTS}" | grep /dev/${TARGET}/ )
      if [ "${FUSE_CHECKS}" != "" ]
      then
        MOUNT=$( echo "${FUSE_CHECKS}" | sed "s|/dev/${TARGET}||g" )
      fi
    fi

… and I have just realized that I found a new (better) way of getting that information without mounting the /proc filesystem. All you need to do is to display the ntfs-3g processes with their command line arguments, for example like this:

% ps -p $( pgrep ntfs-3g | tr '\n' ',' | sed '$s/.$//' ) -o command | sed 1d
ntfs-3g /dev/md1s2 /mnt/ntfs
ntfs-3g /dev/md3 /mnt/ntfs.another
ntfs-3g /dev/md3 /mnt/ntfs.another

Then I realized that this covers only NTFS (the ntfs-3g(8) process), so I also added exFAT support by searching for mount.exfat PIDs as well. The fuse(8) mount point detection now works for both NTFS and exFAT filesystems … and the code to support it is even shorter.

  # TRY fuse(8) MOUNTS FROM PROCESSES
  if [ "${MOUNT_FOUND}" != "1" ]
  then
    FUSE_PIDS=$( pgrep mount.exfat ntfs-3g | tr '\n' ',' | sed '$s/.$//' )
    FUSE_MOUNTS=$( ps -p "${FUSE_PIDS}" -o command | sed 1d | sort -u )
    MOUNT=$( echo "${FUSE_MOUNTS}" |  grep "/dev/${TARGET} " | awk '{print $3}' )
  fi

I also changed how MAJOR and MINOR numbers are displayed – from HEX to DEC – as it is on Linux. The FreeBSD’s ls(1) from Base System displays these as HEX – for example you will get 0x2af value:

% ls -l /dev/md4
crw-rw----  1 root  operator  0x2af 2019.09.29 05:18 /dev/md4

But do the same with GNU equivalent by using gls(1) from FreeBSD Ports (from sysutils/coreutils package) and it shows MAJOR and MINOR in DEC values. The gls(1) is just ls(1) from the Linux world but as ls(1) name is already ‘taken’ by FreeBSD’s Base System tool the FreeBSD developers/maintainers add ‘g’ letter (for GNU) to distinguish them.

% gls -l /dev/md4
crw-rw---- 1 root 2, 175 2019-09-29 05:18 /dev/md4

… and they are also easier/faster to get with stat(1) tool.

  MAJ=$( stat -f "%Hr" /dev/${DEV} )
  MIN=$( stat -f "%Lr" /dev/${DEV} )
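For the curious, the HEX value printed by ls(1) and the DEC pair printed by gls(1) are related by simple bit arithmetic. The sketch below splits the 0x2af value from the example above into its two parts. The RDEV variable is a made-up name, and the plain 8-bit split is an assumption that holds for small device numbers like this one; the stat(1) formats shown above remain the reliable way to get these values.

```sh
# RDEV is the raw device number as printed by FreeBSD ls(1) above.
RDEV=0x2af

# Assumption: for small values the MAJOR number sits above the low
# 8 bits and the MINOR number is the low 8 bits.
MAJ=$(( ${RDEV} >> 8 ))
MIN=$(( ${RDEV} & 0xff ))

echo "${MAJ}, ${MIN}"
```

This prints 2, 175, the same pair that gls(1) shows for /dev/md4.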

The latest lsblk.sh looks like this now.

lsblk.2.3.fuse.NTFS.exFAT

… that is why I have not (yet) added lsblk.sh to the FreeBSD Ports. Several new versions with important features were released within just two days 🙂

Regards.

UPDATE 5 – Another 69% Rewrite

After messing with gpart(8) some more I found its -p flag, which is a game changer. The difference is that with the -p flag it displays full partition names instead of partition indexes, so it is no longer needed to find the PREFIX and ‘create’ the partition names.

Default gpart(8) output.

# gpart show md0
=>     63  2097089  md0  MBR  (1.0G)
       63  1048576    1  freebsd  (512M)
  1048639   524288    2  ntfs  (256M)
  1572927   524225    3  fat32  (256M)

Output of gpart(8) with -p flag.

# gpart show -p md0
=>     63  2097089    md0  MBR  (1.0G)
       63  1048576  md0s1  freebsd  (512M)
  1048639   524288  md0s2  ntfs  (256M)
  1572927   524225  md0s3  fat32  (256M)

That discovery led to quite a large rewrite of lsblk.sh. The git commit estimates this as a 69% code rewrite.

# git commit (...)
(...)
 1 file changed, 487 insertions(+), 501 deletions(-)
 rewrite lsblk.sh (69%)
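To illustrate why the -p flag makes parsing easier, here is a minimal sketch that turns the gpart show -p output into name/type/size triples with one awk(1) query. It is not the code from lsblk.sh: list_parts and GPART_SAMPLE are made-up names, and the sample is the md0 output shown above.

```sh
# Sample `gpart show -p md0` output copied from above.
GPART_SAMPLE='=>     63  2097089    md0  MBR  (1.0G)
       63  1048576  md0s1  freebsd  (512M)
  1048639   524288  md0s2  ntfs  (256M)
  1572927   524225  md0s3  fat32  (256M)'

# list_parts is a hypothetical helper that reads gpart(8) output
# on stdin and prints NAME TYPE SIZE for every partition.
list_parts() {
  awk '
    $1 == "=>" { next }            # line describing the whole disk
    NF == 0    { next }            # empty separator line
    {
      size = $NF
      gsub(/[()]/, "", size)       # "(512M)" becomes "512M"
      if ($3 == "-") print "[FREE]", "-", size
      else           print $3, $4, size
    }'
}

echo "${GPART_SAMPLE}" | list_parts
```

With -p the third column is already the full partition name, so no prefix guessing (p for GPT, s for MBR) is needed.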

The latest lsblk.sh now has these features:

  • Previous bugs fixed.
  • Detects exFAT labels.
  • Is now 20% faster.
  • Has 10% less SLOC.
  • Has 15% less code.
  • Handles bsdlabel(8) on entire device properly.
  • Handles exFAT on entire device properly.

The difference in code is shown below.

# wc lsblk.sh
     487    1791   13705 lsblk.sh

# wc lsblk.sh.OLD
     544    1931   16170 lsblk.sh.OLD

The latest lsblk.sh looks as usual, but I now use ‘-‘ instead of ‘[UNMOUNTED]‘.

lsblk.2.5.gpart.exfat

EOF

FreeBSD Enterprise 1 PB Storage

Today the FreeBSD operating system turns 26 years old. 19 June is the International FreeBSD Day. This is why I got something special today :). How about using FreeBSD as an Enterprise Storage solution on real hardware? This is where FreeBSD shines with all its storage features, ZFS included.

Today I will show you how I have built a so-called Enterprise Storage based on the FreeBSD system with more than 1 PB (Petabyte) of raw capacity.

I have built various storage related systems based on FreeBSD:

This project is different. How much storage space can you squeeze from a single 4U system? It turns out a lot! Definitely more than 1 PB (1024 TB) of raw storage space.

Here is the (non clickable) Table of Contents.

  • Hardware
  • Management Interface
  • BIOS/UEFI
  • FreeBSD System
    • Disks Preparation
    • ZFS Pool Configuration
    • ZFS Settings
    • Network Configuration
    • FreeBSD Configuration
  • Purpose
  • Performance
    • Network Performance
    • Disk Subsystem Performance
  • FreeNAS
  • UPDATE 1 – BSD Now 305
  • UPDATE 2 – Real Life Pictures in Data Center

Hardware

There are 4U servers with 90-100 3.5″ drive slots which will allow you to pack 1260-1400 Terabytes of data (with 14 TB drives). Examples of such systems are:

I will use the first one, the TYAN FA100 for short.

logo-tyan.png

While both the GlusterFS and Minio clusters were done on virtual hardware (or even FreeBSD Jails containers), this one uses real physical hardware.

The build has the following specifications.

 2 x 10-Core Intel Xeon Silver 4114 CPU @ 2.20GHz
 4 x 32 GB RAM DDR4 (128 GB Total)
 2 x Intel SSD DC S3500 240 GB (System)
90 x Toshiba HDD MG07ACA12TE 12 TB (Data)
 2 x Broadcom SAS3008 Controller
 2 x Intel X710 DA-2 10GE Card
 2 x Power Supply

Price of the whole system is about $65 000 – drives included. Here is how it looks.

tyan-fa100-small.jpg

One thing that you will need is a rack cabinet that is 1200 mm long to fit that monster 🙂

Management Interface

The so-called Lights Out management interface is really nice. It is not bloated, it is well organized and it works quite fast. You can create several separate user accounts or connect to external user services like LDAP/AD/RADIUS, for example.

n01.png

After logging in a simple Dashboard welcomes us.

n02.png

We have access to various Sensor information available with temperatures of system components.

n03

We have System Inventory information with installed hardware.

n04.png

There is separate Settings menu for various setup options.

n05.png

I know it is 2019, but an HTML5-only Remote Control (remote console) without the need for any third party plugins like Java/Silverlight/Flash/… is very welcome. It works very well too.

n06.png

n07.png

One is of course allowed to power on/off/cycle the box remotely.

n08.png

The Maintenance menu for BIOS updates.

n09.png

BIOS/UEFI

After booting into the BIOS/UEFI setup it is possible to select which drives to boot from. The screenshots show the two SSD drives prepared for the system.

nas01.png

The BIOS/UEFI interface shows two Enclosures, but these are the two Broadcom SAS3008 controllers. Some drives are attached via the first Broadcom SAS3008 controller and the rest are attached via the second one, and for some reason they are called Enclosures instead of controllers.

nas05.png

FreeBSD System

I have chosen the latest FreeBSD 12.0-RELEASE for the purpose of this installation. It is generally a very ‘default’ installation with a ZFS mirror on two SSD disks. Nothing special.

logo-freebsd.jpg

The installation of course supports the ZFS Boot Environments bulletproof upgrades/changes feature.

# zpool list zroot
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot   220G  3.75G   216G        -         -     0%     1%  1.00x  ONLINE  -

# zpool status zroot
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da91p4  ONLINE       0     0     0
            da11p4  ONLINE       0     0     0

errors: No known data errors

# df -g
Filesystem              1G-blocks Used  Avail Capacity  Mounted on
zroot/ROOT/default            211    2    209     1%    /
devfs                           0    0      0   100%    /dev
zroot/tmp                     209    0    209     0%    /tmp
zroot/usr/home                209    0    209     0%    /usr/home
zroot/usr/ports               210    0    209     0%    /usr/ports
zroot/usr/src                 210    0    209     0%    /usr/src
zroot/var/audit               209    0    209     0%    /var/audit
zroot/var/crash               209    0    209     0%    /var/crash
zroot/var/log                 209    0    209     0%    /var/log
zroot/var/mail                209    0    209     0%    /var/mail
zroot/var/tmp                 209    0    209     0%    /var/tmp

# beadm list
BE      Active Mountpoint  Space Created
default NR     /            2.4G 2019-05-24 13:24

Disks Preparation

From all the possible setups with 90 disks of 12 TB capacity I have chosen to go the RAID60 way, or rather its ZFS equivalent of course. With 12 disks in each RAID6 (raidz2) group there will be 7 such groups, so 84 disks will be used for the ZFS pool with 6 drives left as SPARE disks, which plays well for me. The disk distribution will look more or less like this.

DISKS  CONTENT
   12  raidz2-0
   12  raidz2-1
   12  raidz2-2
   12  raidz2-3
   12  raidz2-4
   12  raidz2-5
   12  raidz2-6
    6  spares
   90  TOTAL
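The numbers in the table above can be double-checked with plain shell arithmetic. This is only a rough sketch with made-up variable names. The data capacity assumes two parity disks per raidz2 group and ignores any ZFS metadata or padding overhead, so the real usable space will be lower.

```sh
VDEVS=7             # raidz2 groups
DISKS_PER_VDEV=12   # disks in each raidz2 group
PARITY=2            # raidz2 keeps two disks worth of parity per group
SPARES=6            # hot spare disks
DISK_TB=12          # capacity of a single disk in TB

POOL_DISKS=$((  ${VDEVS} * ${DISKS_PER_VDEV} ))
TOTAL_DISKS=$(( ${POOL_DISKS} + ${SPARES} ))
RAW_TB=$((      ${TOTAL_DISKS} * ${DISK_TB} ))
DATA_TB=$((     ${VDEVS} * ( ${DISKS_PER_VDEV} - ${PARITY} ) * ${DISK_TB} ))

echo "disks in pool: ${POOL_DISKS}"
echo "disks total:   ${TOTAL_DISKS}"
echo "raw capacity:  ${RAW_TB} TB"
echo "data capacity: ${DATA_TB} TB (before ZFS overhead)"
```

So the 90 disks give 1080 TB of raw capacity, of which about 840 TB is left for data after parity and spares.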

Here is how the FreeBSD system sees these drives with the camcontrol(8) command, sorted by the attached SAS controller – scbus(4).

# camcontrol devlist | sort -k 6
(AHCI SGPIO Enclosure 1.00 0001)   at scbus2 target 0 lun 0 (pass0,ses0)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 50 lun 0 (pass1,da0)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 52 lun 0 (pass2,da1)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 54 lun 0 (pass3,da2)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 56 lun 0 (pass5,da4)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 57 lun 0 (pass6,da5)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 59 lun 0 (pass7,da6)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 60 lun 0 (pass8,da7)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 66 lun 0 (pass9,da8)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 67 lun 0 (pass10,da9)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 74 lun 0 (pass11,da10)
(ATA INTEL SSDSC2KB24 0100)        at scbus3 target 75 lun 0 (pass12,da11)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 76 lun 0 (pass13,da12)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 82 lun 0 (pass14,da13)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 83 lun 0 (pass15,da14)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 85 lun 0 (pass16,da15)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 87 lun 0 (pass17,da16)
(Tyan B7118 0500)                  at scbus3 target 88 lun 0 (pass18,ses1)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 89 lun 0 (pass19,da17)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 90 lun 0 (pass20,da18)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 91 lun 0 (pass21,da19)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 92 lun 0 (pass22,da20)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 93 lun 0 (pass23,da21)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 94 lun 0 (pass24,da22)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 95 lun 0 (pass25,da23)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 96 lun 0 (pass26,da24)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 97 lun 0 (pass27,da25)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 98 lun 0 (pass28,da26)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 99 lun 0 (pass29,da27)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 100 lun 0 (pass30,da28)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 101 lun 0 (pass31,da29)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 102 lun 0 (pass32,da30)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 103 lun 0 (pass33,da31)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 104 lun 0 (pass34,da32)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 105 lun 0 (pass35,da33)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 106 lun 0 (pass36,da34)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 107 lun 0 (pass37,da35)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 108 lun 0 (pass38,da36)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 109 lun 0 (pass39,da37)
(ATA TOSHIBA MG07ACA1 0101)        at scbus3 target 110 lun 0 (pass40,da38)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 48 lun 0 (pass41,da39)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 49 lun 0 (pass42,da40)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 51 lun 0 (pass43,da41)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 53 lun 0 (pass44,da42)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 55 lun 0 (da43,pass45)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 59 lun 0 (pass46,da44)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 64 lun 0 (pass47,da45)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 67 lun 0 (pass48,da46)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 68 lun 0 (pass49,da47)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 69 lun 0 (pass50,da48)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 73 lun 0 (pass51,da49)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 76 lun 0 (pass52,da50)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 77 lun 0 (pass53,da51)
(Tyan B7118 0500)                  at scbus4 target 80 lun 0 (pass54,ses2)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 81 lun 0 (pass55,da52)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 82 lun 0 (pass56,da53)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 83 lun 0 (pass57,da54)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 84 lun 0 (pass58,da55)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 85 lun 0 (pass59,da56)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 86 lun 0 (pass60,da57)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 87 lun 0 (pass61,da58)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 88 lun 0 (pass62,da59)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 89 lun 0 (da63,pass66)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 90 lun 0 (pass64,da61)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 91 lun 0 (pass65,da62)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 92 lun 0 (da60,pass63)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 94 lun 0 (pass67,da64)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 97 lun 0 (pass68,da65)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 98 lun 0 (pass69,da66)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 99 lun 0 (pass70,da67)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 100 lun 0 (pass71,da68)
(Tyan B7118 0500)                  at scbus4 target 101 lun 0 (pass72,ses3)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 102 lun 0 (pass73,da69)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 103 lun 0 (pass74,da70)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 104 lun 0 (pass75,da71)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 105 lun 0 (pass76,da72)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 106 lun 0 (pass77,da73)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 107 lun 0 (pass78,da74)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 108 lun 0 (pass79,da75)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 109 lun 0 (pass80,da76)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 110 lun 0 (pass81,da77)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 111 lun 0 (pass82,da78)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 112 lun 0 (pass83,da79)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 113 lun 0 (pass84,da80)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 114 lun 0 (pass85,da81)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 115 lun 0 (pass86,da82)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 116 lun 0 (pass87,da83)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 117 lun 0 (pass88,da84)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 118 lun 0 (pass89,da85)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 119 lun 0 (pass90,da86)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 120 lun 0 (pass91,da87)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 121 lun 0 (pass92,da88)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 122 lun 0 (pass93,da89)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 123 lun 0 (pass94,da90)
(ATA INTEL SSDSC2KB24 0100)        at scbus4 target 124 lun 0 (pass95,da91)
(ATA TOSHIBA MG07ACA1 0101)        at scbus4 target 125 lun 0 (da3,pass4)

One may ask how to identify which disk is which when a FAILURE comes … this is where FreeBSD’s sesutil(8) command comes in handy.

# sesutil locate all off
# sesutil locate da64 on

The first sesutil(8) command disables all location lights in the enclosure. The second one turns on the identification for disk da64.

I will also make sure NOT to use the whole space of each drive. Such an idea may seem pointless, but imagine the following situation. Five 12 TB disks fail after 3 years. You cannot get the same model drives, so you get other 12 TB drives, maybe even from another manufacturer.

# grep da64 /var/run/dmesg.boot
da64 at mpr1 bus 0 scbus4 target 93 lun 0
da64:  Fixed Direct Access SPC-4 SCSI device
da64: Serial Number 98G0A1EQF95G
da64: 1200.000MB/s transfers
da64: Command Queueing enabled
da64: 11444224MB (23437770752 512 byte sectors)

A single 12 TB drive has 23437770752 of 512 byte sectors which equals 12000138625024 bytes of raw capacity.

# expr 23437770752 \* 512
12000138625024

Now imagine that these other 12 TB drives from another manufacturer come with a size that is just 4 bytes smaller … ZFS will not allow their usage because they are smaller than the devices they would replace.

This is why I will use exactly 11175 GB of each drive which is more or less 1 GB short of its total 11176 GB size.

Below is the command that will do that for me for all 90 disks.

# camcontrol devlist \
    | grep TOSHIBA \
    | grep -o 'da[0-9][0-9]*' \
    | while read DISK
      do
        gpart destroy -F                   ${DISK} 1> /dev/null 2> /dev/null
        gpart create -s GPT                ${DISK}
        gpart add -t freebsd-zfs -s 11175G ${DISK}
      done

# gpart show da64
=>         40  23437770672  da64  GPT  (11T)
           40  23435673600     1  freebsd-zfs  (11T)
  23435673640      2097072        - free -  (1.0G)
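
The numbers in the gpart(8) output above can be cross-checked with plain shell arithmetic. This is only a quick sketch; the variable names are mine and the sector count is the one reported for da64 in dmesg.

```shell
#!/bin/sh
# Values taken from the dmesg and gpart output above; variable names are illustrative.
SECTORS=23437770752                 # 512 byte sectors reported for da64
SECTOR_SIZE=512
GIB=$((1024 * 1024 * 1024))         # the G suffix used with gpart means 2^30 bytes here

DISK_BYTES=$((SECTORS * SECTOR_SIZE))
DISK_GIB=$((DISK_BYTES / GIB))
PART_SECTORS=$((11175 * GIB / SECTOR_SIZE))

echo "raw capacity:   ${DISK_BYTES} bytes (${DISK_GIB} GiB)"
echo "partition size: ${PART_SECTORS} sectors"
```

The partition size matches the 23435673600 sectors shown for the freebsd-zfs partition, so a replacement drive may be up to about 1 GB smaller and will still fit.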


ZFS Pool Configuration

Next, we will have to create our ZFS pool. It is probably the longest zpool command I have ever executed 🙂

As the Toshiba 12 TB disks have 4k physical sectors we will need to set vfs.zfs.min_auto_ashift to 12 (2^12 = 4096 bytes) to force ZFS to use 4k blocks on them.

# sysctl vfs.zfs.min_auto_ashift=12
vfs.zfs.min_auto_ashift: 12 -> 12

# zpool create nas02 \
    raidz2  da0p1  da1p1  da2p1  da3p1  da4p1  da5p1  da6p1  da7p1  da8p1  da9p1 da10p1 da12p1 \
    raidz2 da13p1 da14p1 da15p1 da16p1 da17p1 da18p1 da19p1 da20p1 da21p1 da22p1 da23p1 da24p1 \
    raidz2 da25p1 da26p1 da27p1 da28p1 da29p1 da30p1 da31p1 da32p1 da33p1 da34p1 da35p1 da36p1 \
    raidz2 da37p1 da38p1 da39p1 da40p1 da41p1 da42p1 da43p1 da44p1 da45p1 da46p1 da47p1 da48p1 \
    raidz2 da49p1 da50p1 da51p1 da52p1 da53p1 da54p1 da55p1 da56p1 da57p1 da58p1 da59p1 da60p1 \
    raidz2 da61p1 da62p1 da63p1 da64p1 da65p1 da66p1 da67p1 da68p1 da69p1 da70p1 da71p1 da72p1 \
    raidz2 da73p1 da74p1 da75p1 da76p1 da77p1 da78p1 da79p1 da80p1 da81p1 da82p1 da83p1 da84p1 \
    spare  da85p1 da86p1 da87p1 da88p1 da89p1 da90p1
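
Nobody wants to type 90 device names by hand. Below is a minimal sketch of how such a command could be generated. The gen_zpool_cmd function is a hypothetical helper of mine, not a ZFS tool, and it only prints the command instead of running it. It assumes the same layout as above: 12 disk wide raidz2 groups, the last 6 disks as spares, and da11 left out of the list.

```shell
#!/bin/sh
# gen_zpool_cmd POOL WIDTH SPARES DISK... (hypothetical helper)
# Prints a 'zpool create' command with raidz2 groups of WIDTH disks,
# using the last SPARES disks as hot spares. Nothing is executed.
gen_zpool_cmd() {
    pool=$1; width=$2; spares=$3
    shift 3
    data=$(($# - spares))           # number of disks that go into raidz2 groups
    printf 'zpool create %s' "$pool"
    i=0
    for disk in "$@"; do
        if [ "$i" -lt "$data" ]; then
            # start a new raidz2 group every WIDTH disks
            if [ $((i % width)) -eq 0 ]; then
                printf ' \\\n    raidz2'
            fi
        elif [ "$i" -eq "$data" ]; then
            printf ' \\\n    spare '
        fi
        printf ' %sp1' "$disk"
        i=$((i + 1))
    done
    printf '\n'
}

# Build the disk list da0 .. da90 without da11, as in the command above.
DISKS=""
n=0
while [ "$n" -le 90 ]; do
    if [ "$n" -ne 11 ]; then
        DISKS="$DISKS da$n"
    fi
    n=$((n + 1))
done

gen_zpool_cmd nas02 12 6 $DISKS
```

Review the printed command before pasting it into a root shell, as zpool create is not something you want to run by accident.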

# zpool status
  pool: nas02
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:05 with 0 errors on Fri May 31 10:26:29 2019
config:

        NAME        STATE     READ WRITE CKSUM
        nas02       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da0p1   ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da3p1   ONLINE       0     0     0
            da4p1   ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
            da6p1   ONLINE       0     0     0
            da7p1   ONLINE       0     0     0
            da8p1   ONLINE       0     0     0
            da9p1   ONLINE       0     0     0
            da10p1  ONLINE       0     0     0
            da12p1  ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            da13p1  ONLINE       0     0     0
            da14p1  ONLINE       0     0     0
            da15p1  ONLINE       0     0     0
            da16p1  ONLINE       0     0     0
            da17p1  ONLINE       0     0     0
            da18p1  ONLINE       0     0     0
            da19p1  ONLINE       0     0     0
            da20p1  ONLINE       0     0     0
            da21p1  ONLINE       0     0     0
            da22p1  ONLINE       0     0     0
            da23p1  ONLINE       0     0     0
            da24p1  ONLINE       0     0     0
          raidz2-2  ONLINE       0     0     0
            da25p1  ONLINE       0     0     0
            da26p1  ONLINE       0     0     0
            da27p1  ONLINE       0     0     0
            da28p1  ONLINE       0     0     0
            da29p1  ONLINE       0     0     0
            da30p1  ONLINE       0     0     0
            da31p1  ONLINE       0     0     0
            da32p1  ONLINE       0     0     0
            da33p1  ONLINE       0     0     0
            da34p1  ONLINE       0     0     0
            da35p1  ONLINE       0     0     0
            da36p1  ONLINE       0     0     0
          raidz2-3  ONLINE       0     0     0
            da37p1  ONLINE       0     0     0
            da38p1  ONLINE       0     0     0
            da39p1  ONLINE       0     0     0
            da40p1  ONLINE       0     0     0
            da41p1  ONLINE       0     0     0
            da42p1  ONLINE       0     0     0
            da43p1  ONLINE       0     0     0
            da44p1  ONLINE       0     0     0
            da45p1  ONLINE       0     0     0
            da46p1  ONLINE       0     0     0
            da47p1  ONLINE       0     0     0
            da48p1  ONLINE       0     0     0
          raidz2-4  ONLINE       0     0     0
            da49p1  ONLINE       0     0     0
            da50p1  ONLINE       0     0     0
            da51p1  ONLINE       0     0     0
            da52p1  ONLINE       0     0     0
            da53p1  ONLINE       0     0     0
            da54p1  ONLINE       0     0     0
            da55p1  ONLINE       0     0     0
            da56p1  ONLINE       0     0     0
            da57p1  ONLINE       0     0     0
            da58p1  ONLINE       0     0     0
            da59p1  ONLINE       0     0     0
            da60p1  ONLINE       0     0     0
          raidz2-5  ONLINE       0     0     0
            da61p1  ONLINE       0     0     0
            da62p1  ONLINE       0     0     0
            da63p1  ONLINE       0     0     0
            da64p1  ONLINE       0     0     0
            da65p1  ONLINE       0     0     0
            da66p1  ONLINE       0     0     0
            da67p1  ONLINE       0     0     0
            da68p1  ONLINE       0     0     0
            da69p1  ONLINE       0     0     0
            da70p1  ONLINE       0     0     0
            da71p1  ONLINE       0     0     0
            da72p1  ONLINE       0     0     0
          raidz2-6  ONLINE       0     0     0
            da73p1  ONLINE       0     0     0
            da74p1  ONLINE       0     0     0
            da75p1  ONLINE       0     0     0
            da76p1  ONLINE       0     0     0
            da77p1  ONLINE       0     0     0
            da78p1  ONLINE       0     0     0
            da79p1  ONLINE       0     0     0
            da80p1  ONLINE       0     0     0
            da81p1  ONLINE       0     0     0
            da82p1  ONLINE       0     0     0
            da83p1  ONLINE       0     0     0
            da84p1  ONLINE       0     0     0
        spares
          da85p1    AVAIL
          da86p1    AVAIL
          da87p1    AVAIL
          da88p1    AVAIL
          da89p1    AVAIL
          da90p1    AVAIL

errors: No known data errors

# zpool list nas02
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
nas02   915T  1.42M   915T        -         -     0%     0%  1.00x  ONLINE  -

# zfs list nas02
NAME    USED  AVAIL  REFER  MOUNTPOINT
nas02    88K   675T   201K  none
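
A back-of-the-envelope check of these sizes, assuming nothing more than the layout used above: 7 raidz2 groups of 12 partitions, 11175 GB each, with two disks' worth of parity per group. The variable names are only illustrative.

```shell
#!/bin/sh
# Rough capacity estimate for the pool layout above (illustrative names).
VDEVS=7          # raidz2 groups
WIDTH=12         # partitions per group
PARITY=2         # raidz2 keeps two disks' worth of parity per group
PART_GIB=11175   # size of each freebsd-zfs partition

RAW_GIB=$((VDEVS * WIDTH * PART_GIB))
DATA_GIB=$((VDEVS * (WIDTH - PARITY) * PART_GIB))

echo "raw:  $((RAW_GIB / 1024)) TiB"
echo "data: $((DATA_GIB / 1024)) TiB (upper bound)"
```

The raw figure lands close to the 915T from zpool list. The data figure is only an upper bound; the AVAIL reported by zfs list is lower because ZFS also accounts for RAIDZ allocation overhead and its own reserved space.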

ZFS Settings

As the primary role of this storage will be keeping files I will use one of the largest values for recordsize – 1 MB – as this helps to get a better compression ratio.

… but it will also serve as an iSCSI Target in which we will try to fit the native 4k blocks – thus the 4096 bytes setting for iSCSI.

# zfs set compression=lz4         nas02
# zfs set atime=off               nas02
# zfs set mountpoint=none         nas02
# zfs set recordsize=1m           nas02
# zfs set redundant_metadata=most nas02
# zfs create                      nas02/nfs
# zfs create                      nas02/smb
# zfs create                      nas02/iscsi
# zfs set recordsize=4k           nas02/iscsi

Also one word on redundant_metadata as it is not that obvious a parameter. To quote the zfs(8) man page.

# man zfs
(...)
redundant_metadata=all | most
  Controls what types of metadata are stored redundantly.  ZFS stores
  an extra copy of metadata, so that if a single block is corrupted,
  the amount of user data lost is limited.  This extra copy is in
  addition to any redundancy provided at the pool level (e.g. by
  mirroring or RAID-Z), and is in addition to an extra copy specified
  by the copies property (up to a total of 3 copies).  For example if
  the pool is mirrored, copies=2, and redundant_metadata=most, then ZFS
  stores 6 copies of most metadata, and 4 copies of data and some
  metadata.

  When set to all, ZFS stores an extra copy of all metadata.  If a
  single on-disk block is corrupt, at worst a single block of user data
  (which is recordsize bytes long can be lost.)

  When set to most, ZFS stores an extra copy of most types of metadata.
  This can improve performance of random writes, because less metadata
  must be written.  In practice, at worst about 100 blocks (of
  recordsize bytes each) of user data can be lost if a single on-disk
  block is corrupt.  The exact behavior of which metadata blocks are
  stored redundantly may change in future releases.

  The default value is all.
(...)

From the description above we can see that the default all setting is mostly useful on single device pools. When we have redundancy based on RAIDZ2 (RAID6 equivalent) we do not need to keep additional redundant copies of all metadata, so the most setting is enough. This helps to increase write performance.

For the record – iSCSI ZFS zvols are created with a command like the one below – as sparse volumes – also called Thin Provisioning mode.

# zfs create -s -V 16T nas02/iscsi/test

As we have SPARE disks we will also need to enable the zfsd(8) daemon by adding zfsd_enable=YES to the /etc/rc.conf file.

We also need to enable the autoreplace property for our pool because by default it is set to off.

# zpool get autoreplace nas02
NAME   PROPERTY     VALUE    SOURCE
nas02  autoreplace  off      default

# zpool set autoreplace=on nas02

# zpool get autoreplace nas02
NAME   PROPERTY     VALUE    SOURCE
nas02  autoreplace  on       local

Other ZFS settings are in the /boot/loader.conf file. As this system has 128 GB RAM we will let ZFS use 50 to 75% of that amount for ARC.

# grep vfs.zfs /boot/loader.conf
  vfs.zfs.prefetch_disable=1
  vfs.zfs.cache_flush_disable=1
  vfs.zfs.vdev.cache.size=16M
  vfs.zfs.arc_min=64G
  vfs.zfs.arc_max=96G
  vfs.zfs.deadman_enabled=0
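
The two ARC limits are simply 50% and 75% of the installed memory. A tiny sketch of that calculation follows; arc_limit is a hypothetical helper and RAM_GB is hardcoded here instead of being read from the system.

```shell
#!/bin/sh
# arc_limit RAM_GB PERCENT (hypothetical helper)
# Prints the given percentage of RAM in the form used in /boot/loader.conf.
arc_limit() {
    echo "$(( $1 * $2 / 100 ))G"
}

RAM_GB=128   # installed memory of this box, hardcoded for the example

echo "vfs.zfs.arc_min=$(arc_limit "$RAM_GB" 50)"
echo "vfs.zfs.arc_max=$(arc_limit "$RAM_GB" 75)"
```

On a box with a different amount of memory only RAM_GB needs to change.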

Network Configuration

This is what I really like about FreeBSD. To set up LACP link aggregation you just need 5 lines in the /etc/rc.conf file. On Red Hat Enterprise Linux you would need several files with many lines each.

# head -5 /etc/rc.conf
  defaultrouter="10.20.30.254"
  ifconfig_ixl0="up"
  ifconfig_ixl1="up"
  cloned_interfaces="lagg0"
  ifconfig_lagg0="laggproto lacp laggport ixl0 laggport ixl1 10.20.30.2/24 up"

# ifconfig lagg0
lagg0: flags=8843 metric 0 mtu 1500
        options=e507bb
        ether a0:42:3f:a0:42:3f
        inet 10.20.30.2 netmask 0xffffff00 broadcast 10.20.30.255
        laggproto lacp lagghash l2,l3,l4
        laggport: ixl0 flags=1c
        laggport: ixl1 flags=1c
        groups: lagg
        media: Ethernet autoselect
        status: active
        nd6 options=29

The Intel X710 DA-2 10GE network adapter is fully supported under FreeBSD by the ixl(4) driver.

intel-x710-da-2.jpg

Cisco Nexus Configuration

This is the Cisco Nexus configuration needed to enable LACP aggregation.

First the ports.

NEXUS-1  Eth1/32  NAS02_IXL0  connected 3  full  a-10G  SFP-H10GB-A
NEXUS-2  Eth1/32  NAS02_IXL1  connected 3  full  a-10G  SFP-H10GB-A

… and now aggregation.

interface Ethernet1/32
  description NAS02_IXL1
  switchport
  switchport access vlan 3
  mtu 9216
  channel-group 128 mode active
  no shutdown
!
interface port-channel128
  description NAS02
  switchport
  switchport access vlan 3
  mtu 9216
  vpc 128

… and the same/similar on the second Cisco Nexus NEXUS-2 switch.

FreeBSD Configuration

These are the three most important configuration files on any FreeBSD system.

I will now post all settings I use on this storage system.

The /etc/rc.conf file.

# cat /etc/rc.conf
# NETWORK
  hostname="nas02.local"
  defaultrouter="10.20.30.254"
  ifconfig_ixl0="up"
  ifconfig_ixl1="up"
  cloned_interfaces="lagg0"
  ifconfig_lagg0="laggproto lacp laggport ixl0 laggport ixl1 10.20.30.2/24 up"

# KERNEL MODULES
  kld_list="${kld_list} aesni"

# DAEMON | YES
  zfs_enable=YES
  zfsd_enable=YES
  sshd_enable=YES
  ctld_enable=YES
  powerd_enable=YES

# DAEMON | NFS SERVER
  nfs_server_enable=YES
  nfs_client_enable=YES
  rpc_lockd_enable=YES
  rpc_statd_enable=YES
  rpcbind_enable=YES
  mountd_enable=YES
  mountd_flags="-r"

# OTHER
  dumpdev=NO

The /boot/loader.conf file.

# cat /boot/loader.conf
# BOOT OPTIONS
  autoboot_delay=3
  kern.geom.label.disk_ident.enable=0
  kern.geom.label.gptid.enable=0

# DISABLE INTEL HT
  machdep.hyperthreading_allowed=0

# UPDATE INTEL CPU MICROCODE AT BOOT BEFORE KERNEL IS LOADED
  cpu_microcode_load=YES
  cpu_microcode_name=/boot/firmware/intel-ucode.bin

# MODULES
  zfs_load=YES
  aio_load=YES

# RACCT/RCTL RESOURCE LIMITS
  kern.racct.enable=1

# DISABLE MEMORY TEST @ BOOT
  hw.memtest.tests=0

# PIPE KVA LIMIT | 320 MB
  kern.ipc.maxpipekva=335544320

# IPC
  kern.ipc.shmseg=1024
  kern.ipc.shmmni=1024
  kern.ipc.shmseg=1024
  kern.ipc.semmns=512
  kern.ipc.semmnu=256
  kern.ipc.semume=256
  kern.ipc.semopm=256
  kern.ipc.semmsl=512

# LARGE PAGE MAPPINGS
  vm.pmap.pg_ps_enabled=1

# ZFS TUNING
  vfs.zfs.prefetch_disable=1
  vfs.zfs.cache_flush_disable=1
  vfs.zfs.vdev.cache.size=16M
  vfs.zfs.arc_min=64G
  vfs.zfs.arc_max=96G

# ZFS DISABLE PANIC ON STALE I/O
  vfs.zfs.deadman_enabled=0

# NEWCONS SUSPEND
  kern.vt.suspendswitch=0

The /etc/sysctl.conf file.

# cat /etc/sysctl.conf
# ZFS ASHIFT
  vfs.zfs.min_auto_ashift=12

# SECURITY
  security.bsd.stack_guard_page=1

# SECURITY INTEL MDS (MICROARCHITECTURAL DATA SAMPLING) MITIGATION
  hw.mds_disable=3

# DISABLE ANNOYING THINGS
  kern.coredump=0
  hw.syscons.bell=0

# IPC
  kern.ipc.shmmax=4294967296
  kern.ipc.shmall=2097152
  kern.ipc.somaxconn=4096
  kern.ipc.maxsockbuf=5242880
  kern.ipc.shm_allow_removed=1

# NETWORK
  kern.ipc.maxsockbuf=16777216
  kern.ipc.soacceptqueue=1024
  net.inet.tcp.recvbuf_max=8388608
  net.inet.tcp.sendbuf_max=8388608
  net.inet.tcp.mssdflt=1460
  net.inet.tcp.minmss=1300
  net.inet.tcp.syncache.rexmtlimit=0
  net.inet.tcp.syncookies=0
  net.inet.tcp.tso=0
  net.inet.ip.process_options=0
  net.inet.ip.random_id=1
  net.inet.ip.redirect=0
  net.inet.icmp.drop_redirect=1
  net.inet.tcp.always_keepalive=0
  net.inet.tcp.drop_synfin=1
  net.inet.tcp.fast_finwait2_recycle=1
  net.inet.tcp.icmp_may_rst=0
  net.inet.tcp.msl=8192
  net.inet.tcp.path_mtu_discovery=0
  net.inet.udp.blackhole=1
  net.inet.tcp.blackhole=2
  net.inet.tcp.hostcache.expire=7200
  net.inet.tcp.delacktime=20

Purpose

Why would one build such an appliance? Because it is a lot cheaper than getting the ‘branded’ one. Think about Dell EMC Data Domain for example – and not just ‘any’ Data Domain but almost the highest one – the Data Domain DD9300 at least. It would cost at least about ten times more … with smaller capacity and taking not 4U but closer to 14U with three DS60 expanders.

But you can actually make this FreeBSD Enterprise Storage behave like Dell EMC Data Domain … or like their Dell EMC Elastic Cloud Storage for example.

The Dell EMC CloudBoost can be deployed somewhere on your VMware stack to provide the DDBoost deduplication. Then you would need OpenStack Swift as it is one of the supported backend devices.

emc-cloudboost-swift-cover.png

emc-cloudboost-swift-support.png

The OpenStack Swift package in FreeBSD is about 4-5 years behind reality (2.2.2) so you will have to use Bhyve here.

# pkg search swift
(...)
py27-swift-2.2.2_1             Highly available, distributed, eventually consistent object/blob store
(...)

Create a Bhyve virtual machine on this FreeBSD Enterprise Storage with a CentOS 7.6 system for example, then set up Swift there. Not ideal, but it will work. With 20 physical cores to spare and 128 GB RAM you would not even notice it is there.

This way you can use Dell EMC Networker with more than ten times cheaper storage.

In the past I also wrote about IBM Spectrum Protect (TSM) which would also greatly benefit from FreeBSD Enterprise Storage. I actually also use this FreeBSD based storage as space for IBM Spectrum Protect (TSM) container pool directories. Exported via iSCSI it works like a charm.

You can also compare that FreeBSD Enterprise Storage to other storage appliances like iXsystems TrueNAS or EXAGRID.

Performance

You surely want to know how fast this FreeBSD Enterprise Storage performs 🙂

I will share all the performance data that I gathered with pleasure.

Network Performance

First the network performance.

I used iperf3 as the benchmark.

I started the server on the FreeBSD side.

# iperf3 -s

… and then I started client on the Windows Server 2016 machine.

C:\iperf-3.1.3-win64>iperf3.exe -c nas02 -P 8
(...)
[SUM]   0.00-10.00  sec  10.8 GBytes  9.26 Gbits/sec                  receiver
(...)

This is with MTU 1500 – no Jumbo frames unfortunately 😦

Unfortunately this Windows system has only one physical 10GE interface but I also did another test, using two such boxes with a single 10GE interface each. That saturated the dual 10GE LACP on the FreeBSD side nicely.

I also exported NFS and iSCSI to a Red Hat Enterprise Linux system. The network performance was about 500-600 MB/s on a single 10GE interface. That would be 1000-1200 MB/s on LACP aggregation.

Disk Subsystem Performance

Now the disk subsystem.

First a naive test using diskinfo(8), FreeBSD’s builtin tool.

# diskinfo -ctv /dev/da12
/dev/da12
        512             # sectorsize
        12000138625024  # mediasize in bytes (11T)
        23437770752     # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        1458933         # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        ATA TOSHIBA MG07ACA1    # Disk descr.
        98H0A11KF95G    # Disk ident.
        id1,enc@n500e081010445dbd/type@0/slot@c/elmdesc@ArrayDevice11   # Physical path
        No              # TRIM/UNMAP support
        7200            # Rotation rate in RPM
        Not_Zoned       # Zone Mode

I/O command overhead:
        time to read 10MB block      0.067031 sec       =    0.003 msec/sector
        time to read 20480 sectors   2.619989 sec       =    0.128 msec/sector
        calculated command overhead                     =    0.125 msec/sector

Seek times:
        Full stroke:      250 iter in   5.665880 sec =   22.664 msec
        Half stroke:      250 iter in   4.263047 sec =   17.052 msec
        Quarter stroke:   500 iter in   6.867914 sec =   13.736 msec
        Short forward:    400 iter in   3.057913 sec =    7.645 msec
        Short backward:   400 iter in   1.979287 sec =    4.948 msec
        Seq outer:       2048 iter in   0.169472 sec =    0.083 msec
        Seq inner:       2048 iter in   0.469630 sec =    0.229 msec

Transfer rates:
        outside:       102400 kbytes in   0.478251 sec =   214114 kbytes/sec
        middle:        102400 kbytes in   0.605701 sec =   169060 kbytes/sec
        inside:        102400 kbytes in   1.303909 sec =    78533 kbytes/sec

So now we know how fast a single disk is.

Let’s repeat the same test on the ZFS zvol device.

# diskinfo -ctv /dev/zvol/nas02/iscsi/test
/dev/zvol/nas02/iscsi/test
        512             # sectorsize
        17592186044416  # mediasize in bytes (16T)
        34359738368     # mediasize in sectors
        65536           # stripesize
        0               # stripeoffset
        Yes             # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

I/O command overhead:
        time to read 10MB block      0.004512 sec       =    0.000 msec/sector
        time to read 20480 sectors   0.196824 sec       =    0.010 msec/sector
        calculated command overhead                     =    0.009 msec/sector

Seek times:
        Full stroke:      250 iter in   0.006151 sec =    0.025 msec
        Half stroke:      250 iter in   0.008228 sec =    0.033 msec
        Quarter stroke:   500 iter in   0.014062 sec =    0.028 msec
        Short forward:    400 iter in   0.010564 sec =    0.026 msec
        Short backward:   400 iter in   0.011725 sec =    0.029 msec
        Seq outer:       2048 iter in   0.028198 sec =    0.014 msec
        Seq inner:       2048 iter in   0.028416 sec =    0.014 msec

Transfer rates:
        outside:       102400 kbytes in   0.036938 sec =  2772213 kbytes/sec
        middle:        102400 kbytes in   0.043076 sec =  2377194 kbytes/sec
        inside:        102400 kbytes in   0.034260 sec =  2988908 kbytes/sec

Almost 3 GB/s – not bad.

Time for an even more oldschool test – the immortal dd(8) command.

This is with compression=off setting.

One process.

# dd if=/dev/zero of=FILE bs=128m status=progress
26172456960 bytes (26 GB, 24 GiB) transferred 16.074s, 1628 MB/s
202+0 records in
201+0 records out
26977763328 bytes transferred in 16.660884 secs (1619227644 bytes/sec)

Four concurrent processes.

# dd if=/dev/zero of=FILE${X} bs=128m status=progress
80933289984 bytes (81 GB, 75 GiB) transferred 98.081s, 825 MB/s
608+0 records in
608+0 records out
81604378624 bytes transferred in 98.990579 secs (824365101 bytes/sec)

Eight concurrent processes.

# dd if=/dev/zero of=FILE${X} bs=128m status=progress
174214610944 bytes (174 GB, 162 GiB) transferred 385.042s, 452 MB/s
1302+0 records in
1301+0 records out
174617264128 bytes transferred in 385.379296 secs (453104943 bytes/sec)

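Let’s summarize that data. Each dd(8) result above is for a single process, so the totals below are the per process speed multiplied by the number of processes.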

1 STREAM(s) ~ 1600 MB/s ~ 1.5 GB/s
4 STREAM(s) ~ 3300 MB/s ~ 3.2 GB/s
8 STREAM(s) ~ 3600 MB/s ~ 3.5 GB/s

So the disk subsystem is able to squeeze out 3.5 GB/s of sustained speed in sequential writes. That means that if we wanted to saturate it over the network we would need to add two additional 10GE interfaces.
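
The summary above and the number of 10GE links can be reproduced with a few lines of shell. This is a rough sketch: links_needed is a hypothetical helper, and I assume that each link delivers the 9.26 Gbits/sec measured earlier with iperf3.

```shell
#!/bin/sh
# links_needed MB_PER_SEC LINK_MBIT_PER_SEC (hypothetical helper)
# Prints how many links are needed to carry the given throughput, rounded up.
links_needed() {
    echo $(( ($1 * 8 + $2 - 1) / $2 ))
}

PER_PROCESS=452                       # MB/s per dd(8) process in the eight process run
PROCESSES=8
AGGREGATE=$((PER_PROCESS * PROCESSES))
LINK_MBIT=9260                        # assumed usable speed of one 10GE link in Mbit/s

echo "aggregate: ${AGGREGATE} MB/s"
echo "links:     $(links_needed "$AGGREGATE" "$LINK_MBIT")"
```

With two interfaces already in the lagg0 aggregation, four links in total means two more.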

The disks were stressed only to about 55% which you can see in another useful FreeBSD tool – the gstat(8) command.

n10.png

Time for more ‘intelligent’ tests. The blogbench test.

First with compression disabled.

# time blogbench -d .
Frequency = 10 secs
Scratch dir = [.]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.
(...)
Final score for writes:          6476
Final score for reads :        660436

blogbench -d .  280.58s user 4974.41s system 1748% cpu 5:00.54 total

Second with compression set to LZ4.

# time blogbench -d .
Frequency = 10 secs
Scratch dir = [.]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.
(...)
Final score for writes:          7087
Final score for reads :        733932

blogbench -d .  299.08s user 5415.04s system 1900% cpu 5:00.68 total

Compression did not help much, but it helped.
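
To put a number on ‘not much’, the two blogbench runs can be compared with integer shell arithmetic. The pct_gain function below is a hypothetical helper and it truncates the result to whole percents.

```shell
#!/bin/sh
# pct_gain BEFORE AFTER (hypothetical helper)
# Prints the gain of AFTER over BEFORE in whole percents (truncated).
pct_gain() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

echo "writes: +$(pct_gain 6476 7087)%"
echo "reads:  +$(pct_gain 660436 733932)%"
```

That is roughly a 9% gain for writes and 11% for reads with LZ4 enabled.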

To have some comparison we will run the same test on the system ZFS pool – two Intel SSD DC S3500 240 GB drives in a mirror which have the following features.

The Intel SSD DC S3500 240 GB drives:

  • Sequential Read (up to) 500 MB/s
  • Sequential Write (up to) 260 MB/s
  • Random Read (100% Span) 75000 IOPS
  • Random Write (100% Span) 7500 IOPS

# time blogbench -d .
Frequency = 10 secs
Scratch dir = [.]
Spawning 3 writers...
Spawning 1 rewriters...
Spawning 5 commenters...
Spawning 100 readers...
Benchmarking for 30 iterations.
The test will run during 5 minutes.
(...)
Final score for writes:          6109
Final score for reads :        654099

blogbench -d .  278.73s user 5058.75s system 1777% cpu 5:00.30 total

Now the randomio test. It is a multithreaded disk I/O microbenchmark.

The usage is as follows.

usage: randomio filename nr_threads write_fraction_of_io fsync_fraction_of_writes io_size nr_seconds_between_samples

filename                    Filename or device to read/write.
write_fraction_of_io        What fraction of I/O should be writes - for example 0.25 for 25% write.
fsync_fraction_of_writes    What fraction of writes should be fsync'd.
io_size                     How many bytes to read/write (multiple of 512 bytes).
nr_seconds_between_samples  How many seconds to average samples over.

The randomio test with 4k blocks.

# zfs create -s -V 1T nas02/iscsi/test
# randomio /dev/zvol/nas02/iscsi/test 8 0.25 1 4096 10
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
54137.7 |40648.4   0.0    0.1  575.8    2.2 |13489.4   0.0    0.3  405.8    2.6
66248.4 |49641.5   0.0    0.1   19.6    0.3 |16606.9   0.0    0.2   26.4    0.7
66411.0 |49817.2   0.0    0.1   19.7    0.3 |16593.8   0.0    0.2   20.3    0.7
64158.9 |48142.8   0.0    0.1  254.7    0.7 |16016.1   0.0    0.2  130.4    1.0
48454.1 |36390.8   0.0    0.1  542.8    2.7 |12063.3   0.0    0.3  507.5    3.2
66796.1 |50067.4   0.0    0.1   24.1    0.3 |16728.7   0.0    0.2   23.4    0.7
58512.2 |43851.7   0.0    0.1  576.5    1.7 |14660.5   0.0    0.2  307.2    1.7
63195.8 |47341.8   0.0    0.1  261.6    0.9 |15854.1   0.0    0.2  361.1    1.9
67086.0 |50335.6   0.0    0.1   20.4    0.3 |16750.4   0.0    0.2   25.1    0.8
67429.8 |50549.6   0.0    0.1   21.8    0.3 |16880.3   0.0    0.2   20.6    0.7
^C

… and with 512 byte sectors.

# zfs create -s -V 1T nas02/iscsi/test
# randomio /dev/zvol/nas02/iscsi/test 8 0.25 1 512 10
  total |  read:         latency (ms)       |  write:        latency (ms)
   iops |   iops   min    avg    max   sdev |   iops   min    avg    max   sdev
--------+-----------------------------------+----------------------------------
58218.9 |43712.0   0.0    0.1  501.5    2.1 |14506.9   0.0    0.2  272.5    1.6
66325.3 |49703.8   0.0    0.1  352.0    0.9 |16621.4   0.0    0.2  352.0    1.5
68130.5 |51100.8   0.0    0.1   24.6    0.3 |17029.7   0.0    0.2   24.4    0.7
68465.3 |51352.3   0.0    0.1   19.9    0.3 |17112.9   0.0    0.2   23.8    0.7
54903.5 |41249.1   0.0    0.1  399.3    1.9 |13654.4   0.0    0.3  335.8    2.2
61259.8 |45898.7   0.0    0.1  574.6    1.7 |15361.0   0.0    0.2  371.5    1.7
68483.3 |51313.1   0.0    0.1   22.9    0.3 |17170.3   0.0    0.2   26.1    0.7
56713.7 |42524.7   0.0    0.1  373.5    1.8 |14189.1   0.0    0.2  438.5    2.7
68861.4 |51657.0   0.0    0.1   21.0    0.3 |17204.3   0.0    0.2   21.7    0.7
68602.0 |51438.4   0.0    0.1   19.5    0.3 |17163.7   0.0    0.2   23.7    0.7
^C

Both randomio tests were run with compression set to LZ4.

Next is the bonnie++ benchmark. It has been run with compression set to LZ4.

# bonnie++ -d . -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nas02.local 261368M   139  99 775132  99 589190  99   383  99 1638929  99 12930 2046
Latency             60266us    7030us    7059us   21553us    3844us    5710us
Version  1.97       ------Sequential Create------ --------Random Create--------
nas02.local         -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ 12680  44 +++++ +++ +++++ +++ 30049  99
Latency              2619us      43us     714ms    2748us      28us      58us

… and last but not least the fio benchmark. Also with LZ4 compression enabled.

# fio --randrepeat=1 --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=64
fio-3.13
Starting 1 process
Jobs: 1 (f=1): [m(1)][98.0%][r=38.0MiB/s,w=12.2MiB/s][r=9735,w=3128 IOPS][eta 00m:05s]
test: (groupid=0, jobs=1): err= 0: pid=35368: Tue Jun 18 15:14:44 2019
  read: IOPS=3157, BW=12.3MiB/s (12.9MB/s)(3070MiB/248872msec)
   bw (  KiB/s): min= 9404, max=57732, per=98.72%, avg=12469.84, stdev=3082.99, samples=497
   iops        : min= 2351, max=14433, avg=3117.15, stdev=770.74, samples=497
  write: IOPS=1055, BW=4222KiB/s (4323kB/s)(1026MiB/248872msec)
   bw (  KiB/s): min= 3179, max=18914, per=98.71%, avg=4166.60, stdev=999.23, samples=497
   iops        : min=  794, max= 4728, avg=1041.25, stdev=249.76, samples=497
  cpu          : usr=1.11%, sys=88.64%, ctx=677981, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=12.3MiB/s (12.9MB/s), 12.3MiB/s-12.3MiB/s (12.9MB/s-12.9MB/s), io=3070MiB (3219MB), run=248872-248872msec
  WRITE: bw=4222KiB/s (4323kB/s), 4222KiB/s-4222KiB/s (4323kB/s-4323kB/s), io=1026MiB (1076MB), run=248872-248872msec
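
The bandwidth that fio reports is just the IOPS multiplied by the block size, which is easy to verify. A minimal sketch follows; kib_per_sec is a hypothetical helper and the IOPS values are the ones from the output above.

```shell
#!/bin/sh
# kib_per_sec IOPS BLOCK_SIZE_BYTES (hypothetical helper)
# Prints the bandwidth in KiB/s for the given IOPS and block size.
kib_per_sec() {
    echo $(( $1 * $2 / 1024 ))
}

BS=4096   # the --bs=4k used in the fio command above

echo "read:  $(kib_per_sec 3157 "$BS") KiB/s"
echo "write: $(kib_per_sec 1055 "$BS") KiB/s"
```

The results are within rounding of the 12.3MiB/s and 4222KiB/s that fio printed.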

Dunno about you but I am satisfied with the performance 🙂

FreeNAS

Originally I really wanted to use FreeNAS on these boxes and I even installed FreeNAS on them. It ran nicely but … the security part of FreeNAS was not the best.

This is the output of the pkg audit command. Quite scary.

root@freenas[~]# pkg audit -F
Fetching vuln.xml.bz2: 100%  785 KiB 804.3kB/s    00:01
python27-2.7.15 is vulnerable:
Python -- NULL pointer dereference vulnerability
CVE: CVE-2019-5010
WWW: https://vuxml.FreeBSD.org/freebsd/d74371d2-4fee-11e9-a5cd-1df8a848de3d.html

curl-7.62.0 is vulnerable:
curl -- multiple vulnerabilities
CVE: CVE-2019-3823
CVE: CVE-2019-3822
CVE: CVE-2018-16890
WWW: https://vuxml.FreeBSD.org/freebsd/714b033a-2b09-11e9-8bc3-610fd6e6cd05.html

libgcrypt-1.8.2 is vulnerable:
libgcrypt -- side-channel attack vulnerability
CVE: CVE-2018-0495
WWW: https://vuxml.FreeBSD.org/freebsd/9b5162de-6f39-11e8-818e-e8e0b747a45a.html

python36-3.6.5_1 is vulnerable:
Python -- NULL pointer dereference vulnerability
CVE: CVE-2019-5010
WWW: https://vuxml.FreeBSD.org/freebsd/d74371d2-4fee-11e9-a5cd-1df8a848de3d.html

pango-1.42.0 is vulnerable:
pango -- remote DoS vulnerability
CVE: CVE-2018-15120
WWW: https://vuxml.FreeBSD.org/freebsd/5a757a31-f98e-4bd4-8a85-f1c0f3409769.html

py36-requests-2.18.4 is vulnerable:
www/py-requests -- Information disclosure vulnerability
WWW: https://vuxml.FreeBSD.org/freebsd/50ad9a9a-1e28-11e9-98d7-0050562a4d7b.html

libnghttp2-1.31.0 is vulnerable:
nghttp2 -- Denial of service due to NULL pointer dereference
CVE: CVE-2018-1000168
WWW: https://vuxml.FreeBSD.org/freebsd/1fccb25e-8451-438c-a2b9-6a021e4d7a31.html

gnupg-2.2.6 is vulnerable:
gnupg -- unsanitized output (CVE-2018-12020)
CVE: CVE-2017-7526
CVE: CVE-2018-12020
WWW: https://vuxml.FreeBSD.org/freebsd/7da0417f-6b24-11e8-84cc-002590acae31.html

py36-cryptography-2.1.4 is vulnerable:
py-cryptography -- tag forgery vulnerability
CVE: CVE-2018-10903
WWW: https://vuxml.FreeBSD.org/freebsd/9e2d0dcf-9926-11e8-a92d-0050562a4d7b.html

perl5-5.26.1 is vulnerable:
perl -- multiple vulnerabilities
CVE: CVE-2018-6913
CVE: CVE-2018-6798
CVE: CVE-2018-6797
WWW: https://vuxml.FreeBSD.org/freebsd/41c96ffd-29a6-4dcc-9a88-65f5038fa6eb.html

libssh2-1.8.0,3 is vulnerable:
libssh2 -- multiple issues
CVE: CVE-2019-3862
CVE: CVE-2019-3861
CVE: CVE-2019-3860
CVE: CVE-2019-3858
WWW: https://vuxml.FreeBSD.org/freebsd/6e58e1e9-2636-413e-9f84-4c0e21143628.html

git-lite-2.17.0 is vulnerable:
Git -- Fix memory out-of-bounds and remote code execution vulnerabilities (CVE-2018-11233 and CVE-2018-11235)
CVE: CVE-2018-11235
CVE: CVE-2018-11233
WWW: https://vuxml.FreeBSD.org/freebsd/c7a135f4-66a4-11e8-9e63-3085a9a47796.html

gnutls-3.5.18 is vulnerable:
GnuTLS -- double free, invalid pointer access
CVE: CVE-2019-3836
CVE: CVE-2019-3829
WWW: https://vuxml.FreeBSD.org/freebsd/fb30db8f-62af-11e9-b0de-001cc0382b2f.html

13 problem(s) in the installed packages found.

root@freenas[~]# uname -a
FreeBSD freenas.local 11.2-STABLE FreeBSD 11.2-STABLE #0 r325575+95cc58ca2a0(HEAD): Mon May  6 19:08:58 EDT 2019     root@mp20.tn.ixsystems.com:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64  amd64

root@freenas[~]# freebsd-version -uk
11.2-STABLE
11.2-STABLE

root@freenas[~]# sockstat -l4
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     uwsgi-3.6  4006  3  tcp4   127.0.0.1:9042        *:*
root     uwsgi-3.6  3188  3  tcp4   127.0.0.1:9042        *:*
nobody   mdnsd      3144  4  udp4   *:31417               *:*
nobody   mdnsd      3144  6  udp4   *:5353                *:*
www      nginx      3132  6  tcp4   *:443                 *:*
www      nginx      3132  8  tcp4   *:80                  *:*
root     nginx      3131  6  tcp4   *:443                 *:*
root     nginx      3131  8  tcp4   *:80                  *:*
root     ntpd       2823  21 udp4   *:123                 *:*
root     ntpd       2823  22 udp4   10.49.13.99:123       *:*
root     ntpd       2823  25 udp4   127.0.0.1:123         *:*
root     sshd       2743  5  tcp4   *:22                  *:*
root     syslog-ng  2341  19 udp4   *:1031                *:*
nobody   mdnsd      2134  3  udp4   *:39020               *:*
nobody   mdnsd      2134  5  udp4   *:5353                *:*
root     python3.6  236   22 tcp4   *:6000                *:*
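
The pkg audit report above is long, so a short summary can help when comparing systems. The snippet below is only a minimal sketch: the function names list_vulnerable and count_cves are made up for illustration, and the sample is a two-entry excerpt of the output shown above. It assumes the "is vulnerable:" and "CVE:" line layout seen in that output, which may differ between pkg versions.

```shell
#!/bin/sh
# Sketch: summarize `pkg audit` output read from stdin.
# Function names are hypothetical, not part of pkg(8).

# Print the name of every package flagged as vulnerable.
# Lines of interest look like: "curl-7.62.0 is vulnerable:"
list_vulnerable() {
    awk '$2 == "is" && $3 == "vulnerable:" { print $1 }'
}

# Count the distinct CVE identifiers found on "CVE: ..." lines.
count_cves() {
    awk '$1 == "CVE:" { print $2 }' | sort -u | wc -l | tr -d ' '
}

# Two entries copied from the FreeNAS output above.
sample='curl-7.62.0 is vulnerable:
curl -- multiple vulnerabilities
CVE: CVE-2019-3823
CVE: CVE-2019-3822
CVE: CVE-2018-16890
WWW: https://vuxml.FreeBSD.org/freebsd/714b033a-2b09-11e9-8bc3-610fd6e6cd05.html

pango-1.42.0 is vulnerable:
pango -- remote DoS vulnerability
CVE: CVE-2018-15120
WWW: https://vuxml.FreeBSD.org/freebsd/5a757a31-f98e-4bd4-8a85-f1c0f3409769.html'

printf '%s\n' "$sample" | list_vulnerable
printf '%s\n' "$sample" | count_cves
```

On a live system the same functions could be fed directly, for example with pkg audit -F | list_vulnerable.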


I even tried to get an explanation for why FreeNAS ships such outdated and insecure packages in its latest version, in the FreeNAS 11.2-U3 Vulnerabilities thread I started on their forums.

Unfortunately it is their policy, which you can summarize as ‘do not touch/change versions if it is working’, or at least that is the impression I got.

Because of these security holes I cannot recommend FreeNAS, and I moved to the original: the FreeBSD system.

One other interesting note. After I installed FreeBSD I wanted to import the ZFS pool created by FreeNAS. This is what I got after executing the zpool import command.

# zpool import
   pool: nas02_gr06
     id: 1275660523517109367
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr06  ONLINE
          raidz2-0  ONLINE
            da58p2  ONLINE
            da59p2  ONLINE
            da60p2  ONLINE
            da61p2  ONLINE
            da62p2  ONLINE
            da63p2  ONLINE
            da64p2  ONLINE
            da26p2  ONLINE
            da65p2  ONLINE
            da23p2  ONLINE
            da29p2  ONLINE
            da66p2  ONLINE
            da67p2  ONLINE
            da68p2  ONLINE
        spares
          da69p2

   pool: nas02_gr05
     id: 5642709896812665361
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr05  ONLINE
          raidz2-0  ONLINE
            da20p2  ONLINE
            da30p2  ONLINE
            da34p2  ONLINE
            da50p2  ONLINE
            da28p2  ONLINE
            da38p2  ONLINE
            da51p2  ONLINE
            da52p2  ONLINE
            da27p2  ONLINE
            da32p2  ONLINE
            da53p2  ONLINE
            da54p2  ONLINE
            da55p2  ONLINE
            da56p2  ONLINE
        spares
          da57p2

   pool: nas02_gr04
     id: 2460983830075205166
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr04  ONLINE
          raidz2-0  ONLINE
            da44p2  ONLINE
            da37p2  ONLINE
            da18p2  ONLINE
            da36p2  ONLINE
            da45p2  ONLINE
            da19p2  ONLINE
            da22p2  ONLINE
            da33p2  ONLINE
            da35p2  ONLINE
            da21p2  ONLINE
            da31p2  ONLINE
            da47p2  ONLINE
            da48p2  ONLINE
            da49p2  ONLINE
        spares
          da46p2

   pool: nas02_gr03
     id: 4878868173820164207
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr03  ONLINE
          raidz2-0  ONLINE
            da81p2  ONLINE
            da71p2  ONLINE
            da14p2  ONLINE
            da15p2  ONLINE
            da80p2  ONLINE
            da16p2  ONLINE
            da88p2  ONLINE
            da17p2  ONLINE
            da40p2  ONLINE
            da41p2  ONLINE
            da25p2  ONLINE
            da42p2  ONLINE
            da24p2  ONLINE
            da43p2  ONLINE
        spares
          da39p2

   pool: nas02_gr02
     id: 3299037437134217744
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr02  ONLINE
          raidz2-0  ONLINE
            da84p2  ONLINE
            da76p2  ONLINE
            da85p2  ONLINE
            da8p2   ONLINE
            da9p2   ONLINE
            da78p2  ONLINE
            da73p2  ONLINE
            da74p2  ONLINE
            da70p2  ONLINE
            da77p2  ONLINE
            da11p2  ONLINE
            da13p2  ONLINE
            da79p2  ONLINE
            da89p2  ONLINE
        spares
          da90p2

   pool: nas02_gr01
     id: 1132383125952900182
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

        nas02_gr01  ONLINE
          raidz2-0  ONLINE
            da91p2  ONLINE
            da75p2  ONLINE
            da0p2   ONLINE
            da82p2  ONLINE
            da1p2   ONLINE
            da83p2  ONLINE
            da2p2   ONLINE
            da3p2   ONLINE
            da4p2   ONLINE
            da5p2   ONLINE
            da86p2  ONLINE
            da6p2   ONLINE
            da7p2   ONLINE
            da72p2  ONLINE
        spares
          da87p2



It seems that FreeNAS does ZFS a little differently: it creates a separate pool for every RAIDZ2 target, each with its own dedicated spare. Interesting …
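
Every pool in the listing above reports that it was last accessed by another system and suggests the '-f' flag. With six pools it may be convenient to generate the import commands from the listing itself. This is a rough sketch only: pool_names and import_commands are hypothetical helper names, the sample is a shortened excerpt of the output above, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: build `zpool import -f` commands from a `zpool import` listing.
# Helper names are hypothetical; nothing is imported by this script.

# Read a `zpool import` listing on stdin, print one pool name per line.
pool_names() {
    awk '$1 == "pool:" { print $2 }'
}

# Turn pool names on stdin into forced-import commands (printed only).
import_commands() {
    while read -r pool; do
        printf 'zpool import -f %s\n' "$pool"
    done
}

# Shortened excerpt of the listing above.
sample='   pool: nas02_gr06
     id: 1275660523517109367
  state: ONLINE
   pool: nas02_gr05
     id: 5642709896812665361
  state: ONLINE'

printf '%s\n' "$sample" | pool_names | import_commands
```

On a real host the listing would come from zpool import | pool_names | import_commands, and the printed lines could be run by hand once they look right.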

UPDATE 1 – BSD Now 305

The FreeBSD Enterprise 1 PB Storage article was featured in the BSD Now 305 – Changing Face of Unix episode.

Thanks for the mention!

UPDATE 2 – Real Life Pictures in Data Center

Some of you asked for real life pictures of this monster. Below you will find several pictures taken at the data center.

Front case with cabling.

tyan-real-01.jpg

Alternate front view.

tyan-real-09.jpg

Back of the case with cabling.

tyan-real-02.jpg

Top view with disks.

tyan-real-03

Alternate top view.

tyan-real-07.jpg

Disk slots zoom.

tyan-real-08.jpg

SSD and HDD disks.

tyan-real-06.jpg

EOF