FreeBSD Cluster with Pacemaker and Corosync

I always missed ‘proper’ cluster software for FreeBSD systems. Recently I got to run several Pacemaker/Corosync based clusters on Linux systems. I thought how to make similar high availability solutions on FreeBSD and I was really shocked when I figured out that both Pacemaker and Corosync tools are available in the FreeBSD Ports and packages as net/pacemaker2 and net/corosync2 respectively.

In this article I will check how well Pacemaker and Corosync cluster works on FreeBSD.

pacemaker

There are many definitions of a cluster. One that I like the most is that a cluster is a system that is still redundant after losing one of its nodes (is still a cluster). This means that 3 nodes is a minimum for a cluster by that definition. The two node clusters are quite problematic because of their biggest exposure to the split brain problem. That is why often in the two node clusters additional devices or systems are added to make sure that this split brain does not happen. For example one can add third node without any resources or services just as a ‘witness’ role. Other way is to add a shared disk resource that will serve the same purpose and often its a raw volume with SCSI-3 Persistent Reservation mechanism used.

Lab Setup

As usual it will be entirely VirtualBox based and it will consist of 3 hosts. To not create 3 same FreeBSD installations I used 12.1-RELEASE virtual machine image available from the FreeBSD Project directly:

There are several formats available – qcow2/raw/vhd/vmdk – but as I will be using VirtualBox I used the VMDK one.

Here is the list of the machines for the GlusterFS cluster:

  • 10.0.10.111 node1
  • 10.0.10.112 node2
  • 10.0.10.113 node3

Each VirtualBox virtual machine for FreeBSD is the default one (as suggested in the VirtualBox wizard) with 512 MB RAM and NAT Network as shown on the image below.

machine

Here is the configuration of the NAT Network on VirtualBox.

nat-network-01

nat-network-02

Before we will try connect to our FreeBSD machines we need to make the minimal network configuration inside each VM. Each FreeBSD machine will have such minimal /etc/rc.conf file as shown example for node1 host.

root@node1:~ # cat /etc/rc.conf
hostname=node1
ifconfig_em0="inet 10.0.10.111/24 up"
defaultrouter=10.0.10.1
sshd_enable=YES

For the setup purposes we will need to allow root login on these FreeBSD machines with PermitRootLogin yes option in the /etc/ssh/sshd_config file. You will also need to restart the sshd(8) service after the changes.

root@node1:~ # grep PermitRootLogin /etc/ssh/sshd_config
PermitRootLogin yes

root@node1:~ # service sshd restart

By using NAT Network with Port Forwarding the FreeBSD machines will be accessible on the localhost ports. For example the node1 machine will be available on port 2211, the node2 machine will be available on port 2212 and so on. This is shown in the sockstat utility output below.

nat-network-03-sockstat

nat-network-04-ssh

To connect to such machine from the VirtualBox host system you will need this command:

vboxhost % ssh -l root localhost -p 2211

Packages

As we now have ssh(1) connectivity we need to add needed packages. To make our VMs resolve DNS queries we need to add one last thing. We will also switch to ‘quarterly’ branch of the pkg(8) packages.

root@node1:~ # echo 'nameserver 1.1.1.1' > /etc/resolv.conf
root@node1:~ # sed -i '' s/quarterly/latest/g /etc/pkg/FreeBSD.conf

Remember to repeat these two upper commands on node2 and node3 systems.

Now we will add Pacemaker and Corosync packages.

root@node1:~ # pkg install pacemaker2 corosync2 crmsh

root@node2:~ # pkg install pacemaker2 corosync2 crmsh

root@node3:~ # pkg install pacemaker2 corosync2 crmsh

These are messages both from pacemaker2 and corosync2 that we need to address.

Message from pacemaker2-2.0.4:

--
For correct operation, maximum socket buffer size must be tuned
by performing the following command as root :

# sysctl kern.ipc.maxsockbuf=18874368

To preserve this setting across reboots, append the following
to /etc/sysctl.conf :

kern.ipc.maxsockbuf=18874368

======================================================================

Message from corosync2-2.4.5_1:

--
For correct operation, maximum socket buffer size must be tuned
by performing the following command as root :

# sysctl kern.ipc.maxsockbuf=18874368

To preserve this setting across reboots, append the following
to /etc/sysctl.conf :

kern.ipc.maxsockbuf=18874368

We need to change the kern.ipc.maxsockbuf parameter. Lets do it then.

root@node1:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node1:~ # service sysctl restart

root@node2:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node2:~ # service sysctl restart

root@node3:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node3:~ # service sysctl restart

Lets check what binaries come with these packages.

root@node1:~ # pkg info -l pacemaker2 | grep bin
        /usr/local/sbin/attrd_updater
        /usr/local/sbin/cibadmin
        /usr/local/sbin/crm_attribute
        /usr/local/sbin/crm_diff
        /usr/local/sbin/crm_error
        /usr/local/sbin/crm_failcount
        /usr/local/sbin/crm_master
        /usr/local/sbin/crm_mon
        /usr/local/sbin/crm_node
        /usr/local/sbin/crm_report
        /usr/local/sbin/crm_resource
        /usr/local/sbin/crm_rule
        /usr/local/sbin/crm_shadow
        /usr/local/sbin/crm_simulate
        /usr/local/sbin/crm_standby
        /usr/local/sbin/crm_ticket
        /usr/local/sbin/crm_verify
        /usr/local/sbin/crmadmin
        /usr/local/sbin/fence_legacy
        /usr/local/sbin/iso8601
        /usr/local/sbin/pacemaker-remoted
        /usr/local/sbin/pacemaker_remoted
        /usr/local/sbin/pacemakerd
        /usr/local/sbin/stonith_admin

root@node1:~ # pkg info -l corosync2 | grep bin
        /usr/local/bin/corosync-blackbox
        /usr/local/sbin/corosync
        /usr/local/sbin/corosync-cfgtool
        /usr/local/sbin/corosync-cmapctl
        /usr/local/sbin/corosync-cpgtool
        /usr/local/sbin/corosync-keygen
        /usr/local/sbin/corosync-notifyd
        /usr/local/sbin/corosync-quorumtool

root@node1:~ # pkg info -l crmsh | grep bin
        /usr/local/bin/crm

Cluster Initialization

Now we will initialize our FreeBSD cluster.

First we need to make sure that names of the nodes are DNS resolvable.

root@node1:~ # tail -3 /etc/hosts

10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3

root@node2:~ # tail -3 /etc/hosts

10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3

root@node3:~ # tail -3 /etc/hosts

10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3


Now we will generate the Corosync key.

root@node1:~ # corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /usr/local/etc/corosync/authkey.

root@node1:~ # echo $?
0

root@node1:~ # ls -l /usr/local/etc/corosync/authkey
-r--------  1 root  wheel  128 Sep  2 20:37 /usr/local/etc/corosync/authkey

Now the Corosync configuration file. For sure some examples were provided by the package maintainer.

root@node1:~ # pkg info -l corosync2 | grep example
        /usr/local/etc/corosync/corosync.conf.example
        /usr/local/etc/corosync/corosync.conf.example.udpu

We will take the second one as a base for our config.

root@node1:~ # cp /usr/local/etc/corosync/corosync.conf.example.udpu /usr/local/etc/corosync/corosync.conf

root@node1:~ # vi /usr/local/etc/corosync/corosync.conf
               /* LOTS OF EDITS HERE */

root@node1:~ # cat /usr/local/etc/corosync/corosync.conf

totem {
  version: 2
  crypto_cipher: aes256
  crypto_hash: sha256
  transport: udpu

  interface {
    ringnumber: 0
    bindnetaddr: 10.0.10.0
    mcastport: 5405
    ttl: 1
  }
}

logging {
  fileline: off
  to_logfile: yes
  to_syslog: no
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on

  logger_subsys {
    subsys: QUORUM
    debug: off
  }
}

nodelist {

  node {
    ring0_addr: 10.0.10.111
    nodeid: 1
  }

  node {
    ring0_addr: 10.0.10.112
    nodeid: 2
  }

  node {
    ring0_addr: 10.0.10.113
    nodeid: 3
  }

}

quorum {
  provider: corosync_votequorum
  expected_votes: 2
}

Now we need to propagate both Corosync key and config across the nodes in the cluster.

We can use some simple tools created exactly for that like net/csync2 cluster synchronization tool for example but plain old net/rsync will serve as well.

root@node1:~ # pkg install -y rsync

root@node1:~ # rsync -av /usr/local/etc/corosync/ node2:/usr/local/etc/corosync/
The authenticity of host 'node2 (10.0.10.112)' can't be established.
ECDSA key fingerprint is SHA256:/ZDmln7GKi6n0kbad73TIrajPjGfQqJJX+ReSf3NMvc.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2' (ECDSA) to the list of known hosts.
Password for root@node2:
sending incremental file list
./
authkey
corosync.conf
service.d/
uidgid.d/

sent 1,100 bytes  received 69 bytes  259.78 bytes/sec
total size is 4,398  speedup is 3.76

root@node1:~ # rsync -av /usr/local/etc/corosync/ node3:/usr/local/etc/corosync/
The authenticity of host 'node2 (10.0.10.112)' can't be established.
ECDSA key fingerprint is SHA256:/ZDmln7GKi6n0kbad73TIrajPjGfQqJJX+ReSf3NMvc.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3' (ECDSA) to the list of known hosts.
Password for root@node3:
sending incremental file list
./
authkey
corosync.conf
service.d/
uidgid.d/

sent 1,100 bytes  received 69 bytes  259.78 bytes/sec
total size is 4,398  speedup is 3.76

Now lets check that they are the same.

root@node1:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf

root@node2:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf

root@node3:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf

Same.

We can now add corosync_enable=YES and pacemaker_enable=YES to the /etc/rc.conf file.

root@node1:~ # sysrc corosync_enable=YES
corosync_enable:  -> YES

root@node1:~ # sysrc pacemaker_enable=YES
pacemaker_enable:  -> YES

root@node2:~ # sysrc corosync_enable=YES
corosync_enable:  -> YES

root@node2:~ # sysrc pacemaker_enable=YES
pacemaker_enable:  -> YES

root@node3:~ # sysrc corosync_enable=YES
corosync_enable:  -> YES

root@node3:~ # sysrc pacemaker_enable=YES
pacemaker_enable:  -> YES

Lets start these services then.

root@node1:~ # service corosync start
Starting corosync.
Sep 02 20:55:35 notice  [MAIN  ] Corosync Cluster Engine ('2.4.5'): started and ready to provide service.
Sep 02 20:55:35 info    [MAIN  ] Corosync built-in features:
Sep 02 20:55:35 warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Sep 02 20:55:35 warning [MAIN  ] Please migrate config file to nodelist.

root@node1:~ # ps aux | grep corosync
root  1695   0.0  7.9 38340 38516  -  S    20:55    0:00.40 /usr/local/sbin/corosync
root  1699   0.0  0.1   524   336  0  R+   20:57    0:00.00 grep corosync

Do the same on the node2 and node3 systems.

The Pacemaker is not yet running so that will fail.

root@node1:~ # crm status
Could not connect to the CIB: Socket is not connected
crm_mon: Error: cluster is not available on this node
ERROR: status: crm_mon (rc=102): 

We will start it now.

root@node1:~ # service pacemaker start
Starting pacemaker.

root@node2:~ # service pacemaker start
Starting pacemaker.

root@node3:~ # service pacemaker start
Starting pacemaker.

You need to give it little time to start because if you will execute crm status command right away you will get 0 nodes configured message as shown below.

root@node1:~ # crm status
Cluster Summary:
  * Stack: unknown
  * Current DC: NONE
  * Last updated: Wed Sep  2 20:58:51 2020
  * Last change:  
  * 0 nodes configured
  * 0 resource instances configured


Full List of Resources:
  * No resources

… but after a while everything is detected and works as desired.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 21:02:49 2020
  * Last change:  Wed Sep  2 20:59:00 2020 by hacluster via crmd on node2
  * 3 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * No resources

The Pacemaker runs properly.

root@node1:~ # ps aux | grep pacemaker
root      1716   0.0  0.5 10844   2396  -  Is   20:58     0:00.00 daemon: /usr/local/sbin/pacemakerd[1717] (daemon)
root      1717   0.0  5.2 49264  25284  -  S    20:58     0:00.27 /usr/local/sbin/pacemakerd
hacluster 1718   0.0  6.1 48736  29708  -  Ss   20:58     0:00.75 /usr/local/libexec/pacemaker/pacemaker-based
root      1719   0.0  4.5 40628  21984  -  Ss   20:58     0:00.28 /usr/local/libexec/pacemaker/pacemaker-fenced
root      1720   0.0  2.8 25204  13688  -  Ss   20:58     0:00.20 /usr/local/libexec/pacemaker/pacemaker-execd
hacluster 1721   0.0  3.9 38148  19100  -  Ss   20:58     0:00.25 /usr/local/libexec/pacemaker/pacemaker-attrd
hacluster 1722   0.0  2.9 25460  13864  -  Ss   20:58     0:00.17 /usr/local/libexec/pacemaker/pacemaker-schedulerd
hacluster 1723   0.0  5.4 49304  26300  -  Ss   20:58     0:00.41 /usr/local/libexec/pacemaker/pacemaker-controld
root      1889   0.0  0.6 11348   2728  0  S+   21:56     0:00.00 grep pacemaker

We can check how Corosync sees its members.

root@node1:~ # corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.10.111) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.10.112) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.0.10.113) 
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined

… or the quorum information.

root@node1:~ # corosync-quorumtool
Quorum information
------------------
Date:             Wed Sep  2 21:00:38 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          1/12
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate 

Membership information
----------------------
    Nodeid      Votes Name
         1          1 10.0.10.111 (local)
         2          1 10.0.10.112
         3          1 10.0.10.113

The Corosync log file is filled with the following information.

root@node1:~ # cat /var/log/cluster/corosync.log
Sep 02 20:55:35 [1694] node1 corosync notice  [MAIN  ] Corosync Cluster Engine ('2.4.5'): started and ready to provide service.
Sep 02 20:55:35 [1694] node1 corosync info    [MAIN  ] Corosync built-in features:
Sep 02 20:55:35 [1694] node1 corosync warning [MAIN  ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Sep 02 20:55:35 [1694] node1 corosync warning [MAIN  ] Please migrate config file to nodelist.
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha256
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] The network interface [10.0.10.111] is now up.
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: cmap
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync configuration service [1]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: cfg
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: cpg
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Sep 02 20:55:35 [1694] node1 corosync notice  [QUORUM] Using quorum provider corosync_votequorum
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: votequorum
Sep 02 20:55:35 [1694] node1 corosync notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 02 20:55:35 [1694] node1 corosync info    [QB    ] server name: quorum
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] adding new UDPU member {10.0.10.111}
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] adding new UDPU member {10.0.10.112}
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] adding new UDPU member {10.0.10.113}
Sep 02 20:55:35 [1694] node1 corosync notice  [TOTEM ] A new membership (10.0.10.111:4) was formed. Members joined: 1
Sep 02 20:55:35 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:55:35 [1694] node1 corosync notice  [QUORUM] Members[1]: 1
Sep 02 20:55:35 [1694] node1 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Sep 02 20:58:14 [1694] node1 corosync notice  [TOTEM ] A new membership (10.0.10.111:8) was formed. Members joined: 2
Sep 02 20:58:14 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:14 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:14 [1694] node1 corosync notice  [QUORUM] This node is within the primary component and will provide service.
Sep 02 20:58:14 [1694] node1 corosync notice  [QUORUM] Members[2]: 1 2
Sep 02 20:58:14 [1694] node1 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
Sep 02 20:58:19 [1694] node1 corosync notice  [TOTEM ] A new membership (10.0.10.111:12) was formed. Members joined: 3
Sep 02 20:58:19 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync warning [CPG   ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync notice  [QUORUM] Members[3]: 1 2 3
Sep 02 20:58:19 [1694] node1 corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.

Here is the configuration.

root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.4-2deceaa3ae \
        cluster-infrastructure=corosync

As we will not be configuring the STONITH mechanism we will disable it.

root@node1:~ # crm configure property stonith-enabled=false

New configuraion with STONITH disabled.

root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.4-2deceaa3ae \
        cluster-infrastructure=corosync \
        stonith-enabled=false

The STONITH configuration is out of scope of this article but properly configured STONITH looks like that.

stonith

First Service

We will now configure our first highly available service – a classic – a floating IP address πŸ™‚

root@node1:~ # crm configure primitive IP ocf:heartbeat:IPaddr2 params ip=10.0.10.200 cidr_netmask="24" op monitor interval="30s"

Lets check how it behaves.

root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
primitive IP IPaddr2 \
        params ip=10.0.10.200 cidr_netmask=24 \
        op monitor interval=30s
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.4-2deceaa3ae \
        cluster-infrastructure=corosync \
        stonith-enabled=false

Looks good – lets check the cluster status.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:03:35 2020
  * Last change:  Wed Sep  2 22:02:53 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::heartbeat:IPaddr2):        Stopped

Failed Resource Actions:
  * IP_monitor_0 on node3 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:53Z', queued=0ms, exec=132ms
  * IP_monitor_0 on node2 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:54Z', queued=0ms, exec=120ms
  * IP_monitor_0 on node1 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:53Z', queued=0ms, exec=110ms

Crap. Linuxism. The ip(8) command is expected to be present in the system. This is FreeBSD and as any UNIX system it comes with ifconfig(8) command instead.

We will have to figure something else. For now we will delete our useless IP service.

root@node1:~ # crm configure delete IP

Status after deletion.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:04:34 2020
  * Last change:  Wed Sep  2 22:04:31 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * No resources

Custom Resource

Lets check what resources are available by stock Pacemaker installation.

root@node1:~ # ls -l /usr/local/lib/ocf/resource.d/pacemaker
total 144
-r-xr-xr-x  1 root  wheel   7484 Aug 29 01:22 ClusterMon
-r-xr-xr-x  1 root  wheel   9432 Aug 29 01:22 Dummy
-r-xr-xr-x  1 root  wheel   5256 Aug 29 01:22 HealthCPU
-r-xr-xr-x  1 root  wheel   5342 Aug 29 01:22 HealthIOWait
-r-xr-xr-x  1 root  wheel   9450 Aug 29 01:22 HealthSMART
-r-xr-xr-x  1 root  wheel   6186 Aug 29 01:22 Stateful
-r-xr-xr-x  1 root  wheel  11370 Aug 29 01:22 SysInfo
-r-xr-xr-x  1 root  wheel   5856 Aug 29 01:22 SystemHealth
-r-xr-xr-x  1 root  wheel   7382 Aug 29 01:22 attribute
-r-xr-xr-x  1 root  wheel   7854 Aug 29 01:22 controld
-r-xr-xr-x  1 root  wheel  16134 Aug 29 01:22 ifspeed
-r-xr-xr-x  1 root  wheel  11040 Aug 29 01:22 o2cb
-r-xr-xr-x  1 root  wheel  11696 Aug 29 01:22 ping
-r-xr-xr-x  1 root  wheel   6356 Aug 29 01:22 pingd
-r-xr-xr-x  1 root  wheel   3702 Aug 29 01:22 remote

Not many … we will try to modify the Dummy service into an IP changer on FreeBSD.

root@node1:~ # cp /usr/local/lib/ocf/resource.d/pacemaker/Dummy /usr/local/lib/ocf/resource.d/pacemaker/ifconfig

root@node1:~ # vi /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
               /* LOTS OF TYPING */

Because of the WordPress blogging system limitations I am forced to post this ifconfig resource as an image … but fear not – the text version is also available here – ifconfig.odt – for download.

Also the first version did not went that well …

root@node1:~ # setenv OCF_ROOT /usr/local/lib/ocf
root@node1:~ # ocf-tester -n resourcename /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
Beginning tests for /usr/local/lib/ocf/resource.d/pacemaker/ifconfig...
* rc=3: Your agent has too restrictive permissions: should be 755
-:1: parser error : Start tag expected, '<' not found
usage: /usr/local/lib/ocf/resource.d/pacemaker/ifconfig {start|stop|monitor}
^
* rc=1: Your agent produces meta-data which does not conform to ra-api-1.dtd
* rc=3: Your agent does not support the meta-data action
* rc=3: Your agent does not support the validate-all action
* rc=0: Monitoring a stopped resource should return 7
* rc=0: The initial probe for a stopped resource should return 7 or 5 even if all binaries are missing
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* Your agent does not support the reload action (optional)
Tests failed: /usr/local/lib/ocf/resource.d/pacemaker/ifconfig failed 9 tests

But after adding 755 mode to it and making several (hundred) changes it become usable.

root@node1:~ # vi /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
             /* LOTS OF NERVOUS TYPING */
root@node1:~ # chmod 755 /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
root@node1:~ # setenv OCF_ROOT /usr/local/lib/ocf
root@node1:~ # ocf-tester -n resourcename /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
Beginning tests for /usr/local/lib/ocf/resource.d/pacemaker/ifconfig...
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
/usr/local/lib/ocf/resource.d/pacemaker/ifconfig passed all tests

Looks usable.

The ifconfig resource. Its pretty limited and with hardcoded IP address as for now.

ifconfig

Lets try to add new IP resource to our FreeBSD cluster.

Tests

root@node1:~ # crm configure primitive IP ocf:pacemaker:ifconfig op monitor interval="30"

Added.

Lets see what status command now shows.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:44:52 2020
  * Last change:  Wed Sep  2 22:44:44 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node1

Failed Resource Actions:
  * IP_monitor_0 on node3 'not installed' (5): call=24, status='Not installed', exitreason='', last-rc-change='2020-09-02 22:42:52Z', queued=0ms, exec=5ms
  * IP_monitor_0 on node2 'not installed' (5): call=24, status='Not installed', exitreason='', last-rc-change='2020-09-02 22:42:53Z', queued=0ms, exec=2ms

Crap. I forgot to copy this new ifconfig resource to the other nodes. Lets fix that now.

root@node1:~ # rsync -av /usr/local/lib/ocf/resource.d/pacemaker/ node2:/usr/local/lib/ocf/resource.d/pacemaker/
Password for root@node2:
sending incremental file list
./
ifconfig

sent 3,798 bytes  received 38 bytes  1,534.40 bytes/sec
total size is 128,003  speedup is 33.37

root@node1:~ # rsync -av /usr/local/lib/ocf/resource.d/pacemaker/ node3:/usr/local/lib/ocf/resource.d/pacemaker/
Password for root@node3:
sending incremental file list
./
ifconfig

sent 3,798 bytes  received 38 bytes  1,534.40 bytes/sec
total size is 128,003  speedup is 33.37

Lets stop, delete and re-add our precious resource now.

root@node1:~ # crm resource stop IP
root@node1:~ # crm configure delete IP
root@node1:~ # crm configure primitive IP ocf:pacemaker:ifconfig op monitor interval="30"

Fingers crossed.

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:45:46 2020
  * Last change:  Wed Sep  2 22:45:43 2020 by root via cibadmin on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node1

Looks like running properly.

Lets verify that its really up where it should be.

root@node1:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:2a:78:60
        inet 10.0.10.111 netmask 0xffffff00 broadcast 10.0.10.255
        inet 10.0.10.200 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

root@node2:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:80:50:05
        inet 10.0.10.112 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

root@node3:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:74:5e:b9
        inet 10.0.10.113 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

Seems to be working.

Now lets try to move it to the other node in the cluster.

root@node1:~ # crm resource move IP node3
INFO: Move constraint created for IP to node3

root@node1:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:47:31 2020
  * Last change:  Wed Sep  2 22:47:28 2020 by root via crm_resource on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node3

Switched properly to node3 system.

root@node3:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:74:5e:b9
        inet 10.0.10.113 netmask 0xffffff00 broadcast 10.0.10.255
        inet 10.0.10.200 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

root@node1:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
        options=81009b
        ether 08:00:27:2a:78:60
        inet 10.0.10.111 netmask 0xffffff00 broadcast 10.0.10.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

Now we will poweroff the node3 system to check it that IP is really highly available.

root@node2:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:49:57 2020
  * Last change:  Wed Sep  2 22:47:29 2020 by root via crm_resource on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node3

root@node3:~ # poweroff

root@node2:~ # crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
  * Last updated: Wed Sep  2 22:50:16 2020
  * Last change:  Wed Sep  2 22:47:29 2020 by root via crm_resource on node1
  * 3 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ node1 node2 ]
  * OFFLINE: [ node3 ]

Full List of Resources:
  * IP  (ocf::pacemaker:ifconfig):       Started node1

Seems that failover went well.

The crm command also colors various sections of its output.

failover

Good to know that Pacemaker and Corosync cluster runs well on FreeBSD.

Some work is needed to write the needed resource files but one with some time and determination can surely put FreeBSD into a very capable highly available cluster.

EOF

5 thoughts on “FreeBSD Cluster with Pacemaker and Corosync

  1. Pingback: Valuable News – 2020/09/07 | πšŸπšŽπš›πš–πšŠπšπšŽπš—

  2. Rui Vieira

    Great work, as always!
    I’d really like to suggest you to verify using ocf:heartbeat:IPaddr instead ocf:heartbeat:IPaddr2. The second, as you’ve noticed, uses “ip” in order to configure the interfaces, and the former uses the well-known “ifconfig” – so, it should work without any hassle.
    Even though – anyone can learn by your example how a script can be done from scratch to manage other resources and be integrated to pacemaker.

    Liked by 1 person

    Reply

Leave a comment