GlusterFS Cluster on FreeBSD with Ansible and GNU Parallel

Today I would like to present an article about setting up a GlusterFS cluster on a FreeBSD system with the Ansible and GNU Parallel tools.


To cite Wikipedia “GlusterFS is a scale-out network-attached storage file system. It has found applications including cloud computing, streaming media services, and content delivery networks.” The GlusterFS page describes it similarly “Gluster is a scalable, distributed file system that aggregates disk storage resources from multiple servers into a single global namespace.”

Here are its advantages:

  • Scales to several petabytes.
  • Handles thousands of clients.
  • POSIX compatible.
  • Uses commodity hardware.
  • Can use any on-disk filesystem that supports extended attributes.
  • Accessible using industry standard protocols like NFS and SMB.
  • Provides replication/quotas/geo-replication/snapshots/bitrot detection.
  • Allows optimization for different workloads.
  • Open Source.

Lab Setup

The lab will be entirely VirtualBox based and will consist of 6 hosts. To avoid performing 6 identical FreeBSD installations I used the 12.0-RELEASE virtual machine image available directly from the FreeBSD Project.

There are several formats available – qcow2/raw/vhd/vmdk – but as I will be using VirtualBox I used the VMDK one.

I will use different prompts depending on where the command is executed to make the article more readable. Also, when there is ‘%‘ at the prompt a regular user is sufficient, and when there is ‘#‘ at the prompt a superuser is needed.

gluster1 #    // command run on the gluster1 node
gluster* #    // command run on all gluster nodes
client #      // command run on gluster client
vbhost %      // command run on the VirtualBox host

Here is the list of the machines for the GlusterFS cluster:

10.0.10.11 gluster1
10.0.10.12 gluster2
10.0.10.13 gluster3
10.0.10.14 gluster4
10.0.10.15 gluster5
10.0.10.16 gluster6

Each VirtualBox virtual machine for FreeBSD uses the defaults suggested by the VirtualBox wizard, with 512 MB RAM and a NAT Network, as shown in the image below.

virtualbox-freebsd-gluster-host.jpg

Here is the configuration of the NAT Network on VirtualBox.

virtualbox-nat-network.jpg

The cloned/copied FreeBSD-12.0-RELEASE-amd64.vmdk images need to have different UUIDs, so we will use the VBoxManage internalcommands sethduuid command to achieve this.

vbhost % for I in $( seq 6 ); do cp FreeBSD-12.0-RELEASE-amd64.vmdk    vbox_GlusterFS_${I}.vmdk; done
vbhost % for I in $( seq 6 ); do VBoxManage internalcommands sethduuid vbox_GlusterFS_${I}.vmdk; done

To start the whole GlusterFS environment on VirtualBox, first list the registered GlusterFS machines.

vbhost % VBoxManage list vms | grep GlusterFS
"FreeBSD GlusterFS 1" {162a3b6f-4ec9-4709-bff8-162b0c8c9c41}
"FreeBSD GlusterFS 2" {2e30326c-ac5d-41d2-9b28-483375df38f6}
"FreeBSD GlusterFS 3" {6b2747ab-3ec6-4b1a-a28e-5d871d7891b3}
"FreeBSD GlusterFS 4" {12379cf8-31d9-4ff1-9945-465fc3ed15f0}
"FreeBSD GlusterFS 5" {a4b0d515-5924-4517-9052-df238c366f2b}
"FreeBSD GlusterFS 6" {66621755-1b97-4486-aa15-a7bec9edb343}

Check which GlusterFS machines are running.

vbhost % VBoxManage list runningvms | grep GlusterFS
vbhost %

Start the machines in VirtualBox headless mode in parallel.

vbhost % VBoxManage list vms \
           | grep GlusterFS \
           | awk -F \" '{print $2}' \
           | while read I; do VBoxManage startvm "${I}" --type headless & done

After that command you should see these machines running.

vbhost % VBoxManage list runningvms
"FreeBSD GlusterFS 1" {162a3b6f-4ec9-4709-bff8-162b0c8c9c41}
"FreeBSD GlusterFS 2" {2e30326c-ac5d-41d2-9b28-483375df38f6}
"FreeBSD GlusterFS 3" {6b2747ab-3ec6-4b1a-a28e-5d871d7891b3}
"FreeBSD GlusterFS 4" {12379cf8-31d9-4ff1-9945-465fc3ed15f0}
"FreeBSD GlusterFS 5" {a4b0d515-5924-4517-9052-df238c366f2b}
"FreeBSD GlusterFS 6" {66621755-1b97-4486-aa15-a7bec9edb343}

Before we try to connect to our FreeBSD machines we need to make a minimal network configuration. Each FreeBSD machine will have a minimal /etc/rc.conf file like the one shown below for the gluster1 host.

gluster1 # cat /etc/rc.conf
hostname=gluster1
ifconfig_DEFAULT="inet 10.0.10.11/24 up"
defaultrouter=10.0.10.1
sshd_enable=YES

For setup purposes we will need to allow root login on these FreeBSD GlusterFS machines with the PermitRootLogin yes option in the /etc/ssh/sshd_config file. You will also need to restart the sshd(8) service after the changes.

gluster1 # grep '^PermitRootLogin' /etc/ssh/sshd_config
PermitRootLogin yes
gluster1 # service sshd restart

By using NAT Network with Port Forwarding the FreeBSD machines will be accessible on the localhost ports. For example the gluster1 machine will be available on port 2211, the gluster2 machine will be available on port 2212 and so on. This is shown in the sockstat utility output below.

vbhost % sockstat -l4
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
vermaden VBoxNetNAT 57622 17 udp4   *:*                   *:*
vermaden VBoxNetNAT 57622 19 tcp4   *:2211                *:*
vermaden VBoxNetNAT 57622 20 tcp4   *:2212                *:*
vermaden VBoxNetNAT 57622 21 tcp4   *:2213                *:*
vermaden VBoxNetNAT 57622 22 tcp4   *:2214                *:*
vermaden VBoxNetNAT 57622 23 tcp4   *:2215                *:*
vermaden VBoxNetNAT 57622 24 tcp4   *:2216                *:*
vermaden VBoxNetNAT 57622 28 tcp4   *:2240                *:*
vermaden VBoxNetNAT 57622 29 tcp4   *:9140                *:*
vermaden VBoxNetNAT 57622 30 tcp4   *:2220                *:*
root     sshd       96791 4  tcp4   *:22                  *:*

I think the correlation between the IP address and the port on the host is obvious 🙂

Here is the list of the machines with ports on localhost:

10.0.10.11 gluster1 2211
10.0.10.12 gluster2 2212
10.0.10.13 gluster3 2213
10.0.10.14 gluster4 2214
10.0.10.15 gluster5 2215
10.0.10.16 gluster6 2216
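
The mapping in the table above can be written down as a tiny helper. This is only a sketch: the function name forwarded_port is made up for illustration, and it assumes the convention used in this lab, where the forwarded port is 2200 plus the last octet of the guest IP address.

```shell
#!/bin/sh
# Derive the forwarded localhost port from a guest IP address.
# Assumption: port = 2200 + last octet, which matches the table above.
forwarded_port() {
  last_octet=${1##*.}                 # strip everything up to the last dot
  echo $(( 2200 + ${last_octet} ))
}

# Print the mapping for all six lab machines.
for ip in 10.0.10.11 10.0.10.12 10.0.10.13 10.0.10.14 10.0.10.15 10.0.10.16; do
  echo "${ip} -> localhost:$( forwarded_port "${ip}" )"
done
```

The port forwarding rules themselves still have to be defined in the VirtualBox NAT Network settings; the helper only documents the numbering scheme.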

To connect to one of these machines from the VirtualBox host system you will need this command:

vbhost % ssh -l root localhost -p 2211

To avoid typing that every time you need to log in to gluster1, let’s make some changes to the ~/.ssh/config file for convenience. This way it will be possible to log in with a much shorter command.

vbhost % ssh gluster1

Here is the modified ~/.ssh/config file.

vbhost % cat ~/.ssh/config
# GENERAL
  StrictHostKeyChecking no
  LogLevel              quiet
  KeepAlive             yes
  ServerAliveInterval   30
  VerifyHostKeyDNS      no

# ALL HOSTS SETTINGS
Host *
  StrictHostKeyChecking no
  Compression           yes

# GLUSTER
Host gluster1
  User root
  Hostname 127.0.0.1
  Port 2211

Host gluster2
  User root
  Hostname 127.0.0.1
  Port 2212

Host gluster3
  User root
  Hostname 127.0.0.1
  Port 2213

Host gluster4
  User root
  Hostname 127.0.0.1
  Port 2214

Host gluster5
  User root
  Hostname 127.0.0.1
  Port 2215

Host gluster6
  User root
  Hostname 127.0.0.1
  Port 2216
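
The six GLUSTER stanzas differ only in the host number and the port, so they could also be generated instead of typed. Below is a minimal sketch of that idea; the function name gluster_ssh_config is hypothetical, and it assumes the same naming (glusterN) and port scheme (2210 + N) as the file above.

```shell
#!/bin/sh
# Print one ssh_config 'Host' stanza per GlusterFS node.
# Assumptions: hosts are named gluster1..glusterN and node N is
# forwarded to localhost port 2210 + N, as in the config shown above.
gluster_ssh_config() {
  count=${1:-6}
  i=1
  while [ "${i}" -le "${count}" ]; do
    printf 'Host gluster%d\n  User root\n  Hostname 127.0.0.1\n  Port %d\n\n' \
      "${i}" "$(( 2210 + ${i} ))"
    i=$(( ${i} + 1 ))
  done
}

# Print the stanzas for review.
gluster_ssh_config 6
```

After checking the output it can be appended with gluster_ssh_config 6 >> ~/.ssh/config.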

I assume that you already have some SSH keys generated (with ~/.ssh/id_rsa as the private key), so let’s remove the need to type a password on each SSH login.

vbhost % ssh-copy-id -i ~/.ssh/id_rsa gluster1
Password for root@gluster1:

vbhost % ssh-copy-id -i ~/.ssh/id_rsa gluster2
Password for root@gluster2:

vbhost % ssh-copy-id -i ~/.ssh/id_rsa gluster3
Password for root@gluster3:

vbhost % ssh-copy-id -i ~/.ssh/id_rsa gluster4
Password for root@gluster4:

vbhost % ssh-copy-id -i ~/.ssh/id_rsa gluster5
Password for root@gluster5:

vbhost % ssh-copy-id -i ~/.ssh/id_rsa gluster6
Password for root@gluster6:

Ansible Setup

As we already have SSH integration, we will now configure Ansible to connect to our ‘localhost’ ports for the FreeBSD machines.

Here is the Ansible hosts file.

vbhost % cat hosts
[gluster]
gluster1 ansible_port=2211 ansible_host=127.0.0.1 ansible_user=root
gluster2 ansible_port=2212 ansible_host=127.0.0.1 ansible_user=root
gluster3 ansible_port=2213 ansible_host=127.0.0.1 ansible_user=root
gluster4 ansible_port=2214 ansible_host=127.0.0.1 ansible_user=root
gluster5 ansible_port=2215 ansible_host=127.0.0.1 ansible_user=root
gluster6 ansible_port=2216 ansible_host=127.0.0.1 ansible_user=root

[gluster:vars]
ansible_python_interpreter=/usr/local/bin/python2.7

Here is the listing of these machines using the ansible command.

vbhost % ansible -i hosts --list-hosts gluster
  hosts (6):
    gluster1
    gluster2
    gluster3
    gluster4
    gluster5
    gluster6

Let’s verify that our Ansible setup works correctly.

vbhost % ansible -i hosts -m raw -a 'echo' gluster
gluster1 | CHANGED | rc=0 >>



gluster3 | CHANGED | rc=0 >>



gluster2 | CHANGED | rc=0 >>



gluster5 | CHANGED | rc=0 >>



gluster4 | CHANGED | rc=0 >>



gluster6 | CHANGED | rc=0 >>

It works as desired.

We are not able to use Ansible modules other than Raw because Python is not installed on FreeBSD by default, as shown below.

vbhost % ansible -i hosts -m ping gluster
gluster1 | FAILED! => {
    "changed": false,
    "module_stderr": "",
    "module_stdout": "/bin/sh: /usr/local/bin/python2.7: not found\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 127
}
gluster2 | FAILED! => {
    "changed": false,
    "module_stderr": "",
    "module_stdout": "/bin/sh: /usr/local/bin/python2.7: not found\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 127
}
gluster4 | FAILED! => {
    "changed": false,
    "module_stderr": "",
    "module_stdout": "/bin/sh: /usr/local/bin/python2.7: not found\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 127
}
gluster5 | FAILED! => {
    "changed": false,
    "module_stderr": "",
    "module_stdout": "/bin/sh: /usr/local/bin/python2.7: not found\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 127
}
gluster3 | FAILED! => {
    "changed": false,
    "module_stderr": "",
    "module_stdout": "/bin/sh: /usr/local/bin/python2.7: not found\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 127
}
gluster6 | FAILED! => {
    "changed": false,
    "module_stderr": "",
    "module_stdout": "/bin/sh: /usr/local/bin/python2.7: not found\r\n",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 127
}

We need to get Python installed on FreeBSD.

We will use partly Ansible and partly GNU Parallel for this.

vbhost % ansible -i hosts --list-hosts gluster \
           | sed 1d \
           | while read I; do ssh ${I} env ASSUME_ALWAYS_YES=yes pkg install python; done
pkg: Error fetching http://pkg.FreeBSD.org/FreeBSD:12:amd64/quarterly/Latest/pkg.txz: No address record
A pre-built version of pkg could not be found for your system.
Consider changing PACKAGESITE or installing it from ports: 'ports-mgmt/pkg'.
Bootstrapping pkg from pkg+http://pkg.FreeBSD.org/FreeBSD:12:amd64/quarterly, please wait...

… we forgot about setting up DNS in the FreeBSD machines, let’s fix that.

It is as easy as executing echo nameserver 1.1.1.1 > /etc/resolv.conf command on each FreeBSD machine.

Let’s verify what input will be sent to GNU Parallel before executing it.

vbhost % ansible -i hosts --list-hosts gluster \
           | sed 1d \
           | while read I; do echo "ssh ${I} 'echo nameserver 1.1.1.1 > /etc/resolv.conf'"; done
ssh gluster1 'echo nameserver 1.1.1.1 > /etc/resolv.conf'
ssh gluster2 'echo nameserver 1.1.1.1 > /etc/resolv.conf'
ssh gluster3 'echo nameserver 1.1.1.1 > /etc/resolv.conf'
ssh gluster4 'echo nameserver 1.1.1.1 > /etc/resolv.conf'
ssh gluster5 'echo nameserver 1.1.1.1 > /etc/resolv.conf'
ssh gluster6 'echo nameserver 1.1.1.1 > /etc/resolv.conf'

Looks reasonable, so let’s engage GNU Parallel then.

vbhost % ansible -i hosts --list-hosts gluster \
           | sed 1d \
           | while read I; do echo "ssh ${I} 'echo nameserver 1.1.1.1 > /etc/resolv.conf'"; done | parallel

Computers / CPU cores / Max jobs to run
1:local / 2 / 2

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/6/100%/1.0s
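
The "print the commands first, then pipe them to parallel" pattern used above can be wrapped in a small helper. This is just a sketch: the function name emit_ssh_commands is made up, and it assumes the remote command contains no single quotes, because it does no escaping.

```shell
#!/bin/sh
# Read host names on stdin and print one "ssh HOST 'COMMAND'" line per host.
# Assumption: COMMAND contains no single quotes (nothing is escaped here).
emit_ssh_commands() {
  remote_cmd=$1
  while read -r host; do
    [ -n "${host}" ] || continue      # skip empty lines
    printf "ssh %s '%s'\n" "${host}" "${remote_cmd}"
  done
}

# Dry run with two host names; nothing is executed, only printed.
printf 'gluster1\ngluster2\n' \
  | emit_ssh_commands 'echo nameserver 1.1.1.1 > /etc/resolv.conf'
```

In this setup it would be fed with ansible -i hosts --list-hosts gluster | sed 1d, and | parallel would be appended once the printed commands look right. Printing the commands instead of running ssh directly inside the while read loop also keeps ssh from reading the remaining host names from standard input.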

We will now verify that the DNS is configured properly on the FreeBSD machines.

vbhost % for I in $( jot 6 ); do echo -n "gluster${I} "; ssh gluster${I} 'cat /etc/resolv.conf'; done
gluster1 nameserver 1.1.1.1
gluster2 nameserver 1.1.1.1
gluster3 nameserver 1.1.1.1
gluster4 nameserver 1.1.1.1
gluster5 nameserver 1.1.1.1
gluster6 nameserver 1.1.1.1

Verification of the DNS by using host(1) to resolve an Internet host name.

vbhost % for I in $( jot 6 ); do echo; echo "gluster${I}"; ssh gluster${I} host freebsd.org; done

gluster1
freebsd.org has address 96.47.72.84
freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
freebsd.org mail is handled by 10 mx1.freebsd.org.
freebsd.org mail is handled by 30 mx66.freebsd.org.

gluster2
freebsd.org has address 96.47.72.84
freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
freebsd.org mail is handled by 30 mx66.freebsd.org.
freebsd.org mail is handled by 10 mx1.freebsd.org.

gluster3
freebsd.org has address 96.47.72.84
freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
freebsd.org mail is handled by 30 mx66.freebsd.org.
freebsd.org mail is handled by 10 mx1.freebsd.org.

gluster4
freebsd.org has address 96.47.72.84
freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
freebsd.org mail is handled by 30 mx66.freebsd.org.
freebsd.org mail is handled by 10 mx1.freebsd.org.

gluster5
freebsd.org has address 96.47.72.84
freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
freebsd.org mail is handled by 10 mx1.freebsd.org.
freebsd.org mail is handled by 30 mx66.freebsd.org.

gluster6
freebsd.org has address 96.47.72.84
freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
freebsd.org mail is handled by 10 mx1.freebsd.org.
freebsd.org mail is handled by 30 mx66.freebsd.org.

The DNS resolution works properly. Now we will switch from the default quarterly pkg(8) repository to the latest one, which has more frequent updates as the name suggests. We will need to use the sed -i '' s/quarterly/latest/g /etc/pkg/FreeBSD.conf command on each FreeBSD machine.
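
Before editing the file in place, the substitution can be previewed on standard input. The sketch below uses a hypothetical helper named preview_repo_switch and a sample line that mimics the url entry of /etc/pkg/FreeBSD.conf. It drops the -i '' argument on purpose, so nothing is modified.

```shell
#!/bin/sh
# Apply the quarterly -> latest substitution to stdin without touching any file.
preview_repo_switch() {
  sed 's/quarterly/latest/g'
}

# Sample line in the style of /etc/pkg/FreeBSD.conf (single quotes keep ${ABI} literal).
printf '  url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly",\n' | preview_repo_switch
```

On a real node the same check would be ssh gluster3 cat /etc/pkg/FreeBSD.conf | preview_repo_switch.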

Let’s verify what will be sent to GNU Parallel.

vbhost % ansible -i hosts --list-hosts gluster \
           | sed 1d \
           | while read I; do echo "ssh ${I} 'sed -i \"\" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'"; done
ssh gluster1 'sed -i "" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'
ssh gluster2 'sed -i "" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'
ssh gluster3 'sed -i "" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'
ssh gluster4 'sed -i "" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'
ssh gluster5 'sed -i "" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'
ssh gluster6 'sed -i "" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'

Let’s send the command to FreeBSD machines then.

vbhost % ansible -i hosts --list-hosts gluster \
           | sed 1d \
           | while read I; do echo "ssh $I 'sed -i \"\" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'"; done | parallel

Computers / CPU cores / Max jobs to run
1:local / 2 / 2

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/6/100%/1.0s

As shown below the latest repository is configured in the /etc/pkg/FreeBSD.conf file on each FreeBSD machine.

vbhost % ssh gluster3 tail -7 /etc/pkg/FreeBSD.conf
FreeBSD: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest",
  mirror_type: "srv",
  signature_type: "fingerprints",
  fingerprints: "/usr/share/keys/pkg",
  enabled: yes
}

We may now get back to Python.

vbhost % ansible -i hosts --list-hosts gluster \
           | sed 1d \
           | while read I; do echo ssh ${I} env ASSUME_ALWAYS_YES=yes pkg install python; done
ssh gluster1 env ASSUME_ALWAYS_YES=yes pkg install python
ssh gluster2 env ASSUME_ALWAYS_YES=yes pkg install python
ssh gluster3 env ASSUME_ALWAYS_YES=yes pkg install python
ssh gluster4 env ASSUME_ALWAYS_YES=yes pkg install python
ssh gluster5 env ASSUME_ALWAYS_YES=yes pkg install python
ssh gluster6 env ASSUME_ALWAYS_YES=yes pkg install python

… and execution on the FreeBSD machines with GNU Parallel.

vbhost % ansible -i hosts --list-hosts gluster \
           | sed 1d \
           | while read I; do echo ssh ${I} env ASSUME_ALWAYS_YES=yes pkg install python; done | parallel

Computers / CPU cores / Max jobs to run
1:local / 2 / 2

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/6/100%/156.0s

The Python package and its dependencies are installed.

vbhost % ssh gluster3 pkg info
gettext-runtime-0.19.8.1_2     GNU gettext runtime libraries and programs
indexinfo-0.3.1                Utility to regenerate the GNU info page index
libffi-3.2.1_3                 Foreign Function Interface
pkg-1.10.5_5                   Package manager
python-2.7_3,2                 "meta-port" for the default version of Python interpreter
python2-2_3                    The "meta-port" for version 2 of the Python interpreter
python27-2.7.15                Interpreted object-oriented programming language
readline-7.0.5                 Library for editing command lines as they are typed

Now the Ansible Ping module works as desired.

vbhost % ansible -i hosts -m ping gluster
gluster1 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
gluster4 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
gluster5 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
gluster3 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
gluster2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
gluster6 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

GlusterFS Volume Options

GlusterFS has a lot of options for setting up a volume. They are described in the GlusterFS Administration Guide in the Setting up GlusterFS Volumes part. Here they are:

Distributed – Distributed volumes distribute files across the bricks in the volume. You can use distributed volumes where the requirement is to scale storage and the redundancy is either not important or is provided by other hardware/software layers.

Replicated – Replicated volumes replicate files across bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.

Distributed Replicated – Distributed replicated volumes distribute files across replicated bricks in the volume. You can use distributed replicated volumes in environments where the requirement is to scale storage and high-reliability is critical. Distributed replicated volumes also offer improved read performance in most environments.

Dispersed – Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures. It stores an encoded fragment of the original file to each brick in a way that only a subset of the fragments is needed to recover the original file. The number of bricks that can be missing without losing access to data is configured by the administrator on volume creation time.

Distributed Dispersed – Distributed dispersed volumes distribute files across dispersed subvolumes. This has the same advantages of distribute replicate volumes, but using disperse to store the data into the bricks.

Striped [Deprecated] – Striped volumes stripes data across bricks in the volume. For best results, you should use striped volumes only in high concurrency environments accessing very large files.

Distributed Striped [Deprecated] – Distributed striped volumes stripe data across two or more nodes in the cluster. You should use distributed striped volumes where the requirement is to scale storage and in high concurrency environments accessing very large files is critical.

Distributed Striped Replicated [Deprecated] – Distributed striped replicated volumes distributes striped data across replicated bricks in the cluster. For best results, you should use distributed striped replicated volumes in highly concurrent environments where parallel access of very large files and performance is critical. In this release, configuration of this volume type is supported only for Map Reduce workloads.

Striped Replicated [Deprecated] – Striped replicated volumes stripes data across replicated bricks in the cluster. For best results, you should use striped replicated volumes in highly concurrent environments where there is parallel access of very large files and performance is critical. In this release, configuration of this volume type is supported only for Map Reduce workloads.

Of all the volume types above that are still supported, the Dispersed volume seems to be the best choice. Like Minio, Dispersed volumes are based on erasure codes.

As we have 6 servers we will use a 4 + 2 setup, which is a logical RAID6 across these 6 servers. This means that we will be able to lose 2 of them without a service outage. This also means that if we upload a 100 MB file to our volume we will use 150 MB of space across these 6 servers, with 25 MB on each node.

We can visualize this with the following ASCII diagram.

+-----------+ +-----------+ +-----------+ +-----------+ +-----------+ +-----------+
|  gluster1 | |  gluster2 | |  gluster3 | |  gluster4 | |  gluster5 | |  gluster6 |
|           | |           | |           | |           | |           | |           |
|    brick1 | |    brick2 | |    brick3 | |    brick4 | |    brick5 | |    brick6 |
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
      |             |             |             |             |             |
    25|MB         25|MB         25|MB         25|MB         25|MB         25|MB
      |             |             |             |             |             |
      +-------------+-------------+------+------+-------------+-------------+
                                         |
                                      100|MB
                                         |
                                     +---+---+
                                     | file0 |
                                     +-------+
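
The arithmetic behind the diagram can be checked with a few lines of shell. This is a rough sketch only: the function name disperse_usage is made up, and it ignores metadata and any internal alignment GlusterFS applies, so treat the numbers as an approximation.

```shell
#!/bin/sh
# usage: disperse_usage FILE_MB DATA_BRICKS REDUNDANCY_BRICKS
# Each brick stores one fragment of size FILE_MB / DATA_BRICKS (rounded up),
# and there are DATA_BRICKS + REDUNDANCY_BRICKS fragments in total.
disperse_usage() {
  file_mb=$1
  data=$2
  redundancy=$3
  fragment_mb=$(( (${file_mb} + ${data} - 1) / ${data} ))      # integer ceiling
  total_mb=$(( ${fragment_mb} * (${data} + ${redundancy}) ))
  echo "per-brick=${fragment_mb}MB total=${total_mb}MB"
}

# The 4 + 2 layout from this article with a 100 MB file.
disperse_usage 100 4 2
```

For the 4 + 2 layout and a 100 MB file this prints per-brick=25MB total=150MB, matching the figures in the text.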

Deploy GlusterFS Cluster

We will use gluster-setup.yml as our Ansible playbook.

Let’s create something simple to start with, for example a task that always installs the latest Python package.

vbhost % cat gluster-setup.yml
---
- name: Install and Setup GlusterFS on FreeBSD
  hosts: gluster
  user: root
  tasks:

  - name: Install Latest Python Package
    pkgng:
      name: python
      state: latest

We will now execute it.

vbhost % ansible-playbook -i hosts gluster-setup.yml

PLAY [Install and Setup GlusterFS on FreeBSD] **********************************

TASK [Gathering Facts] *********************************************************
ok: [gluster3]
ok: [gluster5]
ok: [gluster1]
ok: [gluster4]
ok: [gluster2]
ok: [gluster6]

TASK [Install Latest Python Package] *******************************************
ok: [gluster4]
ok: [gluster2]
ok: [gluster5]
ok: [gluster3]
ok: [gluster1]
ok: [gluster6]

PLAY RECAP *********************************************************************
gluster1                   : ok=2    changed=0    unreachable=0    failed=0
gluster2                   : ok=2    changed=0    unreachable=0    failed=0
gluster3                   : ok=2    changed=0    unreachable=0    failed=0
gluster4                   : ok=2    changed=0    unreachable=0    failed=0
gluster5                   : ok=2    changed=0    unreachable=0    failed=0
gluster6                   : ok=2    changed=0    unreachable=0    failed=0

We had just installed Python on these machines, so no update was needed.

As we will be creating a cluster we need to add time synchronization between the nodes of the cluster. We will use the most obvious solution – the ntpd(8) daemon that is in the FreeBSD base system. These lines are added to our gluster-setup.yml playbook to achieve this goal.

  - name: Enable NTPD Service
    raw: sysrc ntpd_enable=YES

  - name: Start NTPD Service
    service:
      name: ntpd
      state: started

After executing the playbook again with the ansible-playbook -i hosts gluster-setup.yml command we will see additional output like the one shown below.

TASK [Enable NTPD Service] ************************************************
changed: [gluster2]
changed: [gluster1]
changed: [gluster4]
changed: [gluster5]
changed: [gluster3]
changed: [gluster6]

TASK [Start NTPD Service] ******************************************************
changed: [gluster5]
changed: [gluster4]
changed: [gluster2]
changed: [gluster1]
changed: [gluster3]
changed: [gluster6]

Random verification of the NTP service.

vbhost % ssh gluster1 ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 0.freebsd.pool. .POOL.          16 p    -   64    0    0.000    0.000   0.000
 ntp.ifj.edu.pl  10.0.2.4         3 u    1   64    1  119.956  -345759  32.552
 news-archive.ic 229.30.220.210   2 u    -   64    1   60.533  -345760  21.104

Now we need to install GlusterFS on FreeBSD machines – the glusterfs package.

We will add an appropriate section to the playbook.

  - name: Install Latest GlusterFS Package
    pkgng:
      state: latest
      name:
      - glusterfs
      - ncdu

You can add more than one package to the pkgng Ansible module – for example I have also added the ncdu package.

You can read more about the pkgng Ansible module by typing the ansible-doc pkgng command, or at least its short version with the -s argument.

vbhost % ansible-doc -s pkgng
- name: Package manager for FreeBSD >= 9.0
  pkgng:
      annotation:            # A comma-separated list of keyvalue-pairs of the form `<+/-/:><key>[=<value>]'. A `+' denotes adding
                               an annotation, a `-' denotes removing an annotation, and `:' denotes
                               modifying an annotation. If setting or modifying annotations, a value
                               must be provided.
      autoremove:            # Remove automatically installed packages which are no longer needed.
      cached:                # Use local package base instead of fetching an updated one.
      chroot:                # Pkg will chroot in the specified environment. Can not be used together with `rootdir' or `jail'
                               options.
      jail:                  # Pkg will execute in the given jail name or id. Can not be used together with `chroot' or `rootdir'
                               options.
      name:                  # (required) Name or list of names of packages to install/remove.
      pkgsite:               # For pkgng versions before 1.1.4, specify packagesite to use for downloading packages. If not
                               specified, use settings from `/usr/local/etc/pkg.conf'. For newer
                               pkgng versions, specify a the name of a repository configured in
                               `/usr/local/etc/pkg/repos'.
      rootdir:               # For pkgng versions 1.5 and later, pkg will install all packages within the specified root directory.
                               Can not be used together with `chroot' or `jail' options.
      state:                 # State of the package. Note: "latest" added in 2.7

You can read more about this particular module on the following – https://docs.ansible.com/ansible/latest/modules/pkgng_module.html – Ansible page.

We will now add the GlusterFS nodes to the /etc/hosts file and add the autoboot_delay=1 parameter to the /boot/loader.conf file so our systems will boot 9 seconds faster, as 10 is the default delay setting.

Here is our gluster-setup.yml Ansible playbook so far.

vbhost % cat gluster-setup.yml
---
- name: Install and Setup GlusterFS on FreeBSD
  hosts: gluster
  user: root
  tasks:

  - name: Install Latest Python Package
    pkgng:
      name: python
      state: latest

  - name: Enable NTPD Service
    raw: sysrc ntpd_enable=YES

  - name: Start NTPD Service
    service:
      name: ntpd
      state: started

  - name: Install Latest GlusterFS Package
    pkgng:
      state: latest
      name:
      - glusterfs
      - ncdu

  - name: Add Nodes to /etc/hosts File
    blockinfile:
      path: /etc/hosts
      block: |
        10.0.10.11 gluster1
        10.0.10.12 gluster2
        10.0.10.13 gluster3
        10.0.10.14 gluster4
        10.0.10.15 gluster5
        10.0.10.16 gluster6

  - name: Add autoboot_delay to /boot/loader.conf File
    lineinfile:
      path: /boot/loader.conf
      line: autoboot_delay=1
      create: yes

Here is the result of the execution of this playbook.

vbhost % ansible-playbook -i hosts gluster-setup.yml

PLAY [Install and Setup GlusterFS on FreeBSD] **********************************

TASK [Gathering Facts] *********************************************************
ok: [gluster3]
ok: [gluster5]
ok: [gluster1]
ok: [gluster4]
ok: [gluster2]
ok: [gluster6]

TASK [Install Latest Python Package] *******************************************
ok: [gluster4]
ok: [gluster2]
ok: [gluster5]
ok: [gluster3]
ok: [gluster1]
ok: [gluster6]

TASK [Install Latest GlusterFS Package] ****************************************
ok: [gluster2]
ok: [gluster1]
ok: [gluster3]
ok: [gluster5]
ok: [gluster4]
ok: [gluster6]

TASK [Add Nodes to /etc/hosts File] ********************************************
changed: [gluster5]
changed: [gluster4]
changed: [gluster2]
changed: [gluster3]
changed: [gluster1]
changed: [gluster6]

TASK [Enable GlusterFS Service] ************************************************
changed: [gluster1]
changed: [gluster4]
changed: [gluster2]
changed: [gluster3]
changed: [gluster5]
changed: [gluster6]

TASK [Add autoboot_delay to /boot/loader.conf File] ****************************
changed: [gluster3]
changed: [gluster2]
changed: [gluster5]
changed: [gluster1]
changed: [gluster4]
changed: [gluster6]

PLAY RECAP *********************************************************************
gluster1                   : ok=6    changed=3    unreachable=0    failed=0
gluster2                   : ok=6    changed=3    unreachable=0    failed=0
gluster3                   : ok=6    changed=3    unreachable=0    failed=0
gluster4                   : ok=6    changed=3    unreachable=0    failed=0
gluster5                   : ok=6    changed=3    unreachable=0    failed=0
gluster6                   : ok=6    changed=3    unreachable=0    failed=0

Let’s check that FreeBSD machines can now ping each other by names.

vbhost % ssh gluster6 cat /etc/hosts
# LOOPBACK
127.0.0.1      localhost localhost.my.domain
::1            localhost localhost.my.domain

# BEGIN ANSIBLE MANAGED BLOCK
10.0.10.11 gluster1
10.0.10.12 gluster2
10.0.10.13 gluster3
10.0.10.14 gluster4
10.0.10.15 gluster5
10.0.10.16 gluster6
# END ANSIBLE MANAGED BLOCK

vbhost % ssh gluster1 ping -c 1 gluster3
PING gluster3 (10.0.10.13): 56 data bytes
64 bytes from 10.0.10.13: icmp_seq=0 ttl=64 time=1.924 ms

--- gluster3 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.924/1.924/1.924/0.000 ms

… and our /boot/loader.conf file.

vbhost % ssh gluster4 cat /boot/loader.conf
autoboot_delay=1

Now we need to create directories for GlusterFS data. For lack of a better idea we will use the /data directory, with /data/volume1 as the directory for volume1 and the bricks placed in directories such as /data/volume1/brick1. In this setup I will use just one brick per server, but in a production environment you would probably use one brick per physical disk.

Here is the playbook task we will use to create these directories on the FreeBSD machines.

  - name: Create brick* Directories for volume1
    raw: mkdir -p /data/volume1/brick` hostname | grep -o -E '[0-9]+' `
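
To see how the backtick expression in this task resolves on each node, here is a small stand-alone sketch. The function name brick_dir is hypothetical, and it assumes the host name contains exactly one group of digits, as gluster1 to gluster6 do.

```shell
#!/bin/sh
# Build the brick directory path from a host name such as 'gluster2'.
# Assumption: the host name contains a single group of digits.
brick_dir() {
  node_number=$( echo "$1" | grep -o -E '[0-9]+' )   # same filter as in the task
  echo "/data/volume1/brick${node_number}"
}

# Example: the path that would be created on gluster2.
brick_dir gluster2
```

For gluster2 this prints /data/volume1/brick2, which is the path the mkdir -p in the task creates on that node.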

After executing it with the ansible-playbook -i hosts gluster-setup.yml command the directories have been created.

vbhost % ssh gluster2 find /data -ls | column -t
2247168  8  drwxr-xr-x  3  root  wheel  512  Dec  28  17:48  /data
2247169  8  drwxr-xr-x  3  root  wheel  512  Dec  28  17:48  /data/volume1
2247170  8  drwxr-xr-x  2  root  wheel  512  Dec  28  17:48  /data/volume1/brick2


We now need to add glusterd_enable=YES to the /etc/rc.conf file on the GlusterFS nodes and then start the GlusterFS service.

This is the snippet we will add to our playbook.

  - name: Enable GlusterFS Service
    raw: sysrc glusterd_enable=YES

  - name: Start GlusterFS Service
    service:
      name: glusterd
      state: started

Let’s make a quick verification on a randomly chosen node.

vbhost % ssh gluster4 service glusterd status
glusterd is running as pid 2684.

Now we need to proceed to the last part of the GlusterFS setup – create the volume.

We will do this from gluster1, the first node of the GlusterFS cluster.

First we need to peer probe other nodes.

gluster1 # gluster peer probe gluster1
peer probe: success. Probe on localhost not needed
gluster1 # gluster peer probe gluster2
peer probe: success.
gluster1 # gluster peer probe gluster3
peer probe: success.
gluster1 # gluster peer probe gluster4
peer probe: success.
gluster1 # gluster peer probe gluster5
peer probe: success.
gluster1 # gluster peer probe gluster6
peer probe: success.
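
The probes above can also be run in a loop. This is a minimal sketch, assuming it is executed on gluster1. The GLUSTER variable is a hypothetical override that defaults to printing the commands instead of running them, so the loop can be reviewed first.

```sh
#!/bin/sh
# Sketch: probe peers gluster2..gluster6 in a loop from gluster1.
# GLUSTER is a hypothetical override; by default the commands are only
# printed. Run with GLUSTER=gluster to actually execute them.
GLUSTER=${GLUSTER:-echo gluster}

probe_peers() {
  for n in 2 3 4 5 6; do
    # $GLUSTER is intentionally unquoted so "echo gluster" splits into words
    $GLUSTER peer probe "gluster${n}" || return 1
  done
}

probe_peers
```

After a real run, gluster peer status on gluster1 should list the five other nodes.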

Then we can create the volume. We need to use the force option because in our example setup the bricks are directories on the root partition.

gluster1 # gluster volume create volume1 \
             disperse-data 4 \
             redundancy 2 \
             transport tcp \
             gluster1:/data/volume1/brick1 \
             gluster2:/data/volume1/brick2 \
             gluster3:/data/volume1/brick3 \
             gluster4:/data/volume1/brick4 \
             gluster5:/data/volume1/brick5 \
             gluster6:/data/volume1/brick6 \
             force
volume create: volume1: success: please start the volume to access data
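
The disperse-data 4 and redundancy 2 options mean that every file is split into 4 data fragments plus 2 redundancy fragments, one per brick. Here is a small sketch of the resulting arithmetic. The brick size is a made-up figure for illustration only and is not taken from the lab setup.

```sh
#!/bin/sh
# Arithmetic behind "disperse-data 4 redundancy 2".
data=4
redundancy=2
bricks=$((data + redundancy))

# brick_gb is a made-up example value, not taken from the lab setup
brick_gb=10
raw_gb=$((bricks * brick_gb))
usable_gb=$((data * brick_gb))

echo "bricks needed: ${bricks}"
echo "raw capacity: ${raw_gb} GB"
echo "usable capacity: ${usable_gb} GB"
echo "bricks that can fail: ${redundancy}"
```

Only the data share of the raw space is usable. The rest pays for the ability to lose up to two bricks.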

We can now start the volume1 GlusterFS volume.

gluster1 # gluster volume start volume1
volume start: volume1: success

gluster1 # gluster volume status volume1
Status of volume: volume1
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1:/data/volume1/brick1         N/A       N/A        N       N/A
Brick gluster2:/data/volume1/brick2         N/A       N/A        N       N/A
Brick gluster3:/data/volume1/brick3         N/A       N/A        N       N/A
Brick gluster4:/data/volume1/brick4         N/A       N/A        N       N/A
Brick gluster5:/data/volume1/brick5         N/A       N/A        N       N/A
Brick gluster6:/data/volume1/brick6         N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        N       644
Self-heal Daemon on gluster6                N/A       N/A        N       643
Self-heal Daemon on gluster5                N/A       N/A        N       647
Self-heal Daemon on gluster2                N/A       N/A        N       645
Self-heal Daemon on gluster3                N/A       N/A        N       645
Self-heal Daemon on gluster4                N/A       N/A        N       645

Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks

gluster1 # gluster volume info volume1

Volume Name: volume1
Type: Disperse
Volume ID: 68cf9607-16bc-4550-9b6b-16a5c7656f51
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: gluster1:/data/volume1/brick1
Brick2: gluster2:/data/volume1/brick2
Brick3: gluster3:/data/volume1/brick3
Brick4: gluster4:/data/volume1/brick4
Brick5: gluster5:/data/volume1/brick5
Brick6: gluster6:/data/volume1/brick6
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

Here are the contents of the currently unused (empty) brick.

gluster1 # find /data/volume1/brick1
/data/volume1/brick1
/data/volume1/brick1/.glusterfs
/data/volume1/brick1/.glusterfs/indices
/data/volume1/brick1/.glusterfs/indices/xattrop
/data/volume1/brick1/.glusterfs/indices/entry-changes
/data/volume1/brick1/.glusterfs/quarantine
/data/volume1/brick1/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008
/data/volume1/brick1/.glusterfs/changelogs
/data/volume1/brick1/.glusterfs/changelogs/htime
/data/volume1/brick1/.glusterfs/changelogs/csnap
/data/volume1/brick1/.glusterfs/brick1.db
/data/volume1/brick1/.glusterfs/brick1.db-wal
/data/volume1/brick1/.glusterfs/brick1.db-shm
/data/volume1/brick1/.glusterfs/00
/data/volume1/brick1/.glusterfs/00/00
/data/volume1/brick1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
/data/volume1/brick1/.glusterfs/landfill
/data/volume1/brick1/.glusterfs/unlink
/data/volume1/brick1/.glusterfs/health_check

The 6-node GlusterFS cluster is now complete and volume1 is available to use.

Alternative

The Quick Start Guide in the GlusterFS documentation also suggests using Ansible to deploy and manage GlusterFS with the gluster-ansible repository or gluster-ansible-cluster, but they have the requirements below.

  • Ansible version 2.5 or above.
  • GlusterFS version 3.2 or above.

As GlusterFS on FreeBSD is at version 3.11.1 I did not use them.

FreeBSD Client

We will now use another VirtualBox machine – also based on the same FreeBSD 12.0-RELEASE image – to create a FreeBSD Client machine that will mount our volume1 volume.

We will need to install the glusterfs package with the pkg(8) command. Then we will use the mount_glusterfs command to mount the volume. Keep in mind that in order to mount a GlusterFS volume the FUSE (fuse.ko) kernel module is needed.

client # pkg install glusterfs

client # kldload fuse

client # mount_glusterfs 10.0.10.11:volume1 /mnt

client # echo $?
0

client # mount
/dev/gpt/rootfs on / (ufs, local, soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/fuse on /mnt (fusefs, local, synchronous)

client # ls /mnt
ls: /mnt: Socket is not connected

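It is mounted but does not work. The bricks were registered by hostname, so the client must be able to resolve those names. The solution to this problem is to add appropriate /etc/hosts entries for the GlusterFS nodes on the client.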

client # cat /etc/hosts
::1                     localhost localhost.my.domain
127.0.0.1               localhost localhost.my.domain

10.0.10.11 gluster1
10.0.10.12 gluster2
10.0.10.13 gluster3
10.0.10.14 gluster4
10.0.10.15 gluster5
10.0.10.16 gluster6
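
Since the six entries follow a simple pattern, they can be generated instead of typed. This is a minimal sketch, assuming the 10.0.10.11 to 10.0.10.16 addressing used in this lab. The gluster_hosts function name is hypothetical.

```sh
#!/bin/sh
# Sketch: print the six /etc/hosts lines for the GlusterFS nodes.
gluster_hosts() {
  for n in 1 2 3 4 5 6; do
    printf '10.0.10.1%s gluster%s\n' "$n" "$n"
  done
}

gluster_hosts
# To append them on a client as root (review the output first):
#   gluster_hosts >> /etc/hosts
```

Running the append more than once would duplicate the lines, so check /etc/hosts before repeating it.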

Let’s mount it again, now with the needed /etc/hosts entries.

client # umount /mnt

client # mount_glusterfs gluster1:volume1 /mnt

client # ls /mnt
client #

We now have our GlusterFS volume properly mounted and working on the FreeBSD Client machine.

Let’s write a file there with dd(8) to see how it works.

client # dd < /dev/zero > FILE bs=1m count=100 status=progress
  73400320 bytes (73 MB, 70 MiB) transferred 1.016s, 72 MB/s
100+0 records in
100+0 records out
104857600 bytes transferred in 1.565618 secs (66975227 bytes/sec)

Let’s see how it looks in the brick directory.

gluster1 # ls -lh /data/volume1/brick1
total 25640
drw-------  10 root  wheel   512B Jan  3 18:31 .glusterfs
-rw-r--r--   2 root  wheel    25M Jan  3 18:31 FILE

gluster1 # find /data
/data/
/data/volume1
/data/volume1/brick1
/data/volume1/brick1/.glusterfs
/data/volume1/brick1/.glusterfs/indices
/data/volume1/brick1/.glusterfs/indices/xattrop
/data/volume1/brick1/.glusterfs/indices/xattrop/xattrop-aed814f1-0eb0-46a1-b569-aeddf5048e06
/data/volume1/brick1/.glusterfs/indices/entry-changes
/data/volume1/brick1/.glusterfs/quarantine
/data/volume1/brick1/.glusterfs/quarantine/stub-00000000-0000-0000-0000-000000000008
/data/volume1/brick1/.glusterfs/changelogs
/data/volume1/brick1/.glusterfs/changelogs/htime
/data/volume1/brick1/.glusterfs/changelogs/csnap
/data/volume1/brick1/.glusterfs/brick1.db
/data/volume1/brick1/.glusterfs/brick1.db-wal
/data/volume1/brick1/.glusterfs/brick1.db-shm
/data/volume1/brick1/.glusterfs/00
/data/volume1/brick1/.glusterfs/00/00
/data/volume1/brick1/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
/data/volume1/brick1/.glusterfs/landfill
/data/volume1/brick1/.glusterfs/unlink
/data/volume1/brick1/.glusterfs/health_check
/data/volume1/brick1/.glusterfs/ac
/data/volume1/brick1/.glusterfs/ac/b4
/data/volume1/brick1/.glusterfs/11
/data/volume1/brick1/.glusterfs/11/50
/data/volume1/brick1/.glusterfs/11/50/115043ca-420f-48b5-af05-c9552db2e585
/data/volume1/brick1/FILE
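
The 25M size of FILE on brick1 follows from the 4 + 2 layout: the 100 MiB written by dd is split across the 4 data bricks. Here is a short sketch of that arithmetic, assuming every brick stores an equally sized fragment and ignoring GlusterFS metadata.

```sh
#!/bin/sh
# Why FILE shows up as 25M on brick1: the 100 MiB written by dd is split
# across the 4 data bricks of the 4 + 2 dispersed volume.
file_bytes=104857600   # byte count reported by dd
data=4
redundancy=2

fragment_bytes=$((file_bytes / data))
fragment_mib=$((fragment_bytes / 1024 / 1024))
# Assumption: all six bricks hold a fragment of the same size
total_bytes=$((fragment_bytes * (data + redundancy)))

echo "fragment per brick: ${fragment_mib} MiB"
echo "stored on all six bricks: $((total_bytes / 1024 / 1024)) MiB"
```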

Linux Client

I will also show how to mount the GlusterFS volume on CentOS, the Red Hat clone, in its latest 7.6 incarnation. It requires installing the glusterfs-fuse package.

[root@localhost ~]# yum install glusterfs-fuse


[root@localhost ~]# rpm -q --filesbypkg glusterfs-fuse | grep /sbin/mount.glusterfs
glusterfs-fuse            /sbin/mount.glusterfs

[root@localhost ~]# mount.glusterfs 10.0.10.11:volume1 /mnt
Mount failed. Please check the log file for more details.

As with the FreeBSD Client, the /etc/hosts entries are needed.

[root@localhost ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.0.10.11 gluster1
10.0.10.12 gluster2
10.0.10.13 gluster3
10.0.10.14 gluster4
10.0.10.15 gluster5
10.0.10.16 gluster6

[root@localhost ~]# mount.glusterfs 10.0.10.11:volume1 /mnt

[root@localhost ~]# ls /mnt
FILE

[root@localhost ~]# mount
10.0.10.11:volume1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

With the appropriate /etc/hosts entries it works as desired. We see the FILE file generated from the FreeBSD Client machine.

GlusterFS Cluster Redundancy

After messing with the volume and creating and deleting various files I also tested its redundancy. In theory this RAID6 equivalent protection should protect us from the loss of two of six servers. After shutdown of two VirtualBox machines the volume is still available and ready to use.
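
The rule behind that test can be written down as a tiny check. This is a simplified model, assuming one brick per server as in this lab: the volume stays available as long as the number of offline bricks does not exceed the redundancy count. The volume_survives function name is hypothetical.

```sh
#!/bin/sh
# Sketch: a 4 + 2 dispersed volume stays available while the number of
# offline bricks does not exceed the redundancy count.
redundancy=2

volume_survives() {
  # $1 = number of offline bricks (one brick per server in this lab)
  [ "$1" -le "$redundancy" ]
}

for offline in 0 1 2 3; do
  if volume_survives "$offline"; then
    echo "${offline} offline: volume available"
  else
    echo "${offline} offline: volume unavailable"
  fi
done
```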

Closing Thoughts

It is a pity that FreeBSD does not provide a more modern GlusterFS package, as currently only version 3.11.1 is available.

EOF

5 thoughts on “GlusterFS Cluster on FreeBSD with Ansible and GNU Parallel”

  1. oletange

    Looking at:

    ansible -i hosts --list-hosts gluster \
    | sed 1d \
    | while read I; do echo "ssh ${I} 'sed -i \"\" s/quarterly/latest/g /etc/pkg/FreeBSD.conf'"; done

    it seems you will quickly get into quoting mess with that approach. If FreeBSD’s login shell supports functions then this might work:

    env_parallel --session
    my_replace() {
    echo joe
    sed -i "" s/quarterly/latest/g /etc/pkg/FreeBSD.conf
    }

    ansible -i hosts --list-hosts gluster |
    sed 1d |
    env_parallel -S - --nonall my_replace
