Sunday, November 7, 2021

Deploying PostgreSQL 14.0 for High Availability using Patroni, etcd, HAProxy and keepalived on CentOS 8


Patroni is an automatic failover system for PostgreSQL. It provides automatic and manual failover and keeps all vital data in a distributed configuration store (DCS). Database connections do not happen directly against the database nodes; they are routed through a connection proxy such as HAProxy, and the proxy determines the active/primary node.

By routing connections through a proxy, the risk of a split-brain scenario is greatly reduced.

Patroni stores all dynamic settings in the DCS so that the participating nodes stay fully consistent.

In this blog post I will focus on building a Patroni cluster on top of CentOS 8, using a clustered etcd as the DCS and HAProxy for routing database connections to the primary server.


OS setup

Firewalld and SELinux need to be adjusted before configuring the Patroni cluster.


Firewalld

The ports required for operating Patroni/etcd/HAProxy/PostgreSQL are the following:

5432 - PostgreSQL listening port on each node; HAProxy forwards client connections to it on the backends
5000 - HAProxy listening port that routes database connections to the primary (write) node
5001 - HAProxy listening port that routes database connections to the replica (read) nodes
2380 - etcd peer URL port, used for communication between etcd members
2379 - etcd client port, used by any client (including Patroni) to communicate with etcd
8008 - Patroni REST API port, used by HAProxy to check node status
7000 - HAProxy port exposing the proxy's statistics
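
The list above can be turned into firewalld rules in one pass. A minimal sketch that prints the corresponding firewall-cmd invocations as a dry run (review them, then pipe the output to `sudo sh` to apply):

```shell
# Ports used by the stack, mirroring the list above.
PORTS="5432 5000 5001 2380 2379 8008 7000"

# Build the firewall-cmd invocations as a dry run instead of executing them.
CMDS=""
for p in $PORTS; do
  CMDS="${CMDS}firewall-cmd --permanent --add-port=${p}/tcp
"
done
CMDS="${CMDS}firewall-cmd --reload"
echo "$CMDS"
```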

SELinux

By default, SELinux restricts which ports services like HAProxy may bind and connect to.

To allow HAProxy the network access required for its functionality, run this command:

sudo setsebool -P haproxy_connect_any=1

Initial setup

$ cat /etc/hosts
192.168.56.51 psql13n51
192.168.56.52 psql13n52
192.168.56.53 psql13n53

ETCD

Etcd is a fault-tolerant, distributed key-value store used to store the state of the Postgres cluster. Via Patroni, all of the Postgres nodes make use of etcd to keep the Postgres cluster up and running.

In production, it is best to run a larger (three- or five-node) etcd cluster, so that the failure of a single etcd member does not take down the DCS and affect the Postgres servers.
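
The sizing rule behind that advice: an etcd cluster of N members keeps quorum as long as a majority is alive, so it tolerates floor((N-1)/2) member failures. A quick sketch of the math:

```shell
# Quorum size and failure tolerance for an etcd cluster of N members:
# quorum = majority = floor(N/2) + 1; tolerated failures = N - quorum.
for N in 1 3 5 7; do
  QUORUM=$(( N / 2 + 1 ))
  TOLERATED=$(( N - QUORUM ))
  echo "members=$N quorum=$QUORUM tolerates=$TOLERATED failure(s)"
done
```

A three-node cluster, as used here, tolerates the loss of one member; even numbers add no tolerance over the next lower odd number.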

Download and Install the etcd Binaries (All nodes)

Install etcd on all three nodes.

ETCD_VER=v3.5.1

# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version

Move the binaries to the /usr/local/bin directory:

mv /tmp/etcd-download-test/etcd* /usr/local/bin/

Check the etcd and etcdctl versions:

$ etcd --version
etcd Version: 3.5.1
Git SHA: e8732fb5f
Go Version: go1.16.3
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.1
API version: 3.5

Next, configure etcd as a systemd service.

Create etcd directories and user (All nodes)

Create etcd system user:

sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd

Set ownership of the /var/lib/etcd/ directory to the etcd user:

sudo mkdir -p /var/lib/etcd/
sudo mkdir /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd/
sudo chmod -R 700 /var/lib/etcd/

Configure etcd on all nodes.

On each server, save these variables by running the commands below.

INT_NAME="eth1"
#INT_NAME="ens3"
ETCD_HOST_IP=$(ip addr show $INT_NAME | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
ETCD_NAME=$(hostname -s)

Where:
INT_NAME - the name of the network interface used for cluster traffic; change it to match your server configuration.
ETCD_HOST_IP - the internal IP address of the specified network interface, used to serve client requests and communicate with etcd cluster peers.
ETCD_NAME - each etcd member must have a unique name within the cluster; the command above sets it to the hostname of the current instance.

Check variables to confirm they have correct values:

echo $INT_NAME
echo $ETCD_HOST_IP
echo $ETCD_NAME
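
The ETCD_HOST_IP extraction pipeline can be sanity-checked against a canned `ip addr show` snippet (the sample output below is illustrative, with the interface name and address from this setup):

```shell
# Simulated `ip addr show eth1` output, used to exercise the extraction pipeline.
SAMPLE='3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.56.51/24 brd 192.168.56.255 scope global eth1
    inet6 fe80::a00:27ff:fe4e:66a1/64 scope link'

# Same grep/awk/cut chain as above, fed from the sample instead of `ip`.
# \b keeps the inet6 line from matching.
ETCD_HOST_IP=$(echo "$SAMPLE" | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
echo "$ETCD_HOST_IP"    # → 192.168.56.51
```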

Once all variables are set, create the etcd.service systemd unit file. Replace --listen-client-urls with your server IPs.
Note: etcd 3.5 serves only the v3 API by default, but Patroni's etcd (v2) configuration section talks to the v2 API, so it is important to start etcd with enable-v2=true.

cat << EOF > /lib/systemd/system/etcd.service
[Unit]
Description=etcd service
Documentation=https://github.com/coreos/etcd
 
[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \\
 --name ${ETCD_NAME} \\
 --enable-v2=true \\
 --data-dir /var/lib/etcd \\
 --initial-advertise-peer-urls http://${ETCD_HOST_IP}:2380 \\
 --listen-peer-urls http://${ETCD_HOST_IP}:2380 \\
 --listen-client-urls http://${ETCD_HOST_IP}:2379,http://127.0.0.1:2379 \\
 --advertise-client-urls http://${ETCD_HOST_IP}:2379 \\
 --initial-cluster-token etcd-cluster-1 \\
 --initial-cluster psql13n51=http://192.168.56.51:2380,psql13n52=http://192.168.56.52:2380,psql13n53=http://192.168.56.53:2380 \\
 --initial-cluster-state new \\
 --heartbeat-interval 1000 \\
 --election-timeout 5000
Restart=on-failure
RestartSec=5
 
[Install]
WantedBy=multi-user.target
EOF

For CentOS / RHEL Linux distributions, set the SELinux mode to permissive:

sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config

If you have an active firewall service, allow ports 2379 and 2380:

# RHEL / CentOS / Fedora firewalld
sudo firewall-cmd --add-port={2379,2380}/tcp --permanent
sudo firewall-cmd --reload

# Ubuntu/Debian
sudo ufw allow proto tcp from any to any port 2379,2380

Bootstrap the etcd Cluster

Once the configuration is applied on all three servers, start and enable the newly created etcd service on each node. The first server acts as the bootstrap node; a leader is elected automatically once the service is running on all three nodes.

# systemctl daemon-reload
# systemctl enable etcd
# systemctl start etcd.service
# systemctl status -l etcd.service

Test the etcd Cluster installation

Test your setup by listing the etcd cluster members:

# etcdctl member list

Check the endpoint status, including which member is the leader:

# etcdctl endpoint status --write-out=table

Also check cluster health by running the command:

# etcdctl endpoint health

127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.383594ms

Let’s also try writing to etcd:

# etcdctl put /message "Hello World"

Read the value of the message back; it should work from any node:

# etcdctl get /message
Hello World

Watchdog

Watchdog devices reset the whole system when they do not receive a keepalive heartbeat within a specified timeframe. This adds an extra layer of protection in case the usual Patroni split-brain protection mechanisms fail. Deploying a watchdog is recommended for production PostgreSQL HA configurations.

Install on all nodes.

yum -y install watchdog

/sbin/modprobe softdog

Patroni will be the component interacting with the watchdog device. Since Patroni runs as the postgres user, we need to either open up the permissions of the watchdog device so the postgres user can write to it, or make the device owned by postgres itself, which is the safer (more restrictive) approach:

Configure the softdog kernel module to load at CentOS boot.

It’s better that the softdog module is not loaded via /etc/rc.local; use the default CentOS method of loading modules from /etc/rc.modules instead:

echo modprobe softdog >> /etc/rc.modules
chmod +x /etc/rc.modules
sudo sh -c 'echo "KERNEL==\"watchdog\", OWNER=\"postgres\", GROUP=\"postgres\"" >> /etc/udev/rules.d/61-watchdog.rules'

Check whether the module is blacklisted by default, in case a stray file with such a directive is still lingering around:

$ grep blacklist /lib/modprobe.d/* /etc/modprobe.d/* | grep softdog

If anything shows up, edit that file on each node to remove the blacklist line and restart the servers.

$ lsmod | grep softdog
softdog                16384  0

[root@localhost ~]# ls -l /dev/watchdog*
crw-------. 1 root root  10, 130 Nov  5 11:13 /dev/watchdog
crw-------. 1 root root 248,   0 Nov  5 11:13 /dev/watchdog0

PostgreSQL

Install PostgreSQL on all nodes.

By default, the postgres DNF module enables an older PostgreSQL version and does not include PostgreSQL 14. Confirm with this command:

sudo dnf module list postgresql

Install the PostgreSQL repository RPM using this command:

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Then, to avoid conflicts, disable the built-in PostgreSQL module:

sudo dnf -qy module disable postgresql

Finally, install the PostgreSQL 14 server:

sudo dnf install -y postgresql14-server

Let’s also install the contrib package, which provides several additional features for the PostgreSQL database system:

sudo dnf install -y postgresql14-contrib

An important concept in a PostgreSQL HA environment like this one is that PostgreSQL should not be started automatically by systemd during server initialization: Patroni fully manages it, including starting and stopping the server. Thus, disable the service:

sudo systemctl disable postgresql-14

Start with a fresh PostgreSQL setup and let Patroni bootstrap the cluster. Remove the data directory created as part of the PostgreSQL installation:

sudo systemctl stop postgresql-14
sudo rm -fr /var/lib/pgsql/14/data

Patroni

Patroni is a cluster manager used to customize and automate deployment and maintenance of PostgreSQL HA (High Availability) clusters. Check the latest available release on its GitHub page.


Install Patroni and python client for ETCD on all 3 nodes:

# yum install patroni-etcd
# yum install python3-etcd

If you have an active firewall service, allow port 5432 on all nodes:

# RHEL / CentOS / Fedora firewalld
sudo firewall-cmd --add-port=5432/tcp --permanent
sudo firewall-cmd --reload

# Ubuntu/Debian
sudo ufw allow proto tcp from any to any port 5432

Alternatively, the etcd Python client can be installed with pip:

pip install python-etcd
Here’s the configuration file we have used for psql13n51:

cat /opt/app/patroni/etc/postgresql.yml

scope: postgres
name: psql13n51

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.56.51:8008

etcd:
    host: psql13n51:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: "on"
        max_wal_senders: 5
        max_replication_slots: 5

  initdb:
  - encoding: UTF8
  - data-checksums

  pg_hba:
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 192.168.56.1/24 md5
  - host all all 192.168.56.1/24 md5
  - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.56.51:5432
  data_dir: "/var/lib/pgsql/14/data"
  bin_dir: "/usr/pgsql-14/bin"
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Validate the configuration:

patroni --validate-config /opt/app/patroni/etc/postgresql.yml

Let’s bootstrap the cluster as the postgres user, using the parameters from the YAML file:

# sudo su - postgres
patroni /opt/app/patroni/etc/postgresql.yml
2021-11-06 07:20:41,692 INFO: postmaster pid=1863
2021-11-06 07:20:41.704 UTC [1863] LOG:  redirecting log output to logging collector process
2021-11-06 07:20:41.704 UTC [1863] HINT:  Future log output will appear in directory "log".
localhost:5432 - rejecting connections
localhost:5432 - accepting connections
2021-11-06 07:20:41,807 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:20:41,820 INFO: running post_bootstrap
2021-11-06 07:20:41,844 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
2021-11-06 07:20:41,864 INFO: initialized a new cluster
2021-11-06 07:20:51,859 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:20:51,880 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:01,883 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:11,877 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:21,878 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:31,877 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:41,880 INFO: no action. I am (psql13n51) the leader with the lock

Next, edit the postgresql.yml file on node psql13n52 and add the following configuration.
Make sure you change the name, the etcd host, and both connect_address values:

cat /opt/app/patroni/etc/postgresql.yml

scope: postgres
name: psql13n52

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.56.52:8008

etcd:
    host: psql13n52:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: "on"
        max_wal_senders: 5
        max_replication_slots: 5

  initdb:
  - encoding: UTF8
  - data-checksums

  pg_hba:
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 192.168.56.1/24 md5
  - host all all 192.168.56.1/24 md5
  - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.56.52:5432
  data_dir: "/var/lib/pgsql/14/data"
  bin_dir: "/usr/pgsql-14/bin"
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Validate the configuration:

patroni --validate-config /opt/app/patroni/etc/postgresql.yml

Run as the postgres user on node psql13n52:

# sudo su - postgres
patroni /opt/app/patroni/etc/postgresql.yml
2021-11-06 07:23:25,827 INFO: Selected new etcd server http://192.168.56.53:2379
2021-11-06 07:23:25,831 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-11-06 07:23:25,839 INFO: Lock owner: psql13n51; I am psql13n52
2021-11-06 07:23:25,843 INFO: trying to bootstrap from leader 'psql13n51'
2021-11-06 07:23:26,237 INFO: replica has been created using basebackup
2021-11-06 07:23:26,238 INFO: bootstrapped from leader 'psql13n51'
2021-11-06 07:23:26,398 INFO: postmaster pid=1506
localhost:5432 - no response
2021-11-06 07:23:26.435 UTC [1506] LOG:  redirecting log output to logging collector process
2021-11-06 07:23:26.435 UTC [1506] HINT:  Future log output will appear in directory "log".
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2021-11-06 07:23:27,449 INFO: Lock owner: psql13n51; I am psql13n52
2021-11-06 07:23:27,449 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:23:27,478 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
2021-11-06 07:23:31,874 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
2021-11-06 07:23:41,879 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)

Next, edit the postgresql.yml file on psql13n53:

cat /opt/app/patroni/etc/postgresql.yml

scope: postgres
name: psql13n53

restapi:
    listen: 0.0.0.0:8008
    connect_address: 192.168.56.53:8008

etcd:
    host: psql13n53:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: "on"
        max_wal_senders: 5
        max_replication_slots: 5

  initdb:
  - encoding: UTF8
  - data-checksums

  pg_hba:
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 192.168.56.1/24 md5
  - host all all 192.168.56.1/24 md5
  - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.56.53:5432
  data_dir: "/var/lib/pgsql/14/data"
  bin_dir: "/usr/pgsql-14/bin"
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Validate the configuration:

patroni --validate-config /opt/app/patroni/etc/postgresql.yml

Run as the postgres user:

# sudo su - postgres
patroni /opt/app/patroni/etc/postgresql.yml
2021-11-06 07:25:26,664 INFO: Selected new etcd server http://192.168.56.53:2379
2021-11-06 07:25:26,667 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-11-06 07:25:26,673 INFO: Lock owner: psql13n51; I am psql13n53
2021-11-06 07:25:26,676 INFO: trying to bootstrap from leader 'psql13n51'
2021-11-06 07:25:27,102 INFO: replica has been created using basebackup
2021-11-06 07:25:27,102 INFO: bootstrapped from leader 'psql13n51'
2021-11-06 07:25:27,262 INFO: postmaster pid=1597
localhost:5432 - no response
2021-11-06 07:25:27.299 UTC [1597] LOG:  redirecting log output to logging collector process
2021-11-06 07:25:27.299 UTC [1597] HINT:  Future log output will appear in directory "log".
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2021-11-06 07:25:28,312 INFO: Lock owner: psql13n51; I am psql13n53
2021-11-06 07:25:28,313 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:25:28,340 INFO: no action. I am a secondary (psql13n53) and following a leader (psql13n51)
2021-11-06 07:25:31,877 INFO: no action. I am a secondary (psql13n53) and following a leader (psql13n51)

Check the state of the Patroni cluster:

# patronictl -c /opt/app/patroni/etc/postgresql.yml list

+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member    | Host          | Role    | State   | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader  | running |  1 |           |
| psql13n52 | 192.168.56.52 | Replica | running |  1 |         0 |
| psql13n53 | 192.168.56.53 | Replica | running |  1 |         0 |
+-----------+---------------+---------+---------+----+-----------+

Psql13n51 started the Patroni cluster, so it was automatically made the leader and thus the primary PostgreSQL server. Nodes psql13n52 and psql13n53 are configured as read replicas (the hot_standby option was enabled in Patroni’s configuration file).

Check PostgreSQL configuration parameters:

patronictl -c /opt/app/patroni/etc/postgresql.yml show-config postgres

loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    hot_standby: 'on'
    logging_collector: 'on'
    max_replication_slots: 5
    max_wal_senders: 5
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30

With the configuration file in place, and with the etcd cluster already up, all that remains is to run Patroni as a systemd service.

Configure the Patroni service on every node:

# vi /etc/systemd/system/patroni.service
 
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target etcd.target
 
[Service]
Type=simple
 
User=postgres
Group=postgres
 
ExecStart=/usr/bin/patroni /opt/app/patroni/etc/postgresql.yml
 
KillMode=process
 
TimeoutSec=30
 
Restart=no
 
[Install]
WantedBy=multi-user.target

# systemctl daemon-reload
# systemctl enable patroni
# systemctl start patroni
# systemctl status patroni
 
# systemctl enable etcd
# systemctl status etcd

Reboot all 3 nodes:

# reboot

Check the status of the Patroni service after the reboot. The service should be up and running.

# systemctl status patroni
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-11-05 23:33:22 UTC; 8h ago
 Main PID: 705 (patroni)
    Tasks: 14 (limit: 11401)
   Memory: 146.1M
   CGroup: /system.slice/patroni.service
           ├─ 705 /usr/bin/python3 /usr/bin/patroni /opt/app/patroni/etc/postgresql.yml
           ├─1278 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data --config-file=/var/lib/pgsql/14/data/postgresql.conf --listen_addresses=0.0.0.0 --port=5432 --cluster_name=postgres --wal_level=replica --hot_standby=on --max_conn>
           ├─1280 postgres: postgres: logger
           ├─1282 postgres: postgres: checkpointer
           ├─1283 postgres: postgres: background writer
           ├─1284 postgres: postgres: stats collector
           ├─1359 postgres: postgres: postgres postgres 127.0.0.1(37558) idle
           ├─1371 postgres: postgres: walwriter
           ├─1372 postgres: postgres: autovacuum launcher
           └─1373 postgres: postgres: logical replication launcher

Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,048 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,058 INFO: promoted self to leader by acquiring session lock
Nov 06 07:43:00 psql13n51 patroni[705]: server promoting
Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,064 INFO: cleared rewind state after becoming the leader

Check the state of the Patroni cluster again:

# patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member    | Host          | Role    | State   | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader  | running |  5 |           |
| psql13n52 | 192.168.56.52 | Replica | running |  5 |         0 |
| psql13n53 | 192.168.56.53 | Replica | running |  5 |         0 |
+-----------+---------------+---------+---------+----+-----------+

PostgreSQL itself is prohibited from auto-starting because it is managed by Patroni.

Keepalived

Keepalived provides IP failover between servers.
Download the latest Keepalived release from https://www.keepalived.org/download.html.

Run this on all 3 nodes:

wget https://www.keepalived.org/software/keepalived-2.2.4.tar.gz

Unpack the archive and run configure:

# tar xvfz keepalived-2.2.4.tar.gz
# cd keepalived-2.2.4
# ./configure

Fix any errors before running the make command.

In my case I needed to install the openssl and libnl3 development packages.

configure: error:
  !!! OpenSSL is not properly installed on your system. !!!
  !!! Can not include OpenSSL headers files.            !!!

yum -y install openssl openssl-devel

# ./configure
*** WARNING - this build will not support IPVS with IPv6. Please install libnl/libnl-3 dev libraries to support IPv6 with IPVS.

yum -y install libnl3 libnl3-devel

# ./configure

When configure completes without errors, run make and make install:

# make && make install

Create the same Keepalived systemd service on all servers:

# cat /etc/systemd/system/keepalived.service
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target
[Service]
Type=forking
PIDFile=/run/keepalived.pid
KillMode=process
EnvironmentFile=-/usr/local/etc/sysconfig/keepalived
ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target

Before starting Keepalived, create its configuration file on all servers.

psql13n51:

# cat /etc/keepalived/keepalived.conf
global_defs {
}
vrrp_script chk_haproxy { # Requires keepalived-1.1.13
    script "killall -0 haproxy" # widely used idiom
    interval 2 # check every 2 seconds
    weight 2 # add 2 points of prio if OK
}
vrrp_instance VI_1 {
    interface eth0
    state MASTER # or "BACKUP" on backup
    priority 101 # 101 on master, 100 on backup
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.56.100
    }
    track_script {
        chk_haproxy
    }
}

psql13n52 & psql13n53:

# cat /etc/keepalived/keepalived.conf
global_defs {
}
vrrp_script chk_haproxy { # Requires keepalived-1.1.13
    script "killall -0 haproxy" # widely used idiom
    interval 2 # check every 2 seconds
    weight 2 # add 2 points of prio if OK
}
vrrp_instance VI_1 {
    interface eth0
    state BACKUP # or "BACKUP" on backup
    priority 100 # 101 on master, 100 on backup
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.56.100
    }
    track_script {
        chk_haproxy
    }
}
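
The weight 2 on chk_haproxy interacts with the static priorities: while the track script succeeds, a node's effective VRRP priority is its base priority plus the script weight. A small sketch of why the VIP moves when HAProxy dies on the master (values taken from the configs above):

```shell
# Effective VRRP priority = base priority + track-script weight while the
# chk_haproxy check passes. Base values mirror the keepalived.conf files above.
MASTER_PRIO=101; BACKUP_PRIO=100; WEIGHT=2

# Both HAProxy instances healthy: master wins (103 > 102) and holds the VIP.
MASTER_EFF=$(( MASTER_PRIO + WEIGHT ))
BACKUP_EFF=$(( BACKUP_PRIO + WEIGHT ))
echo "healthy: master=$MASTER_EFF backup=$BACKUP_EFF"

# HAProxy dead on the master: it loses the weight bonus (101 < 102),
# so a backup takes over the VIP.
echo "master haproxy down: master=$MASTER_PRIO backup=$BACKUP_EFF"
```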

Now start the keepalived service on all servers:

# systemctl start keepalived
# systemctl status keepalived
# systemctl enable keepalived
The VIP 192.168.56.100 is active on one server at a time and will automatically fail over to another server if there is any issue with the current holder.

HAProxy

Instead of connecting directly to the database server, the application connects to the proxy, which forwards the request to PostgreSQL. With HAProxy it is also possible to route read requests to one or more replicas for load balancing; this is done by providing two different ports for the application to connect to. We opted for the following setup:

Writes → 5000
Reads → 5001

HAProxy is a lightweight service; it can be installed on an independent server or, as in our case, on the database servers themselves.

sudo yum -y install haproxy

Set the configuration on all nodes:

$ cat /etc/haproxy/haproxy.cfg
global
    maxconn 100
 
defaults
    log    global
    mode    tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s
 
listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /
 
listen primary
    bind *:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server psql13n51 psql13n51:5432 maxconn 100 check port 8008
    server psql13n52 psql13n52:5432 maxconn 100 check port 8008
    server psql13n53 psql13n53:5432 maxconn 100 check port 8008
 
listen standbys
    balance roundrobin
    bind *:5001
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server psql13n51 psql13n51:5432 maxconn 100 check port 8008
    server psql13n52 psql13n52:5432 maxconn 100 check port 8008
    server psql13n53 psql13n53:5432 maxconn 100 check port 8008

Note there are two sections: primary, using port 5000, and standbys, using port 5001. All three nodes appear in both sections because each is a potential candidate to become either primary or standby.

For HAProxy to know which role each node currently has, it sends an HTTP request to port 8008 of the node, which Patroni answers. Patroni provides built-in REST API support for health-check monitoring that integrates perfectly with HAProxy:

$ curl -s http://psql13n51:8008
{"state": "running", "postmaster_start_time": "2021-11-06 15:01:56.197081+00:00", "role": "replica", "server_version": 140000, "cluster_unlocked": false, "xlog": {"received_location": 83888920, "replayed_location": 83888920, "replayed_timestamp": null, "paused": false}, "timeline": 6, "database_system_identifier": "7027353509639501631", "patroni": {"version": "2.1.1", "scope": "postgres"}}

Notice that Patroni returned "role": "replica" when we sent the HTTP request to a standby server. HAProxy uses this information for query routing.
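
HAProxy itself only looks at the HTTP status code of the /master and /replica endpoints, but the JSON body is handy for scripting. A sketch extracting the role field with sed from a response like the one above (abridged sample, no JSON parser required):

```shell
# Sample (abridged) Patroni REST API response, shaped like the output above.
RESPONSE='{"state": "running", "role": "replica", "timeline": 6}'

# Pull out the value of the "role" key with a sed capture group.
ROLE=$(echo "$RESPONSE" | sed -n 's/.*"role": "\([^"]*\)".*/\1/p')
echo "$ROLE"    # → replica
```

In a real script the RESPONSE would come from `curl -s http://<node>:8008`, as shown above.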

We configured the standbys group to balance read-requests in a round-robin fashion, so each connection request (or reconnection) will alternate between the available replicas.
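
Round-robin here simply means HAProxy hands each new connection to the next server in the standbys list, wrapping around at the end. A toy sketch of the selection logic (server names taken from the haproxy.cfg above):

```shell
# Toy illustration of round-robin balancing over the standbys list.
SERVERS=(psql13n51 psql13n52 psql13n53)
for conn in 1 2 3 4; do
  # Each new connection picks the next server, modulo the list length.
  PICK=${SERVERS[$(( (conn - 1) % ${#SERVERS[@]} ))]}
  echo "connection $conn -> $PICK"
done
```

Connection 4 wraps back to the first server, which is exactly the alternation seen in the psql tests below against port 5001.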

Let’s start HAProxy on all three nodes:

# systemctl enable haproxy.service
# systemctl start haproxy.service
# systemctl status haproxy.service

Set up a .pgpass file to test primary and standby connections:

sudo su - postgres
echo "localhost:5000:postgres:postgres:vagrant" > ~/.pgpass
echo "localhost:5001:postgres:postgres:vagrant" >> ~/.pgpass
chmod 0600 ~/.pgpass

We can then execute two read requests to verify the round-robin mechanism is working as intended:

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
192.168.56.51

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
192.168.56.52

Master (read/write) connection:

$ psql -Upostgres -hlocalhost -p5000 -t -c "select inet_server_addr()"
192.168.56.53

You can also check the state of HAProxy by visiting http://192.168.56.51:7000/ in your browser.

# patronictl -c /opt/app/patroni/etc/postgresql.yml list

+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member    | Host          | Role    | State   | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running |  7 |         0 |
| psql13n52 | 192.168.56.52 | Replica | running |  7 |         0 |
| psql13n53 | 192.168.56.53 | Leader  | running |  7 |           |
+-----------+---------------+---------+---------+----+-----------+

We have set up a three-node Patroni cluster with no single point of failure (SPOF). In the next part we will test disaster and failover scenarios on this configuration under an active read/write workload.


Reference:
https://github.com/zalando/patroni
https://patroni.readthedocs.io/en/latest/
https://www.percona.com/blog/2021/06/11/postgresql-ha-with-patroni-your-turn-to-test-failure-scenarios/
https://blog.dbi-services.com/postgresql-high-availabilty-patroni-ectd-haproxy-keepalived/
https://digitalis.io/blog/technology/part1-postgresql-ha-patroni-etcd-haproxy/


8 comments:

  1. Nice one...glad that you moved to open source solutions as Oracle seems to be dead end story...step that I have to do ASAP!

  2. Thanks! I am not moved completely to open source solutions but just extending knowledge a little. It is exciting to learn more about what is happening in the open source database world :-)

  3. Thank you very much for this precise and clear article!
    I am thinking about a different readonly access. The primary should also be able to answer readonly access and standbys which lag far behind the primary should not be asked by haproxy. Any idea?

    1. Hello!
      In current configuration Primary can accept read only and read write requests. It is not clear to me why Primary should accept requests on port dedicated for read only requests.
      If you need replica without lag you could use synchronous replication or you can configure maximum lag threshold on the Patroni side.
      I am not sure that you can add complexity on the HAProxy side for additional verifications when routing.
      Best regards,
      Marko

  4. great article, thank you! are there some gotchas setting this up on debian/ubuntu?

    1. Hello!
      Thank you for the comment.
      I haven't tested installation/configuration for debian/ubuntu.

      Best regards,
      Marko
