Patroni is an automatic failover system for PostgreSQL. It provides automatic and manual failover and keeps all vital cluster data in a distributed configuration store (DCS). Database connections do not happen directly against the database nodes; they are routed through a connection proxy such as HAProxy, which determines the current active/primary node.
Because the proxy routes all connections, the risk of a split-brain scenario is very limited.
With Patroni, all dynamic settings are stored in the DCS so that the participating nodes stay fully consistent.
In this blog post I will focus on building a Patroni cluster on top of CentOS 8, using etcd as the DCS and HAProxy for routing database connections to the primary server.
OS setup
Firewalld and SELinux need to be adjusted before configuring the Patroni cluster.
Firewalld
The ports required for operating Patroni/etcd/HAProxy/PostgreSQL are the following (a combined firewalld example is shown right after the list):
5432 - standard PostgreSQL port; each PostgreSQL instance listens here and HAProxy forwards client connections to it
5000 - HAProxy listening port used to route database connections to the write (primary) node
5001 - HAProxy listening port used to route database connections to the read nodes
2380 - etcd peer URLs port, required for communication between the etcd members
2379 - etcd client port, required by any client, including Patroni, to communicate with etcd
8008 - Patroni REST API port, required by HAProxy to check the nodes' status
7000 - HAProxy port that exposes the proxy's statistics
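If firewalld is active on every node, one way to open all of these ports in a single pass is sketched below. The sections that follow also open the relevant ports step by step, so treat this only as a convenience.
```
# open all ports used by PostgreSQL, HAProxy, etcd and Patroni (firewalld assumed)
sudo firewall-cmd --permanent --add-port={5432,5000,5001,2379,2380,8008,7000}/tcp
sudo firewall-cmd --reload
```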
SELinux
By default, SELinux prevents new services from binding to all IP addresses.
To allow HAProxy to bind to the ports required for its functionality, we need to run this command:
sudo setsebool -P haproxy_connect_any=1
Initial setup
```
$ cat /etc/hosts
192.168.56.51 psql13n51
192.168.56.52 psql13n52
192.168.56.53 psql13n53
```
ETCD
Etcd is a fault-tolerant, distributed key-value store used to store the state of the Postgres cluster. Via Patroni, all of the Postgres nodes make use of etcd to keep the Postgres cluster up and running.
In production, it may be best to use a larger etcd cluster, so that the failure of a single etcd node does not affect the Postgres cluster.
Download and Install the etcd Binaries (All nodes)
Install etcd on all three nodes.
```
ETCD_VER=v3.5.1

# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

/tmp/etcd-download-test/etcd --version
/tmp/etcd-download-test/etcdctl version
/tmp/etcd-download-test/etcdutl version
```
Move the binaries to the /usr/local/bin directory.
```
mv /tmp/etcd-download-test/etcd* /usr/local/bin/
```
Check the etcd and etcdctl versions.
```
$ etcd --version
etcd Version: 3.5.1
Git SHA: e8732fb5f
Go Version: go1.16.3
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.5.1
API version: 3.5
```
Create etcd directories and user (All nodes)
Create etcd system user:
```
sudo groupadd --system etcd
sudo useradd -s /sbin/nologin --system -g etcd etcd
```
Set /var/lib/etcd/ directory ownership to the etcd user:
```
sudo mkdir -p /var/lib/etcd/
sudo mkdir /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd/
sudo chmod -R 700 /var/lib/etcd/
```
Configure etcd on all nodes.
On each server, save these variables by running the commands below.
```
INT_NAME="eth1"
#INT_NAME="ens3"
ETCD_HOST_IP=$(ip addr show $INT_NAME | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
ETCD_NAME=$(hostname -s)
```
Where:
INT_NAME - The name of your network interface to be used for cluster traffic. Change it to match your server configuration.
ETCD_HOST_IP - The internal IP address of the specified network interface. This is used to serve client requests and communicate with etcd cluster peers.
ETCD_NAME – Each etcd member must have a unique name within an etcd cluster. The command above sets the etcd name to match the hostname of the current compute instance.
Check variables to confirm they have correct values:
```
echo $INT_NAME
echo $ETCD_HOST_IP
echo $ETCD_NAME
```
Once all variables are set, create the etcd.service systemd unit file.
Create a systemd service file for etcd. Replace --listen-client-urls with your server IPs.
etcd 3.5 serves the v3 API by default and disables the v2 API, but the Patroni configuration used in this post communicates with etcd over the v2 API, so it is important to set the enable-v2=true parameter.
```
cat << EOF > /lib/systemd/system/etcd.service
[Unit]
Description=etcd service
Documentation=https://github.com/coreos/etcd

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd \\
  --name ${ETCD_NAME} \\
  --enable-v2=true \\
  --data-dir /var/lib/etcd \\
  --initial-advertise-peer-urls http://${ETCD_HOST_IP}:2380 \\
  --listen-peer-urls http://${ETCD_HOST_IP}:2380 \\
  --listen-client-urls http://${ETCD_HOST_IP}:2379,http://127.0.0.1:2379 \\
  --advertise-client-urls http://${ETCD_HOST_IP}:2379 \\
  --initial-cluster-token etcd-cluster-1 \\
  --initial-cluster psql13n51=http://192.168.56.51:2380,psql13n52=http://192.168.56.52:2380,psql13n53=http://192.168.56.53:2380 \\
  --initial-cluster-state new \\
  --heartbeat-interval 1000 \\
  --election-timeout 5000
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
```
For CentOS / RHEL Linux distributions, set SELinux mode to permissive.
```
sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config
```
If you have an active firewall service, allow ports 2379 and 2380.
```
# RHEL / CentOS / Fedora firewalld
sudo firewall-cmd --add-port={2379,2380}/tcp --permanent
sudo firewall-cmd --reload

# Ubuntu/Debian
sudo ufw allow proto tcp from any to any port 2379,2380
```
Bootstrap the etcd cluster
Once all the configuration is applied on the three servers, start and enable the newly created etcd service on all of them. The first server will act as a bootstrap node, and one node will automatically be elected as leader once the service is started on all three nodes.
```
# systemctl daemon-reload
# systemctl enable etcd
# systemctl start etcd.service
# systemctl status -l etcd.service
```
Test the etcd cluster installation
Test your setup by listing the etcd cluster members:
```
# etcdctl member list
```
Check which node is currently the leader:
```
# etcdctl endpoint status --write-out=table
```
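The command above only queries the local endpoint. To see all three members at once and spot the current leader (the table output includes an IS LEADER column), the client endpoints can be passed explicitly. A small sketch using the addresses from this setup:
```
etcdctl endpoint status \
  --endpoints=192.168.56.51:2379,192.168.56.52:2379,192.168.56.53:2379 \
  --write-out=table
```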
Also check the cluster health by running:
```
# etcdctl endpoint health
127.0.0.1:2379 is healthy: successfully committed proposal: took = 4.383594ms
```
Let's also try writing to etcd.
```
# etcdctl put /message "Hello World"
```
Read the value of message back – it should work on all nodes.
```
# etcdctl get /message
Hello World
```
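Because the Patroni configuration later in this post talks to etcd over the v2 API (hence the --enable-v2=true flag in the unit file), it can be worth confirming that the v2 endpoint answers. This is a hedged check; the exact output format may vary between etcd versions:
```
# should list the cluster's client URLs when the v2 API is enabled
curl -s http://127.0.0.1:2379/v2/machines
```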
Watchdog

Watchdog devices reset the whole system when they do not receive a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail-safe in case the usual Patroni split-brain protection mechanisms fail. Deploying a watchdog is recommended when running a PostgreSQL HA configuration in production.
Install on all nodes.
yum -y install watchdog
```
/sbin/modprobe softdog
```
Patroni will be the component interacting with the watchdog device. Since Patroni runs as the postgres user, we need to either set the permissions of the watchdog device open enough that the postgres user can write to it, or make the device owned by postgres itself, which we consider the safer approach as it is more restrictive; the udev rule below implements this.
Configure the softdog kernel module to load at CentOS boot.
Rather than loading the softdog module via /etc/rc.local, use the default CentOS mechanism of loading modules from /etc/rc.modules:
```
echo modprobe softdog >> /etc/rc.modules
chmod +x /etc/rc.modules
```
```
sudo sh -c 'echo "KERNEL==\"watchdog\", OWNER=\"postgres\", GROUP=\"postgres\"" >> /etc/udev/rules.d/61-watchdog.rules'
```
Check whether the softdog module is blacklisted by default; in our case a stray file with such a directive was still lingering around:
```
$ grep blacklist /lib/modprobe.d/* /etc/modprobe.d/* | grep softdog
```
Edit the offending file on each of the nodes to remove that line and restart the servers.
```
$ lsmod | grep softdog
softdog                16384  0
```
```
[root@localhost ~]# ls -l /dev/watchdog*
crw-------. 1 root root  10, 130 Nov  5 11:13 /dev/watchdog
crw-------. 1 root root 248,   0 Nov  5 11:13 /dev/watchdog0
```
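Note that in the listing above /dev/watchdog is still owned by root: the udev rule is applied when the device is (re)created, for example after a reboot. To apply it right away, reloading the udev rules should also work. A small sketch, after which the device should show postgres:postgres ownership:
```
sudo udevadm control --reload-rules
sudo udevadm trigger
ls -l /dev/watchdog
```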
PostgreSQL

Install PostgreSQL on all nodes.
By default, the postgresql DNF module has an older PostgreSQL version enabled, and it does not include PostgreSQL 14. Confirm with this command:
```
sudo dnf module list postgresql
```
Install the repository RPM using this command:
```
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
```
Then, to avoid conflicts, disable the built-in PostgreSQL module:
```
sudo dnf -qy module disable postgresql
```
Finally, install the PostgreSQL 14 server:
```
sudo dnf install -y postgresql14-server
```
Let's also install the contrib package, which provides several additional features for the PostgreSQL database system:
```
sudo dnf install -y postgresql14-contrib
```
An important concept to understand in a PostgreSQL HA environment like this one is that PostgreSQL should not be started automatically by systemd during server initialization: we should leave it to Patroni to fully manage it, including starting and stopping the server. Thus, we disable the service:
```
sudo systemctl disable postgresql-14
```
Start with a fresh PostgreSQL setup and let Patroni bootstrap the cluster. Remove the data directory that was created as part of the PostgreSQL installation:
```
sudo systemctl stop postgresql-14
sudo rm -fr /var/lib/pgsql/14/data
```
Patroni
Patroni is a cluster manager used to customize and automate the deployment and maintenance of PostgreSQL HA (High Availability) clusters. You should check the latest available release on the GitHub page.
Install Patroni and the Python client for etcd on all three nodes:
```
# yum install patroni-etcd
# yum install python3-etcd
```
If you have an active firewall service, allow port 5432 on all nodes.
```
# RHEL / CentOS / Fedora firewalld
sudo firewall-cmd --add-port=5432/tcp --permanent
sudo firewall-cmd --reload

# Ubuntu/Debian
sudo ufw allow proto tcp from any to any port 5432
```
```
pip install python-etcd
```
Here's the configuration file we have used for psql13n51:
```
cat /opt/app/patroni/etc/postgresql.yml
scope: postgres
name: psql13n51

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.56.51:8008

etcd:
  host: psql13n51:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: "on"
        max_wal_senders: 5
        max_replication_slots: 5
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 192.168.56.1/24 md5
  - host all all 192.168.56.1/24 md5
  - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.56.51:5432
  data_dir: "/var/lib/pgsql/14/data"
  bin_dir: "/usr/pgsql-14/bin"
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```
Validate the configuration:
```
patroni --validate-config /opt/app/patroni/etc/postgresql.yml
```
Let's bootstrap the cluster, running Patroni with the parameters from the YAML file as the postgres user:
```
# sudo su - postgres
$ patroni /opt/app/patroni/etc/postgresql.yml
```
```
2021-11-06 07:20:41,692 INFO: postmaster pid=1863
2021-11-06 07:20:41.704 UTC [1863] LOG:  redirecting log output to logging collector process
2021-11-06 07:20:41.704 UTC [1863] HINT:  Future log output will appear in directory "log".
localhost:5432 - rejecting connections
localhost:5432 - accepting connections
2021-11-06 07:20:41,807 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:20:41,820 INFO: running post_bootstrap
2021-11-06 07:20:41,844 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
2021-11-06 07:20:41,864 INFO: initialized a new cluster
2021-11-06 07:20:51,859 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:20:51,880 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:01,883 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:11,877 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:21,878 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:31,877 INFO: no action. I am (psql13n51) the leader with the lock
2021-11-06 07:21:41,880 INFO: no action. I am (psql13n51) the leader with the lock
```
Next, edit the postgresql.yml file on the psql13n52 node and add the following configuration parameters.
Make sure you change the name, the etcd host, and the listen and connect_address values:
```
cat /opt/app/patroni/etc/postgresql.yml
scope: postgres
name: psql13n52

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.56.52:8008

etcd:
  host: psql13n52:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: "on"
        max_wal_senders: 5
        max_replication_slots: 5
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 192.168.56.1/24 md5
  - host all all 192.168.56.1/24 md5
  - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.56.52:5432
  data_dir: "/var/lib/pgsql/14/data"
  bin_dir: "/usr/pgsql-14/bin"
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```
Validate the configuration:
```
patroni --validate-config /opt/app/patroni/etc/postgresql.yml
```
Run Patroni as the postgres user on the psql13n52 node:
```
# sudo su - postgres
$ patroni /opt/app/patroni/etc/postgresql.yml
```
```
2021-11-06 07:23:25,827 INFO: Selected new etcd server http://192.168.56.53:2379
2021-11-06 07:23:25,831 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-11-06 07:23:25,839 INFO: Lock owner: psql13n51; I am psql13n52
2021-11-06 07:23:25,843 INFO: trying to bootstrap from leader 'psql13n51'
2021-11-06 07:23:26,237 INFO: replica has been created using basebackup
2021-11-06 07:23:26,238 INFO: bootstrapped from leader 'psql13n51'
2021-11-06 07:23:26,398 INFO: postmaster pid=1506
localhost:5432 - no response
2021-11-06 07:23:26.435 UTC [1506] LOG:  redirecting log output to logging collector process
2021-11-06 07:23:26.435 UTC [1506] HINT:  Future log output will appear in directory "log".
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2021-11-06 07:23:27,449 INFO: Lock owner: psql13n51; I am psql13n52
2021-11-06 07:23:27,449 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:23:27,478 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
2021-11-06 07:23:31,874 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
2021-11-06 07:23:41,879 INFO: no action. I am a secondary (psql13n52) and following a leader (psql13n51)
```
Next, edit the postgresql.yml file on psql13n53:
```
cat /opt/app/patroni/etc/postgresql.yml
scope: postgres
name: psql13n53

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.56.53:8008

etcd:
  host: psql13n53:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        logging_collector: "on"
        max_wal_senders: 5
        max_replication_slots: 5
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication replicator 127.0.0.1/32 trust
  - host replication replicator 192.168.56.1/24 md5
  - host all all 192.168.56.1/24 md5
  - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.56.53:5432
  data_dir: "/var/lib/pgsql/14/data"
  bin_dir: "/usr/pgsql-14/bin"
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: vagrant
    superuser:
      username: postgres
      password: vagrant
  parameters:
    unix_socket_directories: '/var/run/postgresql'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```
Validate the configuration:
```
patroni --validate-config /opt/app/patroni/etc/postgresql.yml
```
Run Patroni as the postgres user:
```
# sudo su - postgres
$ patroni /opt/app/patroni/etc/postgresql.yml
```
```
2021-11-06 07:25:26,664 INFO: Selected new etcd server http://192.168.56.53:2379
2021-11-06 07:25:26,667 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-11-06 07:25:26,673 INFO: Lock owner: psql13n51; I am psql13n53
2021-11-06 07:25:26,676 INFO: trying to bootstrap from leader 'psql13n51'
2021-11-06 07:25:27,102 INFO: replica has been created using basebackup
2021-11-06 07:25:27,102 INFO: bootstrapped from leader 'psql13n51'
2021-11-06 07:25:27,262 INFO: postmaster pid=1597
localhost:5432 - no response
2021-11-06 07:25:27.299 UTC [1597] LOG:  redirecting log output to logging collector process
2021-11-06 07:25:27.299 UTC [1597] HINT:  Future log output will appear in directory "log".
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2021-11-06 07:25:28,312 INFO: Lock owner: psql13n51; I am psql13n53
2021-11-06 07:25:28,313 INFO: establishing a new patroni connection to the postgres cluster
2021-11-06 07:25:28,340 INFO: no action. I am a secondary (psql13n53) and following a leader (psql13n51)
2021-11-06 07:25:31,877 INFO: no action. I am a secondary (psql13n53) and following a leader (psql13n51)
```
Check the state of the Patroni cluster:
```
# patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member    | Host          | Role    | State   | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader  | running |  1 |           |
| psql13n52 | 192.168.56.52 | Replica | running |  1 |         0 |
| psql13n53 | 192.168.56.53 | Replica | running |  1 |         0 |
+-----------+---------------+---------+---------+----+-----------+
```
Psql13n51 started the Patroni cluster, so it was automatically made the leader – and thus the primary/master PostgreSQL server. Nodes psql13n52 and psql13n53 are configured as read replicas (as the hot_standby option was enabled in Patroni's configuration file).
Check PostgreSQL configuration parameters:
```
patronictl -c /opt/app/patroni/etc/postgresql.yml show-config postgres
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    hot_standby: 'on'
    logging_collector: 'on'
    max_replication_slots: 5
    max_wal_senders: 5
    wal_level: replica
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30
```
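As mentioned at the beginning, these dynamic settings live in the DCS, so a change made through any node is applied cluster-wide. A small sketch of how such a change is typically made: the command opens the stored cluster configuration in an editor, and Patroni propagates the saved result to all members.
```
patronictl -c /opt/app/patroni/etc/postgresql.yml edit-config postgres
```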
With the configuration files in place, and now that the etcd cluster is up, all that remains is to run Patroni under systemd. Configure the Patroni service on every node:
```
# vi /etc/systemd/system/patroni.service
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target etcd.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/bin/patroni /opt/app/patroni/etc/postgresql.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target
```
```
# systemctl status patroni
# systemctl start patroni
# systemctl enable patroni
# systemctl status etcd
# systemctl enable etcd
```
Reboot all 3 nodes:
```
# reboot
```
Check the status of the Patroni service after the reboot. The service should be up and running.
```
# systemctl status patroni
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-11-05 23:33:22 UTC; 8h ago
 Main PID: 705 (patroni)
    Tasks: 14 (limit: 11401)
   Memory: 146.1M
   CGroup: /system.slice/patroni.service
           ├─ 705 /usr/bin/python3 /usr/bin/patroni /opt/app/patroni/etc/postgresql.yml
           ├─1278 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data --config-file=/var/lib/pgsql/14/data/postgresql.conf --listen_addresses=0.0.0.0 --port=5432 --cluster_name=postgres --wal_level=replica --hot_standby=on --max_conn>
           ├─1280 postgres: postgres: logger
           ├─1282 postgres: postgres: checkpointer
           ├─1283 postgres: postgres: background writer
           ├─1284 postgres: postgres: stats collector
           ├─1359 postgres: postgres: postgres postgres 127.0.0.1(37558) idle
           ├─1371 postgres: postgres: walwriter
           ├─1372 postgres: postgres: autovacuum launcher
           └─1373 postgres: postgres: logical replication launcher

Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,048 INFO: Software Watchdog activated with 25 second timeout, timing slack 15 seconds
Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,058 INFO: promoted self to leader by acquiring session lock
Nov 06 07:43:00 psql13n51 patroni[705]: server promoting
Nov 06 07:43:00 psql13n51 patroni[705]: 2021-11-06 07:43:00,064 INFO: cleared rewind state after becoming the leader
```
Check again the state of the Patroni cluster:
```
# patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member    | Host          | Role    | State   | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Leader  | running |  5 |           |
| psql13n52 | 192.168.56.52 | Replica | running |  5 |         0 |
| psql13n53 | 192.168.56.53 | Replica | running |  5 |         0 |
+-----------+---------------+---------+---------+----+-----------+
```
```
$ sudo systemctl enable patroni
$ sudo systemctl start patroni
$ sudo systemctl status patroni
```
PostgreSQL itself stays disabled for automatic startup because it is fully managed by Patroni.
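A quick sanity check that only Patroni, and not PostgreSQL, is set to start at boot; the expected output is indicated in the comments:
```
systemctl is-enabled patroni        # should report "enabled"
systemctl is-enabled postgresql-14  # should report "disabled"
```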
Keepalived
Keepalived is used for IP failover between multiple servers.
Download the latest Keepalived release from https://www.keepalived.org/download.html.
Run this on all 3 nodes:
```
wget https://www.keepalived.org/software/keepalived-2.2.4.tar.gz
```
Unpack the archive and run configure:
```
# tar xvfz keepalived-2.2.4.tar.gz
# cd keepalived-2.2.4
# ./configure
```
Fix any errors before running the make command.
In my case I needed to install the openssl and libnl3 packages.
```
configure: error:
  !!! OpenSSL is not properly installed on your system. !!!
  !!! Can not include OpenSSL headers files. !!!
```
yum -y install openssl openssl-devel
```
# ./configure
*** WARNING - this build will not support IPVS with IPv6. Please install libnl/libnl-3 dev libraries to support IPv6 with IPVS.
```
yum -y install libnl3 libnl3-devel
```
# ./configure
```
When there are no errors, run make and make install:
```
# make && make install
```
For Keepalived startup, the same systemd service is created on all servers.
```
# cat /etc/systemd/system/keepalived.service
[Unit]
Description=LVS and VRRP High Availability Monitor
After=network-online.target syslog.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/run/keepalived.pid
KillMode=process
EnvironmentFile=-/usr/local/etc/sysconfig/keepalived
ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target
```
Before starting Keepalived, we have to create its configuration file on all servers.
psql13n51:
```
# cat /etc/keepalived/keepalived.conf
global_defs {
}

vrrp_script chk_haproxy {          # Requires keepalived-1.1.13
    script "killall -0 haproxy"    # widely used idiom
    interval 2                     # check every 2 seconds
    weight 2                       # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER                   # or "BACKUP" on backup
    priority 101                   # 101 on master, 100 on backup
    virtual_router_id 51

    authentication {
        auth_type PASS
        auth_pass 1234
    }

    virtual_ipaddress {
        192.168.56.100
    }

    track_script {
        chk_haproxy
    }
}
```
psql13n52 & psql13n53:
```
# cat /etc/keepalived/keepalived.conf
global_defs {
}

vrrp_script chk_haproxy {          # Requires keepalived-1.1.13
    script "killall -0 haproxy"    # widely used idiom
    interval 2                     # check every 2 seconds
    weight 2                       # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state BACKUP                   # or "BACKUP" on backup
    priority 100                   # 101 on master, 100 on backup
    virtual_router_id 51

    authentication {
        auth_type PASS
        auth_pass 1234
    }

    virtual_ipaddress {
        192.168.56.100
    }

    track_script {
        chk_haproxy
    }
}
```
Now start the keepalived service on all servers:
```
# systemctl start keepalived
# systemctl status keepalived
# systemctl enable keepalived
```
The VIP 192.168.56.100 should be active on one server and will automatically fail over to another server if there is any issue with the node that currently holds it.
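To confirm which node currently holds the VIP, listing the addresses on the interface used in keepalived.conf is enough. A small sketch assuming the eth0 interface from the configuration above; the address appears only on the node that owns it:
```
ip -4 addr show eth0 | grep 192.168.56.100
```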
HAProxy
Instead of connecting directly to the database server, the application connects to the proxy, which forwards the request to PostgreSQL. When HAProxy is used for this, it is also possible to route read requests to one or more replicas for load balancing. With HAProxy, this is done by providing two different ports for the application to connect to. We opted for the following setup:
Writes → 5000
Reads → 5001
HAProxy is a lightweight service and can be installed on an independent server or, as in our case, on the database servers themselves.
```
sudo yum -y install haproxy
```
Set the configuration on all nodes:
```
$ cat /etc/haproxy/haproxy.cfg
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen primary
    bind *:5000
    option httpchk OPTIONS /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server psql13n51 psql13n51:5432 maxconn 100 check port 8008
    server psql13n52 psql13n52:5432 maxconn 100 check port 8008
    server psql13n53 psql13n53:5432 maxconn 100 check port 8008

listen standbys
    balance roundrobin
    bind *:5001
    option httpchk OPTIONS /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server psql13n51 psql13n51:5432 maxconn 100 check port 8008
    server psql13n52 psql13n52:5432 maxconn 100 check port 8008
    server psql13n53 psql13n53:5432 maxconn 100 check port 8008
```
Note there are two sections: primary, using port 5000, and standbys, using port 5001. All three nodes are included in both sections: that's because they are all potential candidates to be either primary or secondary.
To learn which role each node currently has, HAProxy sends an HTTP request to port 8008 of the node and Patroni answers. Patroni provides built-in REST API support for health-check monitoring that integrates perfectly with HAProxy:
```
$ curl -s http://psql13n51:8008
{"state": "running", "postmaster_start_time": "2021-11-06 15:01:56.197081+00:00", "role": "replica", "server_version": 140000, "cluster_unlocked": false, "xlog": {"received_location": 83888920, "replayed_location": 83888920, "replayed_timestamp": null, "paused": false}, "timeline": 6, "database_system_identifier": "7027353509639501631", "patroni": {"version": "2.1.1", "scope": "postgres"}}
```
Notice how we received "role": "replica" from Patroni when we sent the HTTP request to a standby server. HAProxy uses this information for query routing.
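HAProxy does not parse this JSON; it only looks at the HTTP status code returned by the /master and /replica endpoints referenced in haproxy.cfg. The same checks can be reproduced manually with curl: 200 means the node currently has that role, 503 means it does not. The status values shown below are illustrative and match the replica response above:
```
$ curl -s -o /dev/null -w '%{http_code}\n' http://psql13n51:8008/master
503
$ curl -s -o /dev/null -w '%{http_code}\n' http://psql13n51:8008/replica
200
```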
We configured the standbys group to balance read-requests in a round-robin fashion, so each connection request (or reconnection) will alternate between the available replicas.
Let’s start HAProxy on all three nodes:
```
# systemctl enable haproxy.service
# systemctl start haproxy.service
# systemctl status haproxy.service
```
Test the primary and standby connections:
```
sudo su - postgres
echo "localhost:5000:postgres:postgres:vagrant" >  ~/.pgpass
echo "localhost:5001:postgres:postgres:vagrant" >> ~/.pgpass
chmod 0600 ~/.pgpass
```
We can then execute two read requests to verify the round-robin mechanism is working as intended:
```
$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.56.51

$ psql -Upostgres -hlocalhost -p5001 -t -c "select inet_server_addr()"
 192.168.56.52
```
Master (read/write) connection:
```
$ psql -Upostgres -hlocalhost -p5000 -t -c "select inet_server_addr()"
 192.168.56.53
```
You can also check the state of HAProxy by visiting http://192.168.56.51:7000/ in your browser.
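If you prefer the command line, the same statistics should also be reachable with curl. Appending ;csv to the stats URI is a standard HAProxy feature and is assumed to work with the stats uri / configured above:
```
curl -s "http://192.168.56.51:7000/;csv" | head
```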
```
# patronictl -c /opt/app/patroni/etc/postgresql.yml list
+ Cluster: postgres (7027353509639501631) ------+----+-----------+
| Member    | Host          | Role    | State   | TL | Lag in MB |
+-----------+---------------+---------+---------+----+-----------+
| psql13n51 | 192.168.56.51 | Replica | running |  7 |         0 |
| psql13n52 | 192.168.56.52 | Replica | running |  7 |         0 |
| psql13n53 | 192.168.56.53 | Leader  | running |  7 |           |
+-----------+---------------+---------+---------+----+-----------+
```
We have set up a three-node Patroni cluster with no single point of failure (SPOF). In the next part we will test disaster and failover scenarios on this configuration with an active read/write workload.
References:
https://github.com/zalando/patroni
https://patroni.readthedocs.io/en/latest/
https://www.percona.com/blog/2021/06/11/postgresql-ha-with-patroni-your-turn-to-test-failure-scenarios/
https://blog.dbi-services.com/postgresql-high-availabilty-patroni-ectd-haproxy-keepalived/
https://digitalis.io/blog/technology/part1-postgresql-ha-patroni-etcd-haproxy/
Comments

Nice one... glad that you moved to open source solutions, as Oracle seems to be a dead-end story... a step that I have to take ASAP!
Thanks! I have not moved completely to open source solutions, just extending my knowledge a little. It is exciting to learn more about what is happening in the open source database world :-)
Thanks!
Thank you bro!!!
Thank you very much for this precise and clear article!
I am thinking about a different read-only access. The primary should also be able to answer read-only requests, and standbys which lag far behind the primary should not be used by HAProxy. Any idea?
Hello!
In the current configuration the primary can accept read-only and read-write requests. It is not clear to me why the primary should accept requests on the port dedicated to read-only requests.
If you need a replica without lag, you could use synchronous replication, or you can configure a maximum lag threshold on the Patroni side.
I am not sure you should add extra verification complexity on the HAProxy side when routing.
Best regards,
Marko
Great article, thank you! Are there any gotchas setting this up on Debian/Ubuntu?
Hello!
Thank you for the comment.
I haven't tested the installation/configuration on Debian/Ubuntu.
Best regards,
Marko