
Wednesday, January 24, 2018

Galera Cluster Schema Changes, Row Based Replication and Data Inconsistency

Galera Cluster is a virtually synchronous multi-master replication plug-in. When using Galera Cluster an application can write to any node, and transactions are then applied to all servers via row-based replication events.

This relies on built-in MySQL row-based replication, which supports replication with differing table definitions between Master and Slave.
So, when using row-based replication, the source and target tables do not have to be identical. A table on the master can have more or fewer columns, or use different data types.

But there are limitations you must watch out for, depending on the MySQL version you are running:
- The database and table names must be the same on both Master and Slave
- Columns must be in the same order before any additional column
- Each extra column must have a default value
- ...

Newer MySQL versions may tolerate more differences between the source and target tables - check the documentation for your version.
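To make these rules concrete, here is a quick sketch (using table t from the demonstration below) of an ALTER that follows the rules versus one that breaks them:

-- Safe: the new column is appended after all existing columns and is nullable, so it has an implicit default
alter table t add column c4 varchar(10);

-- Risky: inserting a column between existing ones changes the column order, so row events from nodes with the old definition no longer line up
alter table t add column c11 varchar(10) after c1;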


I want to show you what could happen to your data if you do not pay attention to these limitations.


Suppose I have a 3-node MariaDB Galera Cluster with table t.
I want to add several columns to the table while the database is being used by an application.

For such a task I will use the built-in Rolling Schema Upgrade (RSU) method, which enables me to perform schema changes on a node without impacting the rest of the cluster. (Galera's default method, TOI - Total Order Isolation - instead replicates DDL to all nodes in the same total order.)

First, add column c4 to table t, following the rules above for row-based replication.

Table t has three columns and one row inserted.
NODE1

MariaDB [testdb]> create table t (c1 varchar(10), c2 varchar(10), c3 varchar(10));
Query OK, 0 rows affected (0.37 sec)

MariaDB [testdb]> insert into t values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.00 sec)


NODE2

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
+------+------+------+
1 row in set (0.00 sec)


NODE3

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
+------+------+------+
1 row in set (0.01 sec)

I will enable RSU mode, which ensures that this node will not impact the rest of the cluster during the ALTER command execution.

Add column c4 and INSERT a row to simulate application activity.
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.13 sec)

While the table definition differs between Node1 and the rest of the cluster, INSERT a few more rows on the other nodes.

NODE2

insert into t(c1,c2,c3) values ('n2-1','n2-1','n2-1');


NODE3

insert into t(c1,c2,c3) values ('n3-1','n3-1','n3-1');

Check rows from table t.
NODE1

MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1   | c2   | c3   | c4   |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
+------+------+------+------+
4 rows in set (0.00 sec)

NODE2

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
| n1-1 | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 |
| n3-1 | n3-1 | n3-1 |
+------+------+------+
4 rows in set (0.00 sec)


NODE3

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
| n1-1 | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 |
| n3-1 | n3-1 | n3-1 |
+------+------+------+
4 rows in set (0.01 sec)

As you can see, everything is OK with my data.

Add the new column on Node2 and Node3, following the same steps as for Node1.

NODE2

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)


NODE3

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

And my task is complete. I have successfully changed the table definition.


But what happens if I add a new column between existing columns?
Remember, this is not permitted for row-based replication and can cause replication to break, or something even worse.

Enable RSU mode on Node1 and add new column c11 after the c1 column.
INSERT a row to simulate an active application during the schema change.

NODE1

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]>
MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.01 sec)

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
+------+------+------+------+------+
5 rows in set (0.00 sec)

INSERT a row on the other nodes, because Galera Cluster allows us to write to any node in the cluster.

NODE2

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n2-1','n2-1','n2-1');
Query OK, 1 row affected (0.01 sec)

MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1   | c2   | c3   | c4   |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 | NULL |
+------+------+------+------+
6 rows in set (0.00 sec)


NODE3

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n3-1','n3-1','n3-1');
Query OK, 1 row affected (0.01 sec)

MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1   | c2   | c3   | c4   |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
+------+------+------+------+
7 rows in set (0.00 sec)

The INSERT commands executed successfully and replication appears to be OK.
There are no errors in error.log suggesting that anything is wrong.

But check the contents of table t on Node1, the first node where the new column was added.

NODE1

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL | NULL |
| n3-1 | n3-1 | n3-1 | NULL | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)

Notice how the rows differ between nodes; we should have exactly the same data on all three nodes.


Let's complete the schema change on the other two nodes.
NODE2

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)


NODE3

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.34 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

I have successfully added the new column and did not break replication, and everything seems OK, but my data is not consistent between the nodes.

NODE1

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL | NULL |
| n3-1 | n3-1 | n3-1 | NULL | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)


NODE2

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | NULL | n1-1 | n1-1 |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)


NODE3

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | NULL | n1-1 | n1-1 |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)


Data inconsistency is the worst problem that could happen in a synchronous cluster configuration.
It can happen without any notice, but sooner or later it will stop the replication process and the failing node will be excluded from the cluster.
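One quick way to detect such a divergence is to compare table checksums: run the same statement on every node and compare the values (the pt-table-checksum tool from Percona automates checks of this kind):

-- Run on each node; differing checksums reveal inconsistent data
MariaDB [testdb]> checksum table t;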





REFERENCE
https://dev.mysql.com/doc/refman/5.7/en/replication-features-differing-tables.html


Tuesday, October 17, 2017

Enable SSL-encryption for MariaDB Galera Cluster

Imagine you have a MariaDB Galera Cluster with nodes running in different data centers, and the data centers are not connected via a secured VPN tunnel.
As database security is very important, you must ensure that the traffic between nodes is fully secured.

Galera Cluster supports encrypted connections between nodes using the SSL protocol, and in this post I want to show how to encrypt all cluster communication with SSL.


Check current SSL configuration.
MariaDB [(none)]> SHOW VARIABLES LIKE 'have_ssl';
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| have_ssl      | DISABLED |  ###==> SSL Disabled
+---------------+----------+
1 row in set (0.01 sec)

MariaDB [(none)]> status
--------------
mysql  Ver 15.1 Distrib 10.0.29-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Connection id:          56
Current database:
Current user:           marko@localhost
SSL:                    Not in use   ###==> SSL is not used
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server:                 MariaDB
Server version:         10.0.17-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4144
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 7 days 42 min 29 sec

Threads: 52  Questions: 10  Slow queries: 0  Opens: 0  Flush tables: 1  Open tables: 63  Queries per second avg: 0.000
--------------
SSL is currently disabled.

To fully secure all cluster communication we must SSL-encrypt the replication traffic within Galera Cluster, the State Snapshot Transfer (SST), and the traffic between the database server and clients.

We will create SSL Certificates and Keys using openssl.

# Create new folder for certificates
mkdir -p /etc/mysql/ssl
cd /etc/mysql/ssl

# Create CA certificate
# Generate CA key
openssl genrsa 2048 > ca-key.pem

# Using the CA key, generate the CA certificate
openssl req -new -x509 -nodes -days 3600 \
> -key ca-key.pem -out ca-cert.pem
-----
Country Name (2 letter code) [AU]:HR
State or Province Name (full name) [Some-State]:Zagreb
Locality Name (eg, city) []:Zagreb
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Dummycorp
Organizational Unit Name (eg, section) []:IT
Common Name (e.g. server FQDN or YOUR name) []:myu1.localdomain
Email Address []:marko@dummycorp.com


# Create server certificate, remove passphrase, and sign it
# Create the server key
openssl req -newkey rsa:2048 -days 3600 \
>         -nodes -keyout server-key.pem -out server-req.pem
-----
Country Name (2 letter code) [AU]:HR
State or Province Name (full name) [Some-State]:Zagreb
Locality Name (eg, city) []:Zagreb
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Dummycorp
Organizational Unit Name (eg, section) []:IT
##==> Use the ".localdomain" only on the first certificate.
Common Name (e.g. server FQDN or YOUR name) []:myu1
Email Address []:marko@dummycorp.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:secretpassword
An optional company name []:

# Process the server RSA key
openssl rsa -in server-key.pem -out server-key.pem

# Sign the server certificate
openssl x509 -req -in server-req.pem -days 3600 \
>         -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem


# Create client certificate, remove passphrase, and sign it
# Create the client key
openssl req -newkey rsa:2048 -days 3600 \
>         -nodes -keyout client-key.pem -out client-req.pem
-----
Country Name (2 letter code) [AU]:HR
State or Province Name (full name) [Some-State]:Zagreb
Locality Name (eg, city) []:Zagreb
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Dummycorp
Organizational Unit Name (eg, section) []:IT
Common Name (e.g. server FQDN or YOUR name) []:myu1
Email Address []:marko@dummycorp.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:secretpassword
An optional company name []:

# Process client RSA key
openssl rsa -in client-key.pem -out client-key.pem

# Sign the client certificate
openssl x509 -req -in client-req.pem -days 3600 \
>         -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out client-cert.pem


# Verify certificates

openssl verify -CAfile ca-cert.pem server-cert.pem client-cert.pem

server-cert.pem: OK
client-cert.pem: OK

If verification succeeds, copy the certificates to all nodes in the cluster.
Set mysql as the owner of the files.

# Copy
scp -r /etc/mysql/ssl node1:/etc/mysql
scp -r /etc/mysql/ssl node2:/etc/mysql
scp -r /etc/mysql/ssl node3:/etc/mysql

# Change owner
node1: chown -R mysql:mysql /etc/mysql/ssl
node2: chown -R mysql:mysql /etc/mysql/ssl
node3: chown -R mysql:mysql /etc/mysql/ssl


Secure database and client connections.

Add following lines in my.cnf configuration file.
# MySQL Server
[mysqld]
ssl-ca=/etc/mysql/ssl/ca-cert.pem
ssl-cert=/etc/mysql/ssl/server-cert.pem
ssl-key=/etc/mysql/ssl/server-key.pem

# MySQL Client
[client]
ssl-ca=/etc/mysql/ssl/ca-cert.pem
ssl-cert=/etc/mysql/ssl/client-cert.pem
ssl-key=/etc/mysql/ssl/client-key.pem


Secure replication traffic.

Define the paths to the key, certificate and certificate authority files in the [mysqld] section of my.cnf on every node. Galera Cluster will use these files for encrypting and decrypting the replication traffic.

wsrep_provider_options="socket.ssl_key=/etc/mysql/ssl/server-key.pem;socket.ssl_cert=/etc/mysql/ssl/server-cert.pem;socket.ssl_ca=/etc/mysql/ssl/ca-cert.pem"


Enable SSL for mysqldump and Xtrabackup.

Create a user which requires SSL for connections.

MariaDB [(none)]> CREATE USER 'sstssl'@'localhost' IDENTIFIED BY 'sstssl';
Query OK, 0 rows affected (0.03 sec)

MariaDB [(none)]> GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstssl'@'localhost' REQUIRE ssl;
Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

I will use this user for state snapshot transfers.
Change wsrep_sst_auth in the my.cnf configuration file.

wsrep_sst_auth="sstssl:sstssl"


Now we must restart the whole cluster.
If I restart only one node while the others are running, the node won't join the existing cluster.
You will notice errors like these in the MySQL error log:

171017  3:20:29 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.56.22:4567 failed: asio.ssl:336031996: 'unknown protocol' ( 336031996: 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')
171017  3:20:29 [ERROR] WSREP: handshake with remote endpoint ssl://192.168.56.23:4567 failed: asio.ssl:336031996: 'unknown protocol' ( 336031996: 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')
Shut down the whole cluster and bootstrap it.
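A minimal sketch of that restart, assuming the SysV-style init scripts used on these nodes (the exact bootstrap command differs between distributions and init systems):

# Stop MySQL on all nodes
node1: sudo service mysql stop
node2: sudo service mysql stop
node3: sudo service mysql stop

# Bootstrap a new cluster from the node with the most recent state
node1: sudo service mysql bootstrap

# Start the remaining nodes normally; they rejoin via IST/SST
node2: sudo service mysql start
node3: sudo service mysql start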


Check.
MariaDB [(none)]> status
--------------
mysql  Ver 15.1 Distrib 10.0.29-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Connection id:          87
Current database:
Current user:           marko@localhost
SSL:                    Cipher in use is DHE-RSA-AES256-SHA  ###==> SSL is used
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server:                 MariaDB
Server version:         10.0.17-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4144
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /var/run/mysqld/mysqld.sock
Uptime:                 1 min 4 sec

Threads: 52  Questions: 676  Slow queries: 16  Opens: 167  Flush tables: 1  Open tables: 31  Queries per second avg: 10.562
--------------


MariaDB [(none)]> SHOW VARIABLES LIKE 'have_ssl';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| have_ssl      | YES   |
+---------------+-------+
1 row in set (0.01 sec)
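The cipher for the current client session can also be checked directly; with the configuration above it should report the same DHE-RSA-AES256-SHA cipher, while an empty value would mean the session is unencrypted:

MariaDB [(none)]> SHOW SESSION STATUS LIKE 'Ssl_cipher';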



REFERENCES
6.4.3.2 Creating SSL Certificates and Keys Using openssl
MySQL : Configure SSL Connections

Thursday, September 28, 2017

Delete large amounts of data on Galera Cluster using pt-archiver

Galera Cluster is an excellent virtually synchronous multi-master database cluster. It has many benefits, which you can check on GaleraCluster.
But besides the benefits it has some limitations, and one of them is handling large transactions.

Large replication data sets can degrade the performance of the whole cluster, causing cluster freezes, increased memory consumption, crashing nodes, etc. To avoid these issues it is recommended to split large transactions into smaller chunks.

In this post I want to show you how to safely delete large amounts of data on a Galera Cluster. You can perform this task with several tools, or by writing custom procedures that split a large transaction into chunks. In this example I will use the pt-archiver tool from Percona.


Imagine you have received a task to perform data cleanup in the devices table for several schemas.
It looks like a very simple task: delete rows from the devices table where device_cookie is 0.
delete from devices where device_cookie = 0

But, although the statement looks simple, it could potentially freeze the whole cluster, so before executing the delete count how many rows you need to delete.

In my case I have to delete a few million rows, which is too much for one transaction, so I need to split the transaction into smaller chunks.

mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
|  2788504 |
+----------+

mysql> select count(*) - (select count(*) from devices where device_cookie = 0) 
       from devices;
+----------+
| count(*) |
+----------+
|      208 |
+----------+

I have to delete around 2.7 million rows.

This is the command I will use:
pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000

--purge - delete the rows.
--where "device_cookie = 0" - filter the rows you want to delete.
--sleep-coef 1.0 - throttle the delete process to avoid flow-control pauses from the cluster.
--txn-size 1000 - the chunk size for each transaction.
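Before running the real delete it is worth previewing what pt-archiver will do; a sketch using its --dry-run option, which prints the queries without executing them:

# Print the statements pt-archiver would run, without touching any data
pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000 --dry-run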

# time pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000

real 3m32.532s
user 0m17.268s
sys 0m2.460s

Check after the delete has finished.
mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)

As I have to perform the delete for several schemas, I have created a simple shell script which iterates through the schema list and executes the pt-archiver command.

# cat delete_rows.sh
#!/bin/bash

LOGFILE=/opt/skripte/schema/table_delete_rows.log
SCHEMA_LIST=/opt/skripte/schema/schema_list.conf

# Get schema list and populate conf file
mysql -B -u marko -ppasswd --disable-column-names --execute "select schema_name from information_schema.schemata where schema_name like 'sch_%' and schema_name <> 'sch_sys'" > $SCHEMA_LIST

while IFS= read -r schema; do

  START=`date +%s`

  echo "`date`=> Deleting rows from table in schema: $schema"

  pt-archiver --source h=localhost,u=marko,p="passwd",D=$schema,t=devices --purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 500

  SPENT=$(((`date +%s` - $START) / 60))

  echo "`date`=> Finished deleting in schema - spent: $SPENT mins"
  echo "*************************************************************************"

done <$SCHEMA_LIST >> $LOGFILE

exit 0
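The script can then be made executable and left running in the background so it survives a closed session (one possible invocation, using the log path from the script above):

# chmod +x delete_rows.sh
# nohup ./delete_rows.sh &
# tail -f /opt/skripte/schema/table_delete_rows.log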



Wednesday, October 7, 2015

Confusion and problems with lost+found directory in MySQL/Galera cluster configuration

The lost+found directory is a filesystem directory created at the root level of a mounted drive on ext filesystems. It is used by filesystem check tools (fsck) for file recovery.

In the MySQL world it can cause confusion or possible problems with synchronisation in a Galera cluster configuration.

Let’s check some examples.

I have a MySQL database with datadir=/data in the configuration file. I have deleted the lost+found directory and restarted the MySQL service.

When I list my databases this is the result:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
| pitrdb             |
| sbtest             |
| sys                |
| test               |
+--------------------+
8 rows in set (0.34 sec)

I will stop the MySQL service and recreate the lost+found directory.
$ sudo service mysql stop

$ cd /data
$ sudo mklost+found
mklost+found 1.42.9 (4-Feb-2014)

Restart service and show databases.
$ sudo service mysql start

mysql> show databases;
+---------------------+
| Database            |
+---------------------+
| information_schema  |
| employees           |
| #mysql50#lost+found |
| mysql               |
| performance_schema  |
| pitrdb              |
| sbtest              |
| sys                 |
| test                |
+---------------------+
9 rows in set (0.01 sec)

Notice the database: #mysql50#lost+found

If you have dedicated an entire filesystem to the MySQL datadir, then MySQL will interpret all files under that directory as db-related files.
SHOW DATABASES lists the database lost+found, which is not a real database.

If you check the error log you can notice this message:
[ERROR] Invalid (old?) table or database name 'lost+found'

For a single-server configuration, issues with the lost+found directory only cause confusion; I'm not aware of any negative effects on the database.
To avoid the confusion you should move the databases to a sub-directory below the root-level directory. Also remove all directories that are not MySQL db-related from the datadir location.


Stop the MySQL service on the database server.
$ sudo service mysql stop

Make a sub-directory and move the existing data into it.
$ sudo su -
root@galera1:~# cd /data
root@galera1:/data# mkdir mydata && mv !(mydata) mydata
root@galera1:/data# chown -R mysql:mysql /data

Update the configuration file with the new datadir location.
# vi /etc/mysql/my.cnf
...
datadir=/data/mydata
...

Remove the lost+found directory from the new datadir and recreate it at the filesystem root.
# rm -rf mydata/lost+found
# mklost+found
mklost+found 1.42.9 (4-Feb-2014)

# pwd
/data
# ls -l
total 56
drwx------ 2 root  root  49152 Oct  4 16:48 lost+found
drwxr-xr-x 9 mysql mysql  4096 Oct  4 16:48 mydata

Restart the service.
$ sudo service mysql start


From version 5.6 you can tell the server to ignore non-database directories using the ignore-db-dir option.
$ sudo vi /etc/mysql/my.cnf
...
ignore-db-dir=lost+found
...



Let's test how the lost+found directory affects a Galera cluster configuration.
For this test I’m using Percona XtraDB Cluster 5.6 with 3 nodes.

# dpkg -l | grep percona-xtradb-cluster-server
ii  percona-xtradb-cluster-server-5.6         5.6.25-25.12-1.trusty                               amd64        Percona XtraDB Cluster database server binaries


mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.6.25-73.1-56-log |
+--------------------+
1 row in set (0.00 sec)


mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.01 sec)

In this configuration datadir is set to the /data location, which contains a lost+found directory.
As this is a 5.6 version I've included the ignore-db-dir option in the configuration file.

In the SHOW DATABASES list and the error log I don't see any issues.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
| pitrdb             |
| sbtest             |
| sys                |
| test               |
+--------------------+
8 rows in set (0.00 sec)

For the SST method I'm using the default and recommended Percona xtrabackup-v2.
So, what will happen if I initiate SST for one of the nodes in the cluster?

$ sudo service mysql stop
 * Stopping MySQL (Percona XtraDB Cluster) mysqld  [OK]

$ sudo rm /data/grastate.dat

$ sudo service mysql start
[sudo] password for marko:
 * Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld
* The server quit without updating PID file (/data/galera2.pid).

It appears that SST failed with errors:

WSREP_SST: [ERROR] Cleanup after exit with status:1 (20151004 12:01:00.936)
2015-10-04 12:01:02 16136 [Note] WSREP: (cf98f684, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-10-04 12:01:12 16136 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.56.102' --datadir '/data/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '16136' --binlog 'percona-bin' : 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST script aborted with error 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] Aborting

2015-10-04 12:01:12 16136 [Warning] WSREP: 0.0 (galera3): State transfer to 1.0 (galera2) failed: -22 (Invalid argument)
2015-10-04 12:01:12 16136 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.


The cause of the SST failure is the lost+found directory, but the directory is not mentioned anywhere in the error log.

SST fails because xtrabackup ignores the ignore-db-dir option and tries to synchronise the lost+found directory, which is owned by the root user.
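Before initiating SST it is therefore worth checking the donor's datadir for entries not owned by mysql (a quick check; adjust the path to your datadir):

# find /data -maxdepth 1 ! -user mysql -ls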


What will happen if I (for a test) change the ownership of the lost+found directory on the donor nodes?

drwx------ 2 root  root      49152 Oct  4 11:50 lost+found

marko@galera3:/data# sudo chown -R mysql:mysql /data/lost+found
marko@galera1:/data$ sudo chown -R mysql:mysql /data/lost+found


marko@galera2:/data$ sudo service mysql start
 * Stale sst_in_progress file in datadir mysqld
* Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld           [OK]

NODE2
...
drwxrwx--x  2 mysql mysql      4096 Oct  4 12:07 lost+found
...

SST succeeded and the node successfully joined/synced to the cluster.


To avoid these inconveniences just move your databases away from the root directory.
Some of you will simply delete the lost+found directory, but be aware: fsck may recreate it, and your cluster synchronisation will fail when you least expect it ;)