Showing posts with label MySQL. Show all posts
Showing posts with label MySQL. Show all posts

Wednesday, November 11, 2020

ProxySQL - Throttle for MySQL queries

ProxySQL is a great high availability and load balancing solution and it is mostly used for such purposes. But ProxySQL offers much more.

One of the nice features is the throttling mechanism for queries to the backends.

Imagine you have a very active system and applications are executing queries at very high rate, which is not so unusual nowadays. If just one of the queries slows down you could easily end up with many active sessions running the same query. Just one problematic query could cause high resource usage and general slowness.

Usually, DBA is called but DBA cannot modify a query, disable problematic application, or change database model without detailed analysis.

But ProxySQL could help.
Using ProxySQL we could delay execution of the problematic queries.
Yes, specific application request would still have a problem, but we would avoid general problem/downtime and "buy" some time for the fix.


Let's simulate such a situation in the test environment.

Run benchmark test using sysbench.

NUM_THREADS=1
TEST_DIR=/usr/share/sysbench/tests/include/oltp_legacy

sysbench \
  --test=${TEST_DIR}/oltp_simple.lua \
  --oltp-table-size=2000000 \
  --time=300 \
  --max-requests=0 \
  --mysql-table-engine=InnoDB \
  --mysql-user=sbtest \
  --mysql-password=sbtest \
  --mysql-port=3307 \
  --mysql-host=192.168.56.25 \
  --mysql-engine-trx=yes \
  --num-threads=$NUM_THREADS \
  prepare
  
sysbench \
  --test=${TEST_DIR}/oltp_simple.lua \
  --oltp-table-size=2000000 \
  --time=180 \
  --max-requests=0 \
  --mysql-table-engine=InnoDB \
  --mysql-user=sbtest \
  --mysql-password=sbtest \
  --mysql-port=3307 \
  --mysql-host=192.168.56.25 \
  --mysql-engine-trx=yes \
  --num-threads=$NUM_THREADS \
  run

Enable throttling mechanism and delay execution for all queries globally setting "mysql-default_query_delay=100".

  
ProxySQLServer> set mysql-default_query_delay=100;
ProxySQLServer> LOAD MYSQL VARIABLES TO RUNTIME; 
ProxySQLServer> SAVE MYSQL VARIABLES TO DISK;

Run test again and Check latency(ms).

  
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Initializing worker threads...

Threads started!

SQL statistics:
    queries performed:
        read:                            1774
        write:                           0
        other:                           0
        total:                           1774
    transactions:                        1774   (9.85 per sec.)
    queries:                             1774   (9.85 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          180.0942s
    total number of events:              1774

Latency (ms):
         min:                                  100.76 <<<<<<<<<<<<<<<<<
         avg:                                  101.51 <<<<< Throttling
         max:                                  129.17 <<<<<<<<<<<<<<<<<
         95th percentile:                      102.97
         sum:                               180083.66

Threads fairness:
    events (avg/stddev):           1774.0000/0.00
    execution time (avg/stddev):   180.0837/0.00

Disable throttling and reset ProxySQL counters.

    
ProxySQLServer> set mysql-default_query_delay=0;
ProxySQLServer> LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
ProxySQLServer> select * from stats_mysql_query_digest_reset;

Check latency(ms).

  
Initializing worker threads...

Threads started!

SQL statistics:
    queries performed:
        read:                            641413
        write:                           0
        other:                           0
        total:                           641413
    transactions:                        641413 (3563.38 per sec.)
    queries:                             641413 (3563.38 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          180.0004s
    total number of events:              641413

Latency (ms):
         min:                                    0.19
         avg:                                    0.28
         max:                                   44.45
         95th percentile:                        0.43
         sum:                               179252.76

Threads fairness:
    events (avg/stddev):           641413.0000/0.00
    execution time (avg/stddev):   179.2528/0.00

Enable throttling for just a specific query using ProxySQL mysql query rules.

 
-- Find problematic query

ProxySQLServer> select hostgroup,username,count_star,
(count_star/(select Variable_Value from stats_mysql_global where Variable_Name='ProxySQL_Uptime')) 
as avg_per_sec, digest, digest_text from stats_mysql_query_digest order by count_star desc limit 10;

+-----------+----------+------------+-------------+--------------------+----------------------------------+
| hostgroup | username | count_star | avg_per_sec | digest             | digest_text                      |
+-----------+----------+------------+-------------+--------------------+----------------------------------+
| 2         | sbtest   | 641413     | 78          | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? |
+-----------+----------+------------+-------------+--------------------+----------------------------------+
1 row in set (0.00 sec)


-- Reset counters

ProxySQLServer> select * from stats_mysql_query_digest_reset;
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| hostgroup | schemaname | username | client_address | digest             | digest_text                      | count_star | first_seen | last_seen  | sum_time  | min_time | max_time | sum_rows_affected | sum_rows_sent |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| 2         | sbtest     | sbtest   |                | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? | 641413     | 1601934890 | 1601935070 | 153023170 | 159      | 44349    | 0                 | 214399        |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
1 row in set (0.00 sec)

Insert mysql query rule and enable throttling just for a specific query.

   
ProxySQLServer> insert into mysql_query_rules(rule_id,active,digest,delay,apply) values (1,1,'0xBF001A0C13781C1D',100,1);
Query OK, 1 row affected (0.00 sec)

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

Compare "min_time" between executions.

 
Initializing worker threads...

Threads started!

SQL statistics:
    queries performed:
        read:                            1773
        write:                           0
        other:                           0
        total:                           1773
    transactions:                        1773   (9.85 per sec.)
    queries:                             1773   (9.85 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          180.0325s
    total number of events:              1773

Latency (ms):
         min:                                  100.78 <<<<<<<<<<<<<<<<<
         avg:                                  101.53 <<<<< Throttling
         max:                                  104.77 <<<<<<<<<<<<<<<<<
         95th percentile:                      102.97
         sum:                               180021.34

Threads fairness:
    events (avg/stddev):           1773.0000/0.00
    execution time (avg/stddev):   180.0213/0.00

 
ProxySQLServer> select * from stats_mysql_query_digest_reset;
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| hostgroup | schemaname | username | client_address | digest             | digest_text                      | count_star | first_seen | last_seen  | sum_time  | min_time | max_time | sum_rows_affected | sum_rows_sent |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
| 2         | sbtest     | sbtest   |                | 0xBF001A0C13781C1D | SELECT c FROM sbtest1 WHERE id=? | 1773       | 1601935408 | 1601935588 | 179697522 | 100681   | 104195   | 0                 | 594           |
+-----------+------------+----------+----------------+--------------------+----------------------------------+------------+------------+------------+-----------+----------+----------+-------------------+---------------+
1 row in set (0.01 sec)

Wednesday, January 24, 2018

Galera Cluster Schema Changes, Row Based Replication and Data Inconsistency

Galera Cluster is a virtually synchronous multi-master replication plug-in. When using Galera Cluster application can write to any node and transactions are then applied to all serves via row-based replication events.

This is built-in Mysql row-based replication which supports replication with differing table definitions between Master and Slave.
So, when using row-based repplication source and target table do not have to be identical. A table on master can have more or fewer columns or use different data types.

But there are limitations you must watch over depending on MySQL version you are running.
- The database and table names must be the same on both Master and Slave
- Columns must be in the same order before any additional column
- Each extra column must have default value
- ...

Newer MySQL versions may tolerate more differences between source and target table - check documentation for your version.


I want to show you what could happen with your data if you do not pay attention on this limitations.


Suppose I have 3-node MariaDB Galera Cluster with table t.
I want to add several columns to the table while database is used by an application.

For such task I will use built-in Rolling Schema Change (RSU) method which enables me to perform schema changes on node without impact on rest of the cluster.

Add column c4 to the table t following rules above for row-based replication.

Table t has three columns and one row inserted.
NODE1

MariaDB [testdb]> create table t (c1 varchar(10), c2 varchar(10), c3 varchar(10));
Query OK, 0 rows affected (0.37 sec)

MariaDB [testdb]> insert into t values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.00 sec)


NODE2

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
+------+------+------+
1 row in set (0.00 sec)


NODE3

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
+------+------+------+
1 row in set (0.01 sec)

I will enable RSU mode which ensures that this server will not impact the rest of the cluster during ALTER command execution.

Add column c4 and INSERT row simulating application activity.
MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.13 sec)

While table definition is different between Node1 and rest of the cluster INSERT few more rows on other nodes.

NODE2

insert into t(c1,c2,c3) values ('n2-1','n2-1','n2-1');


NODE3

insert into t(c1,c2,c3) values ('n3-1','n3-1','n3-1');

Check rows from table t.
NODE1

MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1   | c2   | c3   | c4   |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
+------+------+------+------+
4 rows in set (0.00 sec)

NODE2

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
| n1-1 | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 |
| n3-1 | n3-1 | n3-1 |
+------+------+------+
4 rows in set (0.00 sec)


NODE3

MariaDB [testdb]> select * from t;
+------+------+------+
| c1   | c2   | c3   |
+------+------+------+
| n1-1 | n1-1 | n1-1 |
| n1-1 | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 |
| n3-1 | n3-1 | n3-1 |
+------+------+------+
4 rows in set (0.01 sec)

As you can notice everything is OK with my data.

Add new column to Node2 and Node3 following the same steps as for Node1.

NODE2

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)


NODE3

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c4 varchar(10);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

And my task is completed. I have successfully changed model of the table.


But what can happen if I add new column between existing columns.
Remember, this is not permitted for a row-based replication and can cause replication to brake or something even worse.

Enable RSU mode on Node1 and add new column c11 after c1 column.
INSERT row simulating active application during schema change.

NODE1

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]>
MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n1-1','n1-1','n1-1');
Query OK, 1 row affected (0.01 sec)

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
+------+------+------+------+------+
5 rows in set (0.00 sec)

INSERT row on other nodes because Galera Cluster allows us write on any node in the cluster configuration.

NODE2

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n2-1','n2-1','n2-1');
Query OK, 1 row affected (0.01 sec)

MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1   | c2   | c3   | c4   |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 | NULL |
+------+------+------+------+
6 rows in set (0.00 sec)


NODE3

MariaDB [testdb]> insert into t(c1,c2,c3) values ('n3-1','n3-1','n3-1');
Query OK, 1 row affected (0.01 sec)

MariaDB [testdb]> select * from t;
+------+------+------+------+
| c1   | c2   | c3   | c4   |
+------+------+------+------+
| n1-1 | n1-1 | n1-1 | NULL |
| n1-1 | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 |
| n2-1 | n2-1 | n2-1 | NULL |
| n3-1 | n3-1 | n3-1 | NULL |
+------+------+------+------+
7 rows in set (0.00 sec)

INSERT commands were successfully executed and everything is OK with my replication.
I don't have any errors in error.log that suggests that I have any problem.

But check contest of table t on the first node where new column is added.

NODE1

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL | NULL |
| n3-1 | n3-1 | n3-1 | NULL | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)

Notice how rows differ between nodes, and we should have exactly the same data on all tree nodes.


Let's complete schema changes on other two nodes.
NODE2

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)


NODE3

MariaDB [testdb]> set session wsrep_OSU_method='RSU';
Query OK, 0 rows affected (0.00 sec)

MariaDB [testdb]> alter table t add column c11 varchar(10) after c1;
Query OK, 0 rows affected (0.34 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [testdb]> set session wsrep_OSU_method='TOI';
Query OK, 0 rows affected (0.00 sec)

I have successfully added new column, did not brake reapplication and everything seems OK, but my data is not consistent between nodes.

NODE1

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | n2-1 | n2-1 | NULL | NULL |
| n3-1 | n3-1 | n3-1 | NULL | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)


NODE2

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | NULL | n1-1 | n1-1 |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)


NODE3

MariaDB [testdb]> select * from t;
+------+------+------+------+------+
| c1   | c11  | c2   | c3   | c4   |
+------+------+------+------+------+
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n1-1 | NULL | n1-1 | n1-1 | NULL |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
| n1-1 | NULL | NULL | n1-1 | n1-1 |
| n2-1 | NULL | n2-1 | n2-1 | NULL |
| n3-1 | NULL | n3-1 | n3-1 | NULL |
+------+------+------+------+------+
7 rows in set (0.00 sec)


Data inconsistency is the worst problem that could happen in synchronous cluster configuration.
It could happen without any notice, but sooner or later it will stop reapplication process and failing node will be excluded from the cluster.





REFERENCE
https://dev.mysql.com/doc/refman/5.7/en/replication-features-differing-tables.html


Saturday, October 21, 2017

Beware of intensive slow query logging when using - log_queries_not_using_indexes

MySQL slow query log is great for identifying slow queries that are good candidates for optimisation. Slow query logging is disabled by default, but it is activated by DBA's or developers on most environments.

You can use slow query log to record all the traffic but be careful with this action as logging all traffic could be very I/O intensive and could have negative impact on general performance. It is recommended to record all traffic only for specific time periods.

This is why slow query logging is controlled with long_query_time parameter to log only slow queries.
But there is another parameter to think about - log_queries_not_using_indexes.

By default log_queries_not_using_indexes is disabled. If you have this parameter turned on you will log queries that don’t use an index, or that perform a full index scan where the index doesn't limit the number of rows - regardless of time taken.

If you have long_query_time configured to reasonable time, and still notice that queries are intensively logged in slow query log file, then you probably have enabled log_queries_not_using_indexes.

Enabling this parameter you’re practically saying that full scans are "evil" and should be considered for optimisation. But full scan doesn’t always mean that query is slow. In some situations query optimizer chooses full table scan as better option than index or you are probably querying very small table.


For instance, on several occasions I've noticed slow query logs flooded with queries like this:

# Time: 171021 17:51:45
# User@Host: monitor[monitor] @ localhost []
# Thread_id: 1492974  Schema:   QC_hit: No
# Query_time: 0.000321  Lock_time: 0.000072  Rows_sent: 0  Rows_examined: 1
# Full_scan: Yes  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: No
SET timestamp=1508608305;
SELECT
      SCHEMA_NAME
    FROM information_schema.schemata
    WHERE SCHEMA_NAME NOT IN ('mysql', 'performance_schema', 'information_schema');

+------+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id   | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+------+-------------+----------+------+---------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | schemata | ALL  | NULL          | NULL | NULL    | NULL | NULL | Using where |
+------+-------------+----------+------+---------------+------+---------+------+------+-------------+
Notice, Query_time: 0.000321.

Should I optimize query that is running 0.000321 secs with adding indexes. Probably not. But anyway, my log is flooded with this or similar queries.

I don’t see that parameter very useful and I would leave it on default value to avoid possible problems with intensive query logging.


Thursday, September 28, 2017

Delete large amounts of data on Galera Cluster using pt-archiver

Galera Cluster is excellent virtually synchronous multi-master database cluster. It has many benefits which you can check on GaleraCluster.
But beside benefits it has some limitations and one of them is handling large transactions.

Large replication data sets could degrade performance of whole cluster causing cluster freezing, increased memory consumption, crashing nodes, etc. To avoid this issues it is recommended to split large transactions into smaller chunks.

In this post I want to show you how to safely delete large amounts of data on Galera Cluster. You can perform this task using several tools or writing custom procedures to split large transaction into chunks. In this example I will use pt-archiver tool from Percona.


Imagine you have received task to perform data cleanup in devices table for several schemas.
It looks like very simple task - delete rows from devices table where device_cookie is 0.
delete from devices where device_cookie = 0

But, although statement looks simple it could potentially freeze whole cluster so before executing delete statement count how many rows you need to delete.

In my case I have to delete few millions of rows which is too much for one transaction so I need to split transaction into smaller chunks.

mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
|  2788504 |
+----------+

mysql> select count(*) - (select count(*) from devices where device_cookie = 0) 
       from devices;
+----------+
| count(*) |
+----------+
|      208 |
+----------+

I have to delete around 2.7 millions of rows.

This is command I will use:
pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000

--purge - delete rows.
--where "device_cookie = 0" - filter rows you want to delete.
--sleep-coef 1.0 - throttle delete process to avoid pause signals from cluster.
--txn-size 1000 - this is chunk size for every transaction.

# time pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
--purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000

real 3m32.532s
user 0m17.268s
sys 0m2.460s

Check after delete finished.
mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)

As I have to perform delete for several schemas, I have created simple shell script which iterates through schema list and executes pt-archiver command.

# cat delete_rows.sh
#!/bin/bash

LOGFILE=/opt/skripte/schema/table_delete_rows.log
SCHEMA_LIST=/opt/skripte/schema/schema_list.conf

# Get schema list and populate conf file
mysql -B -u marko -ppasswd --disable-column-names --execute "select schema_name from information_schema.schemata where schema_name like 'sch_%' and schema_name <> 'sch_sys'" > $SCHEMA_LIST

while IFS= read -r schema; do

  START=`date +%s`

  echo "`date`=> Deleting rows from table in schema: $schema"

  pt-archiver --source h=localhost,u=marko,p="passwd",D=$schema,t=devices --purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 500

  SPENT=$(((`date +%s` - $START) / 60))

  echo "`date`=> Finished deleting in schema - spent: $SPENT mins"
  echo "*************************************************************************"

done <$SCHEMA_LIST >> $LOGFILE

exit 0



Wednesday, October 7, 2015

Confusion and problems with lost+found directory in MySQL/Galera cluster configuration

The lost+found directory is filesystem directory created at root level of mounted drive for ext file systems. It is used by file system check tools (fsck) for file recoveries.

In MySql world it can cause confusion or possible problems with synchronisation in Galera cluster configuration.

Let’s check some examples.

I have MySQL database with datadir=/data in configuration file. I have deleted lost+found directory and restarted MySQL service.

When I list my databases this is result:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
| pitrdb             |
| sbtest             |
| sys                |
| test               |
+--------------------+
8 rows in set (0.34 sec)

I will stop MySQL service and recreate lost+found directory.
$ sudo service mysql stop

$ cd /data
$ sudo mklost+found
mklost+found 1.42.9 (4-Feb-2014)

Restart service and show databases.
$ sudo service mysql start

mysql> show databases;
+---------------------+
| Database            |
+---------------------+
| information_schema  |
| employees           |
| #mysql50#lost+found |
| mysql               |
| performance_schema  |
| pitrdb              |
| sbtest              |
| sys                 |
| test                |
+---------------------+
9 rows in set (0.01 sec)

Notice database : #mysql50#lost+found

If you have dedicated entire FS to use as MySQL datadir then MySQL will interpret all files under that directory as db-related files.
SHOW DATABASES lists database lost+found which is not real database.

If you check error log you can notice this message:
[ERROR] Invalid (old?) table or database name 'lost+found'

For a single server configuration issues with lost+found directory can only make confusion. I’m not aware of any negative effects for database.
To avoid confusion you should move database to sub-directory below the root level directory. Also remove all directories that are not MySql db-related from datadir location.


Stop MySQL service on database server.
$ sudo service mysql stop

Make sub-directory and move existing data to new directory.
$ sudo su -
root@galera1:~# cd /data
root@galera1:/data# mkdir mydata && mv !(mydata) mydata
root@galera1:/data# chown -R mysql:mysql /data

Update configuration file with new datadir location.
# vi /etc/mysql/my.cnf
...
datadir=/data/mydata
...

Remove non-database directories.
# rm -rf mydata/lost+found
# mklost+found
mklost+found 1.42.9 (4-Feb-2014)

# pwd
/data
# ls -l
total 56
drwx------ 2 root  root  49152 Oct  4 16:48 lost+found
drwxr-xr-x 9 mysql mysql  4096 Oct  4 16:48 mydata

Restart the service.
$ sudo service mysql start


From 5.6 version you can tell server to ignore non-database directories using ignore-db-dir option.
$ sudo vi /etc/mysql/my.cnf
...
ignore-db-dir=lost+found
...



Let’s test how lost+found directory affects Galera cluster configuration.
For this test I’m using Percona XtraDB Cluster 5.6 with 3 nodes.

# dpkg -l | grep percona-xtradb-cluster-server
ii  percona-xtradb-cluster-server-5.6         5.6.25-25.12-1.trusty                               amd64        Percona XtraDB Cluster database server binaries


mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.6.25-73.1-56-log |
+--------------------+
1 row in set (0.00 sec)


mysql> show global status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.01 sec)

In this configuration for datadir is specified /data location with lost+found directory.
As this is 5.6 version I’ve included ignore-db-dir option in configuration file.

In SHOW DATABASES list and error log I don’t see any issues.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
| pitrdb             |
| sbtest             |
| sys                |
| test               |
+--------------------+
8 rows in set (0.00 sec)

For SST method I’m using default and recommended Percona’s xtrabackup-v2.
So, what will happen if I initiate SST for one of the nodes in the cluster.

$ sudo service mysql stop
 * Stopping MySQL (Percona XtraDB Cluster) mysqld  [OK]

$ sudo rm /data/grastate.dat

$ sudo service mysql start
[sudo] password for marko:
 * Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld
* The server quit without updating PID file (/data/galera2.pid).

It appears that SST failed with errors:

WSREP_SST: [ERROR] Cleanup after exit with status:1 (20151004 12:01:00.936)
2015-10-04 12:01:02 16136 [Note] WSREP: (cf98f684, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-10-04 12:01:12 16136 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.56.102' --datadir '/data/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '16136' --binlog 'percona-bin' : 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST script aborted with error 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2015-10-04 12:01:12 16136 [ERROR] Aborting

2015-10-04 12:01:12 16136 [Warning] WSREP: 0.0 (galera3): State transfer to 1.0 (galera2) failed: -22 (Invalid argument)
2015-10-04 12:01:12 16136 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():731: Will never receive state. Need to abort.


The cause of SST failure is lost+found directory but in error log lost+found directory is not mentioned.

SST fails because xtrabackup ignores ignore-db-dir option and tries to synchronise lost+found directory which is owned by root user.


What will happen if I (for test) change the ownership of lost+found directory on donor nodes.

drwx------ 2 root  root      49152 Oct  4 11:50 lost+found

marko@galera3:/data# sudo chown -R mysql:mysql /data/lost+found
marko@galera1:/data$ sudo chown -R mysql:mysql /data/lost+found


marko@galera2:/data$ sudo service mysql start
 * Stale sst_in_progress file in datadir mysqld
* Starting MySQL (Percona XtraDB Cluster) database server mysqld
* State transfer in progress, setting sleep higher mysqld           [OK]

NODE2
...
drwxrwx--x  2 mysql mysql      4096 Oct  4 12:07 lost+found
...

SST succeeded and node is successfully joined/synced to the cluster.


To avoid this inconveniences just move databases from root directory.
Some of you will simply delete lost+found directory, but be aware, fsck may recreate lost+found directory and your cluster synchronisation will fail when you least expect it ;)


Thursday, February 5, 2015

MariaDB - Measure Replicaton Lag and Check / Fix Replication Inconsistencies using Percona tools

Percona Toolkit is collection of command-line tools to perform many MySQL tasks like creating backups, finding duplicate indexes, managing replication, etc.

In this post I will talk about how to measure replication lag and check/fix replication inconsistencies with this tools:
pt-heartbeat
pt-table-checksum
pt-table-sync


I am using environment from previous blog post.
Master-Master replication with MariaDB 10.0.16 database on Debian 7.


Install Percona Toolkit on both nodes:

$ sudo wget percona.com/get/percona-toolkit.deb

$ sudo apt-get install libterm-readkey-perl
$ sudo dpkg -i percona-toolkit.deb  


I will create percona database where I will store tables needed for various checks. Also I will create percona user which will be used with Percona tools.



MASTER1

MariaDB [(none)]> create database percona;

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'master1.localdomain' identified by 'percona';

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'localhost' identified by 'percona';

MariaDB [(none)]> flush privileges;


MASTER2

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'master2.localdomain' identified by 'percona';

MariaDB [(none)]> grant all privileges on *.* to 'percona'@'localhost' identified by 'percona';

MariaDB [(none)]> flush privileges;




MONITOR REPLICATION LAG


So, I have replication running and I want to be sure that everything is working fine.
Typical method to monitor replication lag would be to run SLAVE STATUS and look at Seconds_Behind_Master. But Seconds_Behind_Master is not always accurate.

Percona Toolkit has a tool to monitor replication delay called pt-heartbeat.

We must create heartbeat table on the master manually or using --create-table option and heartbeat table must contain one heartbeat row. This table will be updated in interval we specify by pt-heartbeat. Slave will actively check table and calculate time delay.


Create heartbeat table and start daemonized process to update percona.heartbeat table.


MASTER1

$ pt-heartbeat -upercona -ppercona -D percona --update master1 --daemonize --create-table


MASTER2
Start pt-heartbeat.

$ pt-heartbeat -upercona -ppercona --update --database percona


MASTER1
Monitor replication slave lag.

$ pt-heartbeat -upercona -ppercona -D percona --monitor -h master2
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.01s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]
0.00s [  0.00s,  0.00s,  0.00s ]





CHECK REPLICATION INCONSISTENCIES


If we want to check replication integrity we can use pt-table-checksum tool.

Run tool on master server. It will automatically detect slave servers and connect to them to do some safety checks. After that it runs checksums on the tables of the master database and reports results in the checksum table. This results are then compared with the results on the slave whether the data differs.
You can inspect that table anytime - in this example percona.checksums table.

If there are no different rows in the tables between master and slave database DIFF will show 0.


$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T20:58:15      0      0        5       1       0   1.134 testdb.users


MASTER2

MariaDB [testdb]> create table address (id int auto_increment primary key, city varchar(30));
Query OK, 0 rows affected (0.06 sec)

MariaDB [testdb]> insert into address (city) values ('New York');
Query OK, 1 row affected (0.07 sec)

MariaDB [testdb]> insert into address (city) values ('LA');
Query OK, 1 row affected (0.06 sec)

MariaDB [testdb]> insert into address (city) values ('Zagreb');
Query OK, 1 row affected (0.13 sec)


MASTER1

$ pt-table-checksum -upercona -ppercona --replicate percona.checksums --databases testdb -h master2
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T20:59:16      0      0        3       1       0   1.032 testdb.address
02-02T20:59:17      0      0        5       1       0   1.120 testdb.users


$ pt-table-checksum -upercona -ppercona --replicate=percona.checksums --replicate-check-only --databases=testdb master1


Nothing received in output which means that testdb database is in sync with slave.


Insert some test data:

MariaDB [testdb]> create table animals (id int not null auto_increment,
    -> name char(30) not null,
    -> primary key(id));
Query OK, 0 rows affected (0.04 sec)

MariaDB [testdb]> insert into animals (name) values ('dog'),('cat'),('whale');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0


MariaDB [testdb]> create table countries (id int not null auto_increment,
    -> name varchar(30),
    -> primary key(id));
Query OK, 0 rows affected (0.09 sec)

MariaDB [testdb]> insert into countries(name) values ('Croatia'),('England'),('USA'),('Island');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [testdb]> select * from animals;
+----+-------+
| id | name  |
+----+-------+
|  1 | dog   |
|  2 | cat   |
|  3 | whale |
+----+-------+
3 rows in set (0.00 sec)

MariaDB [testdb]>  select * from countries;
+----+---------+
| id | name    |
+----+---------+
|  1 | Croatia |
|  2 | England |
|  3 | USA     |
|  4 | Island  |
+----+---------+
4 rows in set (0.00 sec)


Check if database is in sync:

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T21:03:49      0      0        3       1       0   0.177 testdb.address
02-02T21:03:49      0      0        3       1       0   0.045 testdb.animals
02-02T21:03:49      0      0        4       1       0   0.049 testdb.countries
02-02T21:03:49      0      0        5       1       0   0.037 testdb.users





RESYNC REPLICA FROM THE MASTER


Lets make database on MASTER2 out-of-sync and create some differences between databases.


MASTER2

Instead of stopping replication process, I will temporarily disable binary logging on MASTER2 server.

MariaDB [testdb]> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)


Make same data modifications.

MariaDB [testdb]> insert into animals (name) values ('Ostrich'),('Penguin');
Query OK, 2 rows affected (0.04 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [testdb]> delete from countries where id=2;
Query OK, 1 row affected (0.01 sec)


MariaDB [testdb]> create table colors (name varchar(30));
Query OK, 0 rows affected (0.10 sec)

MariaDB [testdb]> insert into colors(name) values ('Red'),('Blue');
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0


Enable binary logging again.

MariaDB [testdb]> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)



MASTER1

MariaDB [testdb]> select * from animals;
+----+-------+
| id | name  |
+----+-------+
|  1 | dog   |
|  2 | cat   |
|  3 | whale |
+----+-------+
3 rows in set (0.00 sec)

MariaDB [testdb]> select * from countries;
+----+---------+
| id | name    |
+----+---------+
|  1 | Croatia |
|  2 | England |
|  3 | USA     |
|  4 | Island  |
+----+---------+
4 rows in set (0.00 sec)

MariaDB [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| address          |
| animals          |
| countries        |
| users            |
+------------------+
4 rows in set (0.00 sec)


MASTER2

MariaDB [testdb]> select * from animals;
+----+---------+
| id | name    |
+----+---------+
|  1 | dog     |
|  2 | cat     |
|  3 | whale   |
|  4 | Ostrich |
|  5 | Penguin |
+----+---------+
5 rows in set (0.00 sec)

MariaDB [testdb]> select * from countries;
+----+---------+
| id | name    |
+----+---------+
|  1 | Croatia |
|  3 | USA     |
|  4 | Island  |
+----+---------+
3 rows in set (0.00 sec)

MariaDB [testdb]> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| address          |
| animals          |
| colors           |
| countries        |
| users            |
+------------------+
5 rows in set (0.00 sec)


Notice that there are some inconsistencies between databases and there isn’t any built-in tool that will notify us about that. Replication is working fine, even though replica has different data than master.

With pt-table-checksum we will check data differences between databases.


MASTER1

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T21:11:23      0      0        3       1       0   0.106 testdb.address
02-02T21:11:23      0      1        3       1       0   0.053 testdb.animals
02-02T21:11:24      0      1        4       1       0   0.046 testdb.countries
02-02T21:11:24      0      0        5       1       0   0.042 testdb.users


$ pt-table-checksum -upercona -ppercona --replicate=percona.checksums --replicate-check-only --databases=testdb master1
Differences on master2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
testdb.animals 1 2 1
testdb.countries 1 -1 1

Notice how tool reported differences in DIFFS column.


Synchronizing data between servers in master-master configuration is not trivial task. You have to think about which process is changing data where and be very careful to avoid data corruption.

In master-master configuration data changes are replicated between nodes and statements executed on "slave" node are replicated to the master.

Maybe the best approach would be to stop replication, restore replica from backup or reclone whole server and start replication again. You can also dump only affected data with mysqldump and reload it.


As this is my testing environment I will try to resolve differences using pt-table-sync tool from Percona toolkit.


First I will use tool with --print option which will only display me queries that will resolve differences. I will inspect those queries before executing them on the slave server.
These queries could be executed manually also.

$ pt-table-sync -upercona -ppercona --sync-to-master --databases testdb --transaction --lock=1 --verbose master2 --print

# Syncing h=master2,p=...,u=percona
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     22:13:17 22:13:17 0    testdb.address
DELETE FROM `testdb`.`animals` WHERE `id`='4' LIMIT 1 /*percona-toolkit src_db:testdb src_tbl:animals src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:animals dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
DELETE FROM `testdb`.`animals` WHERE `id`='5' LIMIT 1 /*percona-toolkit src_db:testdb src_tbl:animals src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:animals dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
#      2       0      0      0 Chunk     22:13:17 22:13:17 2    testdb.animals
REPLACE INTO `testdb`.`countries`(`id`, `name`) VALUES ('2', 'England') /*percona-toolkit src_db:testdb src_tbl:countries src_dsn:P=3306,h=master1,p=...,u=percona dst_db:testdb dst_tbl:countries dst_dsn:h=master2,p=...,u=percona lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:7723 user:msutic host:master1*/;
#      0       1      0      0 Chunk     22:13:17 22:13:17 2    testdb.countries
#      0       0      0      0 Chunk     22:13:17 22:13:17 0    testdb.users


Set --execute option to execute those queries.
With --sync-to-master option we will treat MASTER2 server as a slave.


$ pt-table-sync -upercona -ppercona --sync-to-master --databases testdb --transaction --lock=1 --verbose master2 --execute

# Syncing h=master2,p=...,u=percona
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     22:19:51 22:19:51 0    testdb.address
#      2       0      0      0 Chunk     22:19:51 22:19:51 2    testdb.animals
#      0       1      0      0 Chunk     22:19:51 22:19:51 2    testdb.countries
#      0       0      0      0 Chunk     22:19:51 22:19:51 0    testdb.users


Output shows that differences are successfully resolved with two DELETE and one REPLACE operation on specified tables.

Let’s run another check to verify if differences still exist.

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1

            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T22:21:30      0      0        3       1       0   0.549 testdb.address
02-02T22:21:30      0      0        3       1       0   0.048 testdb.animals
02-02T22:21:30      0      0        4       1       0   0.043 testdb.countries
02-02T22:21:30      0      0        5       1       0   0.049 testdb.users

DIFFS columns shows only 0 which means that tables are in sync.




What if I run checksums on MASTER2 server.


MASTER2

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2

            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T22:24:16      0      0        3       1       0   0.072 testdb.address
02-02T22:24:16      0      0        3       1       0   0.048 testdb.animals
02-02T22:24:16 Skipping table testdb.colors because it has problems on these replicas:
Table testdb.colors does not exist on replica master1
This can break replication.  If you understand the risks, specify --no-check-slave-tables to disable this check.
02-02T22:24:16 Error checksumming table testdb.colors: DBD::mysql::db selectrow_hashref failed: Table 'testdb.colors' doesn't exist [for Statement "EXPLAIN SELECT * FROM `testdb`.`colors` WHERE 1=1"] at /usr/bin/pt-table-checksum line 6595.

02-02T22:24:16      1      0        0       0       0   0.003 testdb.colors
02-02T22:24:16      0      0        4       1       0   0.044 testdb.countries
02-02T22:24:16      0      0        5       1       0   0.043 testdb.users


Output shows error because table testdb.colors exists on MASTER2 but not in MASTER1.

I know that MASTER1 has "correct" data so I will just drop testdb.colors table on MASTER2 node.

MariaDB [testdb]> drop table if exists testdb.colors;
Query OK, 0 rows affected (0.05 sec)


Run check again:

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master2
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-02T22:26:43      0      0        3       1       0   0.322 testdb.address
02-02T22:26:43      0      0        3       1       0   0.056 testdb.animals
02-02T22:26:43      0      0        4       1       0   0.050 testdb.countries
02-02T22:26:43      0      0        5       1       0   0.045 testdb.users


Now we have synced databases.



If we use --quiet option tool will print out row per table only if there are some differences. This is nice way to run tool from a cron job and send mail only if there is non-zero exit status.

$ pt-table-checksum -upercona -ppercona --create-replicate-table --replicate percona.checksums --databases testdb -h master1 --quiet
(no rows)




REFERENCES
http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html
http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html
http://www.percona.com/software/percona-toolkit


Sunday, February 1, 2015

MariaDB(MySQL) Master-Master Replication

The simplest and probably most common replication method is master-slave replication. Basically, data is replicated from master database to the slave. In case of master database failure you must get the slave database up-to-date before failover and then promote slave to be new master.

Another method is to set up replication in both directions called master-master replication. But you must be aware that this setup brings some potential issues as data changes are happening on both nodes. It could be problem if you have tables with auto_increment fields. If both servers are inserting or updating in the same table replication will break on one server due to “duplicate entry” error. To resolve this issue you have "auto_increment_increment" and "auto_increment_offset" settings.

In my case its best to use master-master setup as active-passive replication. If we know that only one node is performing data modifications we can avoid many possible problems. In case of the failover "slave" could be easily promoted to a new master. Data modifications are automatically replicated to failed node when it comes back up.

Of course, this simple setup is not suitable for all situations and it has it's drawbacks but luckily you have several other options at your disposal, like MariaDB Galera Cluster.



Servers setup:
OS: Debian 7.8
DB: MariaDB 10.0.16




Install MariaDB 10 (both nodes).

$ sudo apt-get install python-software-properties
$ sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xcbcb082a1bb943db
$ sudo add-apt-repository 'deb http://mirror3.layerjet.com/mariadb/repo/10.0/debian wheezy main'
$ sudo apt-get update
$ sudo apt-get install mariadb-server


Stop MariaDB on both nodes:
$ sudo service mysql stop


MASTER1

Edit /etc/mysql/my.cnf parameter file.

# bind-address          = 127.0.0.1
server-id               = 61
report_host             = master1
log_bin                 = /var/log/mysql/mariadb-bin
log_bin_index           = /var/log/mysql/mariadb-bin.index
relay_log               = /var/log/mysql/relay-bin
relay_log_index         = /var/log/mysql/relay-bin.index
# replicate-do-db       = testdb
auto_increment_increment = 5
auto_increment_offset = 1


# bind-address = 127.0.0.1
By default mysql will accept connections only from local host. We will comment this line to enable connections from other hosts. This is important for replication to work.

server-id = 61
report_host = master1
Choose ID that will uniquely identify your host. I will use last two digits of my IP address. Optionally you could set report_host parameter for servers to report each other their hostnames.

log_bin = /var/log/mysql/mariadb-bin
log_bin_index = /var/log/mysql/mariadb-bin.index
Enable binary logging.

relay_log = /var/log/mysql/relay-bin
relay_log_index = /var/log/mysql/relay-bin.index
Enable creating relay log files. Events that are read from master’s binary log are written to slave relay log.

replicate-do-db = testdb
With this parameter we are telling to MariaDB which databases to replicate. This parameter is optional.




Now we can start MariaDB server.

$ sudo service mysql start


Login as root and create user that will be used for replicating data between our servers. Grant appropriate privileges to the user.

$ sudo mysql -uroot -p
MariaDB [(none)]> create user 'replusr'@'%' identified by 'replusr';
MariaDB [(none)]> grant replication slave on *.* to 'replusr'@'%';


For the last step check status information about binary log files as we will use this information to start replication on another node.

MariaDB [(none)]> show master status;
+--------------------+----------+--------------+------------------+
| File               | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+--------------------+----------+--------------+------------------+
| mariadb-bin.000009 |      634 |              |                  |
+--------------------+----------+--------------+------------------+



MASTER2

Edit /etc/mysql/my.cnf parameter file.

# bind-address          = 127.0.0.1
server-id               = 62
report_host             = master2
log_bin                 = /var/log/mysql/mariadb-bin
log_bin_index           = /var/log/mysql/mariadb-bin.index
relay_log               = /var/log/mysql/relay-bin
relay_log_index         = /var/log/mysql/relay-bin.index
# replicate-do-db       = testdb
auto_increment_increment = 5
auto_increment_offset = 2


Start MariaDB server.

$ sudo service mysql start


Create user which will be used for replication and grant privileges to the user.

$ sudo mysql -uroot -p
MariaDB [(none)]> create user 'replusr'@'%' identified by 'replusr';
MariaDB [(none)]> grant replication slave on *.* to 'replusr'@'%';


To start replication enter following commands.

MariaDB [(none)]> STOP SLAVE;

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='master1', MASTER_USER='replusr',
-> MASTER_PASSWORD='replusr', MASTER_LOG_FILE='mariadb-bin.000009', MASTER_LOG_POS=634;

MariaDB [(none)]> START SLAVE;

For MASTER_LOG_FILE and MASTER_LOG_POS I have used information from "show master status" on the first node.

Check status information of the slave threads.

MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master1
                  Master_User: replusr
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000009
          Read_Master_Log_Pos: 634
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 537
        Relay_Master_Log_File: mariadb-bin.000009
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: testdb
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 634
              Relay_Log_Space: 828
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:


Notice that Read_Master_Log_Pos and Exec_Master_Log_Pos are in sync which is good indicator that our databases are in sync.


Check status information about binary log files of the MASTER2 node. We will need this information to start replication on MASTER1 node.

MariaDB [(none)]> show master status;
+--------------------+----------+--------------+------------------+
| File               | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+--------------------+----------+--------------+------------------+
| mariadb-bin.000009 |      759 |              |                  |
+--------------------+----------+--------------+------------------+



MASTER1

Start replicating data from MASTER2 to MASTER1 node.

MariaDB [(none)]> STOP SLAVE;

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='master2', MASTER_USER='replusr',
-> MASTER_PASSWORD='replusr', MASTER_LOG_FILE='mariadb-bin.000009', MASTER_LOG_POS=759;

MariaDB [(none)]> START SLAVE;


MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master2
                  Master_User: replusr
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000009
          Read_Master_Log_Pos: 759
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 537
        Relay_Master_Log_File: mariadb-bin.000009
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: testdb
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 759
              Relay_Log_Space: 828
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 62
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:


Everything seems to be OK.



Let’s create test table and insert some rows to test our replication.


MASTER1

MariaDB [(none)]> create database testdb;

MariaDB [(none)]> use testdb;
Database changed

MariaDB [testdb]> CREATE TABLE users (id INT AUTO_INCREMENT,
    -> name VARCHAR(30),
    -> datum TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -> PRIMARY KEY(id));
Query OK, 0 rows affected (0.50 sec)

MariaDB [testdb]> INSERT INTO users(name) VALUES ('Marko');
Query OK, 1 row affected (0.06 sec)

MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name  | datum               |
+----+-------+---------------------+
|  1 | Marko | 2015-02-01 00:41:41 |
+----+-------+---------------------+
1 row in set (0.00 sec)


MASTER2

MariaDB [testdb]> use testdb
Database changed

MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name  | datum               |
+----+-------+---------------------+
|  1 | Marko | 2015-02-01 00:41:41 |
+----+-------+---------------------+
1 row in set (0.00 sec)

MariaDB [testdb]> INSERT INTO users(name) VALUES('John');
Query OK, 1 row affected (0.39 sec)

MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name  | datum               |
+----+-------+---------------------+
|  1 | Marko | 2015-02-01 00:41:41 |
|  2 | John  | 2015-01-31 16:17:55 |
+----+-------+---------------------+
2 rows in set (0.00 sec)


MASTER1

MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name  | datum               |
+----+-------+---------------------+
|  1 | Marko | 2015-02-01 00:41:41 |
|  2 | John  | 2015-01-31 16:17:55 |
+----+-------+---------------------+
2 rows in set (0.00 sec)

As we can see our table and rows are replicated successfully.



Let’s simulate crash of the MASTER1 node and power off the server.

$ sudo shutdown -h now

While server is down insert some rows on MASTER2 node.

MASTER2

MariaDB [testdb]> INSERT INTO users(name) VALUES ('Eric');
Query OK, 1 row affected (0.41 sec)

MariaDB [testdb]> INSERT INTO users(name) VALUES ('Clive');
Query OK, 1 row affected (0.08 sec)

MariaDB [testdb]> INSERT INTO users(name) VALUES ('Maria');
Query OK, 1 row affected (0.09 sec)

MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name  | datum               |
+----+-------+---------------------+
|  1 | Marko | 2015-02-01 00:41:41 |
|  2 | John  | 2015-01-31 16:17:55 |
|  3 | Eric  | 2015-01-31 16:19:49 |
|  4 | Clive | 2015-01-31 16:19:55 |
|  5 | Maria | 2015-01-31 16:20:01 |
+----+-------+---------------------+
5 rows in set (0.00 sec)


MariaDB [testdb]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: master1
                  Master_User: replusr
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000010
          Read_Master_Log_Pos: 1828
               Relay_Log_File: relay-bin.000012
                Relay_Log_Pos: 1083
        Relay_Master_Log_File: mariadb-bin.000010
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB: testdb
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1828
              Relay_Log_Space: 1663
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 2003
                Last_IO_Error: error reconnecting to master 'replusr@master1:3306' - retry-time: 
                               60  retries: 86400  message: Can't connect to MySQL server 
                               on 'master1' (111 "Connection refused")
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:

Check Last_IO_Error message while MASTER1 is down.


Now turn on MASTER1 node again.
MariaDB server and replication will start automatically and MASTER1 should catch up MASTER2.


MASTER1

Check "users" table - it's synchronised again.

$ mysql -u root -p -D testdb

MariaDB [testdb]> select * from users;
+----+-------+---------------------+
| id | name  | datum               |
+----+-------+---------------------+
|  1 | Marko | 2015-02-01 00:41:41 |
|  2 | John  | 2015-01-31 16:17:55 |
|  3 | Eric  | 2015-01-31 16:19:49 |
|  4 | Clive | 2015-01-31 16:19:55 |
|  5 | Maria | 2015-01-31 16:20:01 |
+----+-------+---------------------+
5 rows in set (0.00 sec)



Please let me know if you see possible problems in this configuration. I will update post gladly. Thanks for reading!