But besides its benefits, Galera Cluster has some limitations, and one of them is handling large transactions.
Large replication data sets can degrade the performance of the whole cluster, causing cluster-wide stalls, increased memory consumption, crashing nodes, and so on. To avoid these issues, it is recommended to split large transactions into smaller chunks.
In this post I want to show you how to safely delete large amounts of data on Galera Cluster. You can perform this task using several tools, or by writing custom procedures that split a large transaction into chunks. In this example I will use the pt-archiver tool from Percona.
Imagine you have received a task to perform a data cleanup in the devices table for several schemas.
It looks like a very simple task: delete rows from the devices table where device_cookie is 0.
delete from devices where device_cookie = 0
But although the statement looks simple, it could potentially freeze the whole cluster, so before executing the delete, count how many rows you need to remove.
In my case I have to delete a few million rows, which is too much for a single transaction, so I need to split it into smaller chunks.
mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
|  2788504 |
+----------+

mysql> select count(*) - (select count(*) from devices where device_cookie = 0) from devices;
+----------+
| count(*) |
+----------+
|      208 |
+----------+
I have to delete around 2.7 million rows.
This is the command I will use:
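To see why chunking matters here, it helps to estimate how many small transactions the delete will be split into instead of one huge one. A quick sketch, using the row counts from the queries above and the chunk size used below:

```shell
# Rows to delete: 2788504 total minus the 208 rows that stay.
ROWS_TO_DELETE=$((2788504 - 208))
TXN_SIZE=1000                                              # rows committed per transaction
CHUNKS=$(( (ROWS_TO_DELETE + TXN_SIZE - 1) / TXN_SIZE ))   # ceiling division
echo "$CHUNKS"   # ~2789 small transactions instead of one 2.7M-row transaction
```

Each of those transactions produces a small write set that the other nodes can certify and apply quickly, instead of one enormous write set that stalls the cluster.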
pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
  --purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000
--purge - delete the matched rows instead of archiving them to another table or file.
--where "device_cookie = 0" - filter for the rows you want to delete.
--sleep-coef 1.0 - throttle the delete by sleeping between chunks for the last chunk's SELECT time multiplied by this coefficient, so the cluster is not overwhelmed.
--txn-size 1000 - number of rows per transaction, i.e. the chunk size.
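The control flow behind these options boils down to: delete up to --txn-size rows, commit, sleep, and repeat until nothing matches. Here is a minimal bash simulation of that loop; the DELETE itself is replaced by a counter so the sketch runs without a database:

```shell
#!/bin/bash
# Simulated chunked delete: each iteration stands in for one small
# transaction that pt-archiver would commit against the cluster.
remaining=2788296   # rows matching device_cookie = 0
chunk=1000          # --txn-size
txns=0
while [ "$remaining" -gt 0 ]; do
  # Real run: DELETE FROM devices WHERE device_cookie = 0 LIMIT $chunk
  deleted=$(( remaining < chunk ? remaining : chunk ))
  remaining=$(( remaining - deleted ))
  txns=$(( txns + 1 ))
  # Real run: sleep here (scaled by --sleep-coef) so the other
  # nodes can apply the write set before the next chunk arrives.
done
echo "$txns transactions committed"
```

Because each transaction is bounded, no single write set grows large enough to trigger cluster-wide flow control.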
# time pt-archiver --source h=localhost,u=marko,p="passwd",D=sch_testdb,t=devices \
    --purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 1000

real    3m32.532s
user    0m17.268s
sys     0m2.460s
Check the row count after the delete has finished.
mysql> select count(*) from devices;
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)
As I have to perform the delete in several schemas, I created a simple shell script which iterates through the schema list and executes the pt-archiver command for each one.
# cat delete_rows.sh
#!/bin/bash

LOGFILE=/opt/skripte/schema/table_delete_rows.log
SCHEMA_LIST=/opt/skripte/schema/schema_list.conf

# Get schema list and populate conf file
mysql -B -u marko -ppasswd --disable-column-names --execute \
  "select schema_name from information_schema.schemata where schema_name like 'sch_%' and schema_name <> 'sch_sys'" > $SCHEMA_LIST

while IFS= read -r schema; do

  START=`date +%s`

  echo "`date`=> Deleting rows from table in schema: $schema"

  pt-archiver --source h=localhost,u=marko,p="passwd",D=$schema,t=devices \
    --purge --where "device_cookie = 0" --sleep-coef 1.0 --txn-size 500

  SPENT=$(((`date +%s` - $START) / 60))

  echo "`date`=> Finished deleting in schema - spent: $SPENT mins"
  echo "*************************************************************************"

done < $SCHEMA_LIST >> $LOGFILE

exit 0