In this column I will clarify how to allocation a multi-terabyte size database among two MySQL occurrences using mydumper, which is a rational backup and restore tool that everything in corresponding. I will also cover the situation where you need to allocation only a subgroup of data.
Big tables are every so often difficult because of the volume of time needed to do mysqldump/restore which is sole threaded. It is not unusual for this type of procedure to take quite a few days.
From MySQL 5.6 and on, we also have portable tablespaces, but there are some boundaries with that method.
Physical backups (like xtrabackup) have the benefit to be quicker, but there are some situations where you still want a reasonable backup (transfer to Aurora/RDS/Google Cloud SQL).
The first item you will want to do is get mydumper installed. Newest form at the time of this inscription is 0.9.1 and you can get installation from here.
You can also automatically assemble as follows:
tar zxvf mydumper-0.6.2.tar.gz
Transfer data using mydumper
By default, mydumper will work by creation quite a lot of parallel developments (1 export process CPU) that will read one table each and write table contents to one file for each table. In this case, I’m transferring table1 and table2 tables:
./mydumper -t 8 -B database_name_innodb -T table1,table2
This is an development over old-style mysqldump, but we can still do improved. mydumper can split each table into portions (like 1000 rows) and write each portion to a distinct file, letting to parallelize the ingress later on. I have misplaced the -T argument here, so all tables from database_name_innodb will be transferred.
./mydumper -t 8 --rows=100000 -B database_name_innodb
If transferring Innodb tables only, it makes logic to use –trx-consistency-only so the implement uses less locking. You will silent get the binlog file required to seed a slave.
./mydumper -t 8 --rows=100000 --trx-consistency-only -B database_name_innodb
You can also require a regular expression to transfer only roughly databases, let’s say database1 and database2.
./mydumper -t 8 --rows=100000 --trx-consistency-only --regex '^(database1|database2)' -B database_name_innodb
Other choices include the capability to compress the transferred data on the fly, and also transfer triggers, code and events. To finish, I also mention the use of –verbose option for added prominence into what each thread is responsibility.
Here is an example of the whole command:
./mydumper -t 8 --rows=100000 --regex '^(database1|database2)' --compress --triggers --routines --events -v 3
--compress --trx-consistency-only -B database_name_innodb --outputdir /data/export –logfile /data/mydumper.log
While consecutively many threads, I saw some argument linked to adaptive hash index (this is on 5.5, I must tested if this occurs on other varieties as well).
Restricting AHI can have an influence on read queries, so if you can have enough money having the host out of creation while the transfer is running, I commend to disable AHI for the time being.
It is possibly a good impression to run the transfer on a slave, as you will be beating it hard with reads. Also, if you keep the slave sql thread still, transfer will be much quicker. I got up to 400% decrease in transfer time.
Ingress data using myloader
The load stage is the most sore part, usually taking way stretched than the time it took to transfer data.
If you run mydumper using the –rows choice as labelled above, numerous my loader threads can insert parallel on the same table, fastmoving up the process. Otherwise, you only get many tables bring in in parallel, which is supportive but decreases the paybacks if you have a few of huge tables and typically small tables.
Keep in attention though, when by means of a sole thread import rows can be injected in primary key climbing order, which enhances disk space. Running several insert threads on a sole table will cause row spreading to be less optimum, possibly using knowingly more disk space.
Other possible way to reduce the ingress time is for the moment relax uniformity by setting innodb_flush_log_at_trx_commit=0 sync_binlog=0. Also set query_cache_type=0 AND query_cache_size=0 to stop the query cache mutex from actuality used.
You can switch my loader operation size with queries-per-transaction constraint. Using the default value (1000) formed big transactions, I had well results by dropping this to 100.
The -o choice will drop the tables on the end point database if they previously exist.
Here is an sample of the command (rule of flick through in this case is have import_threads = cores / 2):
myloader --threads=4 -d /data/export/ -B db1 -q 100 -o -v 3
NOTE: my loader works by location sql-log-bin=0 for the import gathering by default, so make sure to prevail that (option is -e) if you have any slaves down the cable.
There is a divide of my loader that defers formation of subordinate indexes. The author proposes it is quicker for tables over 150 MM rows. I must had the time to test it and, inappropriately, it is based on the older 0.6 my loader. Do let me know in the remarks segment if you have tried it.
Importing into RDS
If you are consuming the Multi-AZ feature of RDS, that means inscribes are synchronously useful to another reserve RDS instance on a different accessibility zone.
This greatly surges write potential, which is a presentation killer for my loader. I guide to restrict this feature till the import is comprehensive.