Stoner’s World

Ravings of a keyboard cowboy

» Font Size «
Mar
18

Will Maatkit make me sane?

I downloaded Maatkit 1877 and installed it on a data archive server, then did a parallel dump of the data. I also did a regular dump of the same data using mysqldump. This isn’t a benchmark or even a real comparison, just me playing with Maatkit to get a feel for it and see how it can make my life easier. First, some stats about the database:

mysql> select table_name, table_rows from information_schema.tables where table_schema = 'somedb';
+------------+------------+
| table_name | table_rows |
+------------+------------+
| table1 | 24758808 |
| table2 | 33713412 |
| table3 | 221633443 |
| table4 | 56869734 |
| table5 | 14017081 |
| table6 | 77559278 |
| table7 | 48453337 |
+--------+------------+

mysql> call database_size_profile('somedb')\G
*************************** 1. row ***************************
myisam_index_size: 15603748864
 myisam_data_size: 14895580924
innodb_index_size: NULL
 innodb_data_size: NULL
 total_index_size: 15603748864
  total_data_size: 14895580924
  database_size: 30499329788
1 row in set (0.00 sec)

The dump using maatkit

$ mk-parallel-dump --tab --databases somedb --no-gzip
 default: 7 tables, 7 chunks, 7 successes, 0 failures, 1239.59 wall-clock time, 3558.11 dump time

$ ls -l default/somedb/
-rw-rw-rw- 1286 Mar 18 18:25 table1.000000.sql
-rw-rw-rw- 847081157 Mar 18 18:28 table1.000000.txt
-rw-rw-rw- 1274 Mar 18 18:25 table2.000000.sql
-rw-rw-rw- 1131602870 Mar 18 18:29 table2.000000.txt
-rw-rw-rw- 1304 Mar 18 18:25 table3.000000.sql
-rw-rw-rw- 8508494974 Mar 18 18:45 table3.000000.txt
-rw-rw-rw- 1318 Mar 18 18:25 table4.000000.sql
-rw-rw-rw- 2327666926 Mar 18 18:32 table4.000000.txt
-rw-rw-rw- 1213 Mar 18 18:28 table5.000000.sql
-rw-rw-rw- 441408946 Mar 18 18:29 table5.000000.txt
-rw-rw-rw- 1431 Mar 18 18:29 table6.000000.sql
-rw-rw-rw- 4001690723 Mar 18 18:45 table6.000000.txt
-rw-rw-rw- 1339 Mar 18 18:29 table7.000000.sql
-rw-rw-rw- 1625718296 Mar 18 18:35 table7.000000.txt

$ du -sh default/
18G default/

The dump using mysqldump.

$ mysqldump --user=joe -p --routines --trigger --quick --complete-insert --master-data=2 somedb > somedb.18mar08.sql

Elapsed time to dump: 2676

$ ls -l
-rw-r--r-- 21905573763 Mar 18 19:49 somedb.18mar08.sql

The server has two dual-core athlons and 8 gigs of ram, reading from a fiber-attached SAN and writing to local SCSI drives (RAID-1.) I haven’t performed a restore yet. That’s next but not tonight. I have a code push to deal with first.

Commnets

  1. Ma’atkit will definitely make you sane.

    Just check out the query profiler :) I love it!

  2. Do you know how mk-parallel-dump calculates the wall-clock time and dump time? I’m assuming the wall-clock time is the realtime one?

  3. awfief: The query analyzer does look like a lot of fun. As a managed service provider, I don’t have the (usually) tighter integration with the development teams that would allow me to review code and look for performance bottlenecks. I usually have to treat the app as a black box and examine everything else. Most of my clients have dbas so I’m relegated to plain ol’ sys admin duties. What’s sadder is, to date, I’ve only had one client actually make it a point to review the slow log on a regular basis. Oh well…you can lead a horse to water but can’t make them drink.

    Archon810: I haven’t looked at that. I do know you need to have Time::HiRes installed before installing Maatkit (it’s not identified as a dependency but mk-parallel-dump bombed because it wasn’t installed.)

  4. Well, if the wall time in your test is realtime, then mk-parallel-dump is faster, otherwise if dump time is realtime, then it’s slower. So which ran faster?

  5. Archon810: I looked at the mk-parallel-dump code and it only counts seconds using the system clock. This means it isn’t counting actual cpu time but wall clock time. The ‘dump time’ is actually the sum total of all the child processes while the ‘wall-clock time’ is the total time the script takes to complete (from hitting Enter to termination.)

    In my case, mk-parallel-dump was faster than mysqldump by 1400+ seconds, so roughly twice as fast.