MySQL Enterprise Backup: Improved Compression Algorithms for 3.10


Background:

Prior to version 3.10, MySQL Enterprise Backup (MEB) used zlib for in-memory compression of data files. The compression worked by splitting the InnoDB data files into fixed-size blocks and compressing each block independently. A survey of the literature and the web showed that many other compression algorithms are available, which triggered the idea of testing their performance. If the benchmarks showed improved performance, we could make backup and/or restore faster by adding new compression algorithms to MEB.
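To make the block-wise scheme concrete, here is a minimal sketch of independent per-block zlib compression (illustrative Python only, not MEB's actual implementation; the block size and the length-prefix framing are assumptions):

```python
import zlib

BLOCK_SIZE = 16 * 1024 * 1024  # assumed block size; MEB's real value may differ


def compress_datafile(src_path: str, dst_path: str, level: int = 1) -> None:
    """Split a data file into fixed-size blocks and compress each block
    independently, so any block can later be decompressed on its own."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            compressed = zlib.compress(block, level)
            # Prefix each block with its compressed length so a reader
            # knows where the next block starts (assumed framing).
            dst.write(len(compressed).to_bytes(4, "big"))
            dst.write(compressed)
```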

Implementation:

The implementation of the algorithms proceeded as follows.

1. Select a "long list" of algorithms based on literature and what Google and other databases are using.
2. Create a prototype of MEB supporting the algorithms in the long list.
3. Run comparison tests of algorithms with the MEB prototype.
4. Select a "short list" of algorithms that will be added to MEB 3.10.

Criteria for Selecting the Algorithm:

The following criteria were used in comparing compression algorithms.

1. Compression speed
2. Decompression speed
3. Compression ratio
4. CPU-usage
5. Licensing model

These criteria differ in importance: compression speed and compression ratio are probably more important to most users than decompression speed.
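A simple way to measure the first three criteria is to time a compress/decompress round trip and record the output size. The sketch below does this with the codecs available in the Python standard library (zlib and LZMA); it is only an illustration of the methodology, not the harness used for the MEB tests, and the third-party codecs from the comparison (LZ4, Snappy, LZO, ...) would plug in the same way:

```python
import lzma
import time
import zlib

# Candidate codecs: (compress, decompress) pairs keyed by a label.
CODECS = {
    "zlib (level=1)": (lambda d: zlib.compress(d, 1), zlib.decompress),
    "zlib (level=9)": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "lzma (preset=1)": (lambda d: lzma.compress(d, preset=1), lzma.decompress),
}


def benchmark(data: bytes) -> None:
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        compressed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(compressed)
        t2 = time.perf_counter()
        assert restored == data
        print(f"{name:16s}"
              f"  compression {len(data) / (t1 - t0) / 1e6:7.1f} MB/s"
              f"  decompression {len(data) / (t2 - t1) / 1e6:7.1f} MB/s"
              f"  ratio {len(compressed) / len(data):6.1%}")


if __name__ == "__main__":
    # Any large, somewhat repetitive buffer serves as a stand-in for
    # InnoDB data pages.
    benchmark(b"some innodb-like page content " * 1_000_000)
```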

Performance Test:

We have now completed the performance tests of the new compression algorithms for MEB. The table below lists the compression algorithms that were evaluated in the test.

Machine and OS Configurations:

OS: Oracle Linux 6 (x86_)
Memory: 29 GB RAM
CPU: 8 vCPUs (2 quad-core processors, no HT)
Read speed of the source directory (data directory): 600 MB/s
Write speed of the destination directory (backup directory): 300 MB/s

A backup was taken of a 441 GB database, generated using the TPC-H datagen tool, while the mysqld process was not running.

Compression Algorithm                      | Time [min] | Compr. size [GB] | Compr./Orig. size | Avg. CPU usage | Avg. CPU idle | Reads [MB/s] | Writes [MB/s] | Source disk busy
Uncompressed (normal backup to directory)  | 31         | N/A              | 100%              | 20%            | 65%           | 250          | 250           | 100%
Zlib (level=1)                             | 34         | 165              | 37%               | 82%            | 15%           | 220          | 90            | 70%
Zlib (level=9)                             | 720        | 120              | 27%               | -              | -             | -            | -             | -
LZF                                        | 27         | 222              | 50%               | 45%            | 50%           | 270          | 140           | 100%
LZO                                        | 27         | 224              | 51%               | 40%            | 55%           | 270          | 140           | 100%
Snappy                                     | 31         | 221              | 50%               | 55%            | 40%           | 260          | 130           | 80%
QuickLZ                                    | 26         | 203              | 46%               | 35%            | 55%           | 280          | 120           | 100%
LZ4                                        | 26         | 215              | 49%               | 35%            | 55%           | 280          | 130           | 100%
LZMA (level=1)                             | 90         | 110              | 25%               | 78%            | 20%           | 80           | 22            | 25%
LZMA (level=9)                             | 360        | 88               | 20%               | -              | -             | -            | -             | -

A Few Important Notes:

Some columns are blank because those tests ran for such a long time that it was not feasible to collect monitoring statistics.

"Source disk busy" is the number of I/O operations per second as a percentage of what the device can execute. It is not related to the device throughput (MB/s).

MEB processes data through an internal work queue managed by separate read, process, and write threads. Read threads place data in the process queue, where processing threads compress it; once processing is complete, the data is placed in the write queue, from which it is written out to storage. Due to this design, if writes are slower than reads (which they often are), the reads are effectively throttled by the write speed, which is typically the limiting factor.
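The throttling effect falls out naturally from bounded queues between the threads. The following toy pipeline (a minimal Python sketch with made-up per-block delays, not MEB internals) shows how a slow writer backs up the write queue, which blocks the processing thread, which in turn blocks the reader:

```python
import queue
import threading
import time

BLOCKS = 20
process_q: queue.Queue = queue.Queue(maxsize=4)  # bounded: reader blocks when full
write_q: queue.Queue = queue.Queue(maxsize=4)    # bounded: processor blocks when full


def reader() -> None:
    for block in range(BLOCKS):
        time.sleep(0.01)      # pretend to read a block from a fast source disk
        process_q.put(block)  # blocks here once the pipeline backs up
    process_q.put(None)       # end-of-stream marker


def processor() -> None:
    while (block := process_q.get()) is not None:
        time.sleep(0.01)      # pretend to compress the block
        write_q.put(block)
    write_q.put(None)


def writer() -> None:
    while (block := write_q.get()) is not None:
        time.sleep(0.05)      # slow destination disk: the bottleneck


threads = [threading.Thread(target=f) for f in (reader, processor, writer)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
# Total time is dominated by the writer, i.e. reads are throttled by writes.
print(f"pipeline finished in {time.perf_counter() - start:.2f} s")
```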

Analysis of the Compression Tests:

LZ4 and QuickLZ were the fastest algorithms, while Zlib (level=9) was by far the slowest. For compression ratios, LZMA (level=9) reduced the data to 20% of its original size, whereas QuickLZ only reached 46% and LZ4 49%. This illustrates the trade-off between backup speed and the reduction in data size.

Nevertheless, we can say that algorithm A is better than algorithm B if A is faster than B and produces a backup that is no larger than B's, or if A produces a smaller backup than B and is no slower than B. Using this criterion, QuickLZ is a better compression algorithm than LZ4, Snappy, LZO, or LZF. Similarly, LZMA (level=1) is superior to Zlib (level=9).

The summary table also shows two limiting factors for backup speed. The I/O speed of the disk on which the database resides (the source disk) is the limiting factor for the uncompressed backup and for the compressed backups made with LZF, LZO, QuickLZ, and LZ4. For Zlib (level=1), Snappy, and LZMA (level=1) the limiting factor is the CPU. After removing the worst-performing algorithms, the remaining ones can be arranged on a line where speed increases to the left and compression improves to the right.

BEST SPEED --- QuickLZ --- LZ4 --- Zlib (level=1) --- LZMA (level=1) --- LZMA (level=9) --- BEST COMPRESSION
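The "A is better than B" rule above is a simple dominance check over (backup time, compressed size). Applying it to the figures from the table (a small Python sketch using the times in minutes and sizes in GB copied from above) reproduces the conclusions that QuickLZ beats LZ4, Snappy, LZO, and LZF, and that LZMA (level=1) beats Zlib (level=9):

```python
# (backup time [min], compressed size [GB]) copied from the table above.
results = {
    "Zlib (level=1)": (34, 165),
    "Zlib (level=9)": (720, 120),
    "LZF": (27, 222),
    "LZO": (27, 224),
    "Snappy": (31, 221),
    "QuickLZ": (26, 203),
    "LZ4": (26, 215),
    "LZMA (level=1)": (90, 110),
    "LZMA (level=9)": (360, 88),
}


def better(a: str, b: str) -> bool:
    """A is better than B if it is no slower and produces no larger a
    backup, and is strictly better on at least one of the two."""
    (ta, sa), (tb, sb) = results[a], results[b]
    return ta <= tb and sa <= sb and (ta < tb or sa < sb)


for a in results:
    beaten = [b for b in results if a != b and better(a, b)]
    if beaten:
        print(f"{a} is better than: {', '.join(beaten)}")
```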

Restore Speed:

The restore speed was almost the same for all the algorithms. Restoring the uncompressed backup and the Zlib-compressed backup took 28 minutes; for all the other algorithms the restore time was 29 minutes.

Conclusion:

For licensing reasons QuickLZ cannot be used with MEB, so it was replaced with LZ4. The new compression algorithms in MEB 3.10 are therefore LZ4 (for fast compression) and LZMA (for a high compression ratio).
