Errant transactions: Major hurdle for GTID-based failover in MySQL
2020-11-09 19:18:24 Editor: 小采

I have previously written about the new replication protocol that comes with GTIDs in MySQL 5.6. Because of this new replication protocol, you can inadvertently create errant transactions that may turn any failover into a nightmare. Let’s look at the problems and the potential solutions.

In short

  • Errant transactions may cause all kinds of data corruption/replication errors when failing over.
  • Detection of errant transactions can be done with the GTID_SUBSET() and GTID_SUBTRACT() functions.
  • If you find an errant transaction on one server, commit an empty transaction with the GTID of the errant one on all other servers.
  • If you are using a tool to perform the failover for you, make sure it can detect errant transactions. At the time of writing, only mysqlfailover and mysqlrpladmin from MySQL Utilities can do that.

    What are errant transactions?

    Simply stated, they are transactions executed directly on a slave. Thus they only exist on a specific slave. This could result from a mistake (the application wrote to a slave instead of writing to the master) or this could be by design (you need additional tables for reports).

    Why can they create problems that did not exist before GTIDs?

    Errant transactions have always existed. However, because of the new replication protocol used by GTID-based replication, they can have a significant impact on all servers if a slave holding an errant transaction is promoted to be the new master.

    Compare what happens in this master-slave setup, first with position-based replication and then with GTID-based replication. A is the master, B is the slave:

    # POSITION-BASED REPLICATION
    # Creating an errant transaction on B
    mysql> create database mydb;
    # Make B the master, and A the slave
    # What are the databases on A now?
    mysql> show databases like 'mydb';
    Empty set (0.01 sec)

    As expected, the mydb database is not created on A.

    # GTID-BASED REPLICATION
    # Creating an errant transaction on B
    mysql> create database mydb;
    # Make B the master, and A the slave
    # What are the databases on A now?
    mysql> show databases like 'mydb';
    +-----------------+
    | Database (mydb) |
    +-----------------+
    | mydb            |
    +-----------------+

    mydb has been recreated on A because of the new replication protocol: when A connects to B, they exchange their own sets of executed GTIDs and the master (B) sends any missing transaction. Here it is the create database statement.
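
The exchange described above can be modeled in a few lines of Python. This is only a simplified sketch of the idea, not the actual wire protocol: each server's executed GTIDs are represented as a dictionary mapping a server UUID to a set of sequence numbers, and the UUID strings and the function name transactions_to_send are hypothetical names chosen for the illustration.

```python
def transactions_to_send(master_executed, slave_executed):
    """Return the GTIDs the master has executed that the slave has not."""
    missing = {}
    for uuid, seqnos in master_executed.items():
        diff = seqnos - slave_executed.get(uuid, set())
        if diff:
            missing[uuid] = diff
    return missing


# Hypothetical server UUIDs, shortened for readability.
A_UUID, B_UUID = "uuid-of-A", "uuid-of-B"

# A was the master and executed 3 transactions. B replicated all of them
# and then executed one errant transaction of its own (create database mydb).
executed_on_a = {A_UUID: {1, 2, 3}}
executed_on_b = {A_UUID: {1, 2, 3}, B_UUID: {1}}

# After the failover, B is the master and A connects to it as a slave:
# B's errant transaction is the only one A is missing, so it is sent to A.
print(transactions_to_send(executed_on_b, executed_on_a))
```

In this model the errant transaction is just a difference between two sets, which is why the new master sends it without any warning.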

    As you can see, the main issue with errant transactions is that when failing over you may execute transactions ‘coming from nowhere’ that can silently corrupt your data or break replication.

    How to detect them?

    If the master is running, it is quite easy with the GTID_SUBSET() function. As all writes should go to the master, the GTIDs executed on any slave should always be a subset of the GTIDs executed on the master. For instance:

    # Master
    mysql> show master status\G
    *************************** 1. row ***************************
    File: mysql-bin.000017
    Position: 376
    Binlog_Do_DB:
    Binlog_Ignore_DB:
    Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-08002728ba:1-30,
    8e38e4-bc14-11e3-8d4c-08002728ba:1-7
    # Slave
    mysql> show slave status\G
    [...]
    Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-08002728ba:1-29,
    8e38e4-bc14-11e3-8d4c-08002728ba:1-9
    # Now, let's compare the 2 sets
    mysql> select gtid_subset('8e349184-bc14-11e3-8d4c-08002728ba:1-29,8e38e4-bc14-11e3-8d4c-08002728ba:1-9','8e349184-bc14-11e3-8d4c-08002728ba:1-30,8e38e4-bc14-11e3-8d4c-08002728ba:1-7') as slave_is_subset;
    +-----------------+
    | slave_is_subset |
    +-----------------+
    |               0 |
    +-----------------+

    Hmm, it looks like the slave has executed more transactions than the master, which indicates that the slave has executed at least 1 errant transaction. Can we find the GTIDs of these transactions? Sure, let’s use GTID_SUBTRACT():

    mysql> select gtid_subtract('8e349184-bc14-11e3-8d4c-08002728ba:1-29,8e38e4-bc14-11e3-8d4c-08002728ba:1-9','8e349184-bc14-11e3-8d4c-08002728ba:1-30,8e38e4-bc14-11e3-8d4c-08002728ba:1-7') as errant_transactions;
    +------------------------------------------+
    | errant_transactions                      |
    +------------------------------------------+
    | 8e38e4-bc14-11e3-8d4c-08002728ba:8-9     |
    +------------------------------------------+

    This means that the slave has 2 errant transactions.
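
To make it clearer what the two functions compute, here is a small Python sketch that mimics them outside the server. It is an illustration only: the helpers parse_gtid_set, format_gtid_set, gtid_subtract and gtid_subset are hypothetical re-implementations, not MySQL code. They treat UUIDs as opaque strings (no validation or case normalization) and expand every interval into a Python set, so they are not suited to very large ranges.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-3:7,other:5' into {uuid: set of sequence numbers}."""
    gtids = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        seqnos = gtids.setdefault(uuid, set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            seqnos.update(range(int(first), int(last or first) + 1))
    return gtids


def format_gtid_set(gtids):
    """Render {uuid: set of ints} back into 'uuid:a-b:c' notation."""
    parts = []
    for uuid in sorted(gtids):
        numbers = sorted(gtids[uuid])
        if not numbers:
            continue
        intervals, start, prev = [], numbers[0], numbers[0]
        for n in numbers[1:]:
            if n != prev + 1:          # gap found: close the current interval
                intervals.append((start, prev))
                start = n
            prev = n
        intervals.append((start, prev))
        text = ":".join(str(a) if a == b else f"{a}-{b}" for a, b in intervals)
        parts.append(f"{uuid}:{text}")
    return ",".join(parts)


def gtid_subtract(set1, set2):
    """GTIDs that are in set1 but not in set2, as a GTID set string."""
    a, b = parse_gtid_set(set1), parse_gtid_set(set2)
    return format_gtid_set({u: nums - b.get(u, set()) for u, nums in a.items()})


def gtid_subset(set1, set2):
    """True if every GTID in set1 is also in set2."""
    return gtid_subtract(set1, set2) == ""


# The two sets from the example above, copied as shown in the article.
slave = "8e349184-bc14-11e3-8d4c-08002728ba:1-29,8e38e4-bc14-11e3-8d4c-08002728ba:1-9"
master = "8e349184-bc14-11e3-8d4c-08002728ba:1-30,8e38e4-bc14-11e3-8d4c-08002728ba:1-7"

print(gtid_subset(slave, master))
print(gtid_subtract(slave, master))
```

On a real server you should of course rely on the built-in functions; the sketch only shows that an errant transaction is whatever remains after subtracting the master's set from the slave's set.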

    Now, how can we check for errant transactions if the master is not running (for example, the master has crashed and we want to fail over to one of the slaves)? In this case, we will have to follow these steps:

  • Check all slaves to see if they have executed transactions that are not found on any other slave: this is the list of potential errant transactions.
  • Discard all transactions originating from the master: now you have the list of errant transactions of each slave.

    Some of you may wonder how you can know which transactions come from the master when it is not available: SHOW SLAVE STATUS gives you the master’s UUID (in the Master_UUID field), which is used in the GTIDs of all transactions coming from the master.
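
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the executed GTID set of each slave and the master's UUID have already been collected by hand (for instance from SHOW SLAVE STATUS), and the function names, the server names B and C, and the placeholder UUIDs MMM, XXX and YYY are all hypothetical.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-3:7,other:5' into {uuid: set of sequence numbers}."""
    gtids = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        seqnos = gtids.setdefault(uuid, set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            seqnos.update(range(int(first), int(last or first) + 1))
    return gtids


def find_errant_transactions(slaves, master_uuid):
    """slaves maps a slave name to its Executed_Gtid_Set string.

    Returns {slave name: {uuid: errant sequence numbers}}.
    """
    parsed = {name: parse_gtid_set(text) for name, text in slaves.items()}
    report = {}
    for name, own in parsed.items():
        errant = {}
        for uuid, seqnos in own.items():
            # Step 2: transactions originating from the master are not errant,
            # even if this slave is ahead of the others.
            if uuid == master_uuid:
                continue
            # Step 1: keep only what no other slave has executed.
            seen_elsewhere = set()
            for other, gtids in parsed.items():
                if other != name:
                    seen_elsewhere |= gtids.get(uuid, set())
            only_here = seqnos - seen_elsewhere
            if only_here:
                errant[uuid] = only_here
        report[name] = errant
    return report


# Hypothetical topology: the crashed master has UUID 'MMM'.
slaves = {
    "B": "MMM:1-30,XXX:3",        # B is ahead of C and has 1 errant transaction
    "C": "MMM:1-28,YYY:18-19",    # C has 2 errant transactions
}
print(find_errant_transactions(slaves, master_uuid="MMM"))
```

Note that B's extra master transactions (MMM:29-30) are not reported: they are legitimate transactions that C simply has not received yet.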

    How to get rid of them?

    This is pretty easy, but it can be tedious if you have many slaves: just inject an empty transaction on all the other servers with the GTID of the errant transaction.

    For instance, if you have 3 servers, A (the master), B (slave with an errant transaction: XXX:3), and C (slave with 2 errant transactions: YYY:18-19), you will have to inject the following empty transactions in pseudo-code:

    # A
    - Inject empty trx(XXX:3)
    - Inject empty trx(YYY:18)
    - Inject empty trx(YYY:19)
    # B
    - Inject empty trx(YYY:18)
    - Inject empty trx(YYY:19)
    # C
    - Inject empty trx(XXX:3)
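
In practice, injecting an empty transaction means setting GTID_NEXT to the GTID in question, committing a transaction that does nothing, and switching GTID_NEXT back to AUTOMATIC. Since GTID_NEXT accepts a single GTID rather than a range, YYY:18-19 has to be handled as two separate transactions. The Python sketch below builds the plan from the example and prints the statements to run on each server; the function names are hypothetical, XXX and YYY stand for the real server UUIDs as in the pseudo-code, and the script only generates text, it does not connect to any server.

```python
def empty_trx_statements(gtid):
    """SQL statements that commit an empty transaction with the given GTID."""
    return [
        f"SET GTID_NEXT = '{gtid}';",
        "BEGIN;",
        "COMMIT;",
        "SET GTID_NEXT = 'AUTOMATIC';",
    ]


def injection_plan(errant_by_server, all_servers):
    """Map each server to the errant GTIDs that originated on other servers."""
    plan = {}
    for server in all_servers:
        plan[server] = [
            gtid
            for owner, gtids in errant_by_server.items()
            if owner != server          # a server already has its own GTIDs
            for gtid in gtids
        ]
    return plan


# The example from the text: B holds XXX:3, C holds YYY:18 and YYY:19.
errant = {"B": ["XXX:3"], "C": ["YYY:18", "YYY:19"]}
plan = injection_plan(errant, ["A", "B", "C"])

for server, gtids in plan.items():
    print(f"# {server}")
    for gtid in gtids:
        print("\n".join(empty_trx_statements(gtid)))
```

Review the generated statements before running them: once an empty transaction is committed, the corresponding errant change is hidden from replication, but the data it modified on the original slave is still there.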

    Conclusion

    If you want to switch to GTID-based replication, make sure to check for errant transactions before any planned or unplanned replication topology change. And be especially careful if you use a tool that reconfigures replication for you: at the time of writing, only mysqlrpladmin and mysqlfailover from MySQL Utilities can warn you if you are trying to perform an unsafe topology change.
