首页 > 基础资料 博客日记
记一次项目上MySQL死锁Deadlock的排查优化过程
2025-06-26 11:00:02基础资料围观13次
起因是最近两天收到了线上项目的告警通知,隔一段时间会出现几笔MySQL的死锁Deadlock的错误,错误日志如下:
updating database. Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock;
try restarting transaction ### The error may exist in com/.../XXXMapper.java (best guess)
### The error may involve com.....MsgFrameSmsMapper.update-Inline
### The error occurred while setting parameters
### SQL: UPDATE msg SET success_count=success_count+1 WHERE (batch_name = ?)
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction ;
Deadlock found when trying to get lock;
try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock;
...
看了下日志位置以及执行的sql定位到了功能模块,其模块的功能概括如下图:
三方服务会给我们推送批次号以及状态的回调通知,一次通知中可能会包含多个批次号,每个批次号可能会产生多个子状态,每个子状态都需要通知到我们,我们根据子状态来决定来更新表中的数量字段。
库使用的引擎是InnoDB,batch_name加了索引,找DBA使用 SHOW ENGINE INNODB STATUS 命令查询了下mysql死锁的日志:
LATEST DETECTED DEADLOCK
------------------------
2025-05-27 10:00:29 0x7fbbb7ce7700
*** (1) TRANSACTION:
TRANSACTION 29608052, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 42 lock struct(s), heap size 8400, 47 row lock(s), undo log entries 75
MySQL thread id 11897, OS thread handle 140446306776832, query id 54085329 10.16.2.173 user_sms updating
UPDATE msg SET success_count=success_count+1 WHERE (batch_name = '1927183012550696960')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 188 page no 81 n bits 248 index ix_batch_name of table `ums_sms`.`msg` trx id 29608052 lock_mode X locks rec but not gap waiting
Record lock, heap no 98 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303132353530363936393630; asc 1927183012550696960;;
1: len 8; hex 1abebabd41802001; asc A ;;
*** (2) TRANSACTION:
TRANSACTION 29608051, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
31 lock struct(s), heap size 3520, 44 row lock(s), undo log entries 79
MySQL thread id 11906, OS thread handle 140444219373312, query id 54085339 10.16.2.173 user_sms updating
UPDATE msg SET success_count=success_count+1 WHERE (batch_name = '1927183018338836480')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 188 page no 81 n bits 248 index ix_batch_name of table `ums_sms`.`msg` trx id 29608051 lock_mode X locks rec but not gap
Record lock, heap no 78 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313832393939393830333731393638; asc 1927182999980371968;;
1: len 8; hex 1abebaba54402001; asc T@ ;;
Record lock, heap no 93 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303038333934313435373932; asc 1927183008394145792;;
1: len 8; hex 1abebabc49802001; asc I ;;
Record lock, heap no 98 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303132353530363936393630; asc 1927183012550696960;;
1: len 8; hex 1abebabd41802001; asc A ;;
Record lock, heap no 99 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303132383639343638313630; asc 1927183012869468160;;
1: len 8; hex 1abebabd54401001; asc T@ ;;
Record lock, heap no 100 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303133383637373038343136; asc 1927183013867708416;;
1: len 8; hex 1abebabd90001001; asc ;;
Record lock, heap no 141 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303139323533313934373532; asc 1927183019253194752;;
1: len 8; hex 1abebabed0c02001; asc ;;
Record lock, heap no 157 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303231303339393732333532; asc 1927183021039972352;;
1: len 8; hex 1abebabf3b802001; asc ; ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 188 page no 81 n bits 248 index ix_batch_name of table `ums_sms`.`msg` trx id 29608051 lock_mode X locks rec but not gap waiting
Record lock, heap no 129 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 19; hex 31393237313833303138333338383336343830; asc 1927183018338836480;;
1: len 8; hex 1abebabe9a802001; asc ;;
意思就是:
事务1 (29608052) 尝试执行sql语句:
UPDATE msg SET success_count=success_count+1 WHERE (batch_name = '1927183012550696960');
等待锁定的记录是batch_name = '1927183012550696960'
事务2(29608051) 执行了几个批次的update语句,其中包括:
UPDATE msg SET success_count=success_count+1 WHERE (batch_name = '1927183012550696960');
也就是事务2已经获得了batch_name = '1927183012550696960'的排它锁,然后等待锁定的记录是batch_name = '1927183018338836480',也就是尝试执行下面的sql语句:
UPDATE msg SET success_count=success_count+1 WHERE (batch_name = '1927183018338836480');
根据mysql的死锁日志可以推断出,事务1在等待batch_name='1927183012550696960'也就是被事务2执行的排它锁(堆号98),事务2在等待batch_name='1927183018338836480'的排它锁(堆号129),而这条记录被事务事务1(或其他事务,但根据日志上下文可推断为事务1)间接依赖或锁定,以此形成了循环依赖。
项目中会通过p6spy打印sql的执行日志,根据traceId看了下批次号1927183012550696960和1927183018338836480都存在于两条MQ消息中,说明这两个批次号都接收到了两次通知,这个三方服务本来只有一家,当时对接的时候约定同一个批次号下的子状态会聚合,不会出现一个批次号通知多次的场景,但是后面随着后面其它厂商的接入,对接的时候没有确定这一点,导致了一个批次号多次推送从而出现了上述的问题,并且这个问题也不是必现,正好要碰到这样的场景才会触发。
好在一条MQ消息中的某一个批次号执行失败,整个MQ消息的事务会回滚,然后其中一条MQ消息进入重试队列,重试之后就不会出现事务争抢的情况,因为MySQL会选择死锁中的一个事务执行成功,根据MySQL日志观察是事务2被回滚了(事务2回滚的日志没有贴出),事务1被提交。
毕竟有这些错误日志告警持续产生,得想办法解决了,首先想到的就是化数据库的锁为redis的锁了,也就是在执行某一个批次号的更新动作时,先使用redis获取这一个批次号的锁,获得锁之后进行操作,否则等待,不过这还牵扯出一个新的问题,就是一条消息里面的多个批次在处理时是一个完整的事务,只有等待所有批次处理完成后,才能释放掉这条消息中每一个批次的锁,初步方案如下:
方案不够好,还有优化的空间,想了下,可以借助MQ的唯一消息id和重试队列来实现,缩小事务范围,将一条消息中的批次号作为一个事务,将循环批次的代码放在事务外面,一个批次处理成功、事务提交成功之后通过redis记录,key是MQ的msgId+批次号,value按需设置,我这边存储的是批次号处理成功时候的时间戳,因为后面重试这条消息里面处理失败的批次号时,跳过处理这条消息里面成功的批次号是根据msgId+批次号去redis里查询的。
项目上使用的MQ是RocketMQ,RocketMQ从4.7.1版本开始,一条消息重试时消息的msgId会变化,但是可以通过读取消息中的属性UNIQ_KEY来获取原始的msgId,RocketMQ的MessageClientIDSetter类中也提供该方法:
public static String getUniqID(final Message msg) {
return msg.getProperty(MessageConst.PROPERTY_UNIQ_CLIENT_MESSAGE_ID_KEYIDX);
}
那么改版后的方案如下:
这个方案的好处就是一个事务里面不会牵扯到多个批次号,化大事物为小事物,也就避免了交叉获取批次号的锁从而发生死锁的问题,也不用使用redis对每一个批次号进行加锁,跨事务操作更新某一个批次号的数据时,其中一个事务会等待另一个事务提交之后执行。
本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若内容造成侵权/违法违规/事实不符,请联系邮箱:jacktools123@163.com进行投诉反馈,一经查实,立即删除!
标签: