标签
mysql主从中断, 慢查询,慢查询太大,my_net_writ()
作者
Matthew
时间
2022-02-09
告警
出现告警!
触发告警名: 192.168.32.79:MySQL IO thread stopped
告警开始时间:2022.02.09 09:08:20
告警状态: PROBLEM
告警级别: 严重
监控项值:Slave IO thread running (06_slave_192.168.32.79:mysql.slave.Slave_IO_Running): Connecting
原始事件ID: 170558153
当前时间:2022.02.09 09:08:23
查看错误日志
数据库异常的时间。
022-02-09T08:42:12.598396+08:00 50408604 [Note] Aborted connection 50408604 to db: 'finenter' user: 'finenter' host: '192.168.50.56' (Got an error reading communication packets)
2022-02-09T01:07:34.069934Z mysqld_safe Number of processes running now: 0
2022-02-09T01:07:34.073206Z mysqld_safe mysqld restarted
2022-02-09T09:07:34.308228+08:00 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2022-02-09T09:07:34.308361+08:00 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2022-02-09T09:07:34.308395+08:00 0 [Note] /usr/local/mysql/bin/mysqld (mysqld 5.7.16-log) starting as process 22048 ..
这一段是mysql开始启动的sql
2022-02-09T09:07:37.104002+08:00 0 [Note] InnoDB: Opened 3 undo tablespaces
2022-02-09T09:07:37.104025+08:00 0 [Note] InnoDB: 3 undo tablespaces made active
2022-02-09T09:07:37.104252+08:00 0 [Note] InnoDB: Highest supported file format is Barracuda.
2022-02-09T09:07:37.303950+08:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 9352433176052
2022-02-09T09:07:37.304003+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 9352433181152
2022-02-09T09:07:37.501117+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 9352433181152
2022-02-09T09:07:37.506272+08:00 0 [Note] InnoDB: Database was not shutdown normally!
2022-02-09T09:07:37.506291+08:00 0 [Note] InnoDB: Starting crash recovery.
2022-02-09T09:07:37.816635+08:00 0 [Note] InnoDB: Transaction 11296772973 was in the XA prepared state.
2022-02-09T09:07:37.824610+08:00 0 [Note] InnoDB: Transaction 11296772973 was in the XA prepared state.
2022-02-09T09:07:37.828609+08:00 0 [Note] InnoDB: Transaction 11296772975 was in the XA prepared state.
2022-02-09T09:07:37.828632+08:00 0 [Note] InnoDB: Transaction 11296772975 was in the XA prepared state.
2022-02-09T09:07:37.828647+08:00 0 [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
数据库启动完成
2022-02-09T09:07:41.579857+08:00 0 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.7.16-log' socket: '/usr/local/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL)
2022-02-09T09:07:44.746676+08:00 47 [Note] Aborted connection 47 to db: 'unconnected' user: 'app_finenter' host: '192.168.50.116' (Got an error reading communication packets)
2022-02-09T09:07:45.907252+08:00 75 [Note] Aborted connection 75 to db: 'unconnected' user: 'app_finenter' host: '192.168.50.115' (Got an error reading communication packets)
2022-02-09T09:08:34.038236+08:00 259 [Note] Start binlog_dump to master_thread_id(259) slave_server(32110), pos(, 4)
2022-02-09T09:08:34.039017+08:00 260 [Note] Start binlog_dump to master_thread_id(260) slave_server(3280), pos(, 4)
2022-02-09T09:08:34.042458+08:00 261 [Note] Start binlog_dump to master_thread_id(261) slave_server(3279), pos(, 4)
2022-02-09T09:08:34.046842+08:00 262 [Note] Start binlog_dump to master_thread_id(262) slave_server(3309), pos(, 4)
2022-02-09T09:08:34.077449+08:00 264 [Note] Start binlog_dump to master_thread_id(264) slave_server(3217), pos(, 4)
2022-02-09T09:08:34.083366+08:00 261 [Note] Aborted connection 261 to db: 'unconnected' user: 'repl' host: '192.168.32.79' (Failed on my_net_write())
2022-02-09T09:08:34.111652+08:00 263 [Note] Start binlog_dump to master_thread_id(263) slave_server(3331), pos(, 4)
2022-02-09T09:09:12.242692+08:00 0 [Note] InnoDB: Buffer pool(s) load completed at 220209 9:09:12
重启原因
用pt工具分析一个比较大的慢查询文件,资源不够用导致mysql异常重启了
从库报错
Last_IO_Error: Fatal error: Failed to run 'after_read_event' hook
分析源码
我们知道,mysqld是一个多线程的C/S架构的网络应用,因此少不了通过网络来读写数据,所以可能会出现写数据失败的情况。如果mysql的错误日志中出现此类错误,就说明是mysqld在向客户端发送网络包时失败导致的,当然,引申到复制场景,则说明是复制过程中,master向slave推送binlog时,写网络数据包失败。
查看master_dump_thread 逻辑
需要确定master在send binlog失败的情况下退出dump thread的逻辑,根据错误日志提示,进入到相关的代码查看。
错误代码在如下位置
inline int Binlog_sender::send_packet()
{
DBUG_ENTER("Binlog_sender::send_packet");
DBUG_PRINT("info",
("Sending event of type %s", Log_event::get_type_str(
(Log_event_type)m_packet.ptr()[1 + EVENT_TYPE_OFFSET])));
// We should always use the same buffer to guarantee that the reallocation
// logic is not broken.
if (DBUG_EVALUATE_IF("simulate_send_error", true,
my_net_write(
m_thd->get_protocol_classic()->get_net(),
(uchar*) m_packet.ptr(), m_packet.length())))
{
set_unknow_error("Failed on my_net_write()");
DBUG_RETURN(1);
}
调用关系
(gdb) bt
#0 Binlog_sender::send_packet (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1158
#1 0x000000000190f74e in Binlog_sender::send_packet_and_flush (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1182
#2 0x000000000190e181 in Binlog_sender::send_heartbeat_event (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1143
#3 0x000000000190ee01 in Binlog_sender::wait_with_heartbeat (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:633
#4 0x000000000190ecd7 in Binlog_sender::wait_new_events (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:599
#5 0x000000000190e938 in Binlog_sender::get_binlog_end_pos (this=0x7fea741655d0, log_cache=0x7fea74165020) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:365
#6 0x000000000190c5e0 in Binlog_sender::send_binlog (this=0x7fea741655d0, log_cache=0x7fea74165020, start_pos=123) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:313
#7 0x000000000190c1b4 in Binlog_sender::run (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:225
结果层层返回到Binlog_sender::run
大致看下Binlog_sender::run的逻辑
void Binlog_sender::run()
{
while (!has_error() && !m_thd->killed)
{
if (send_binlog(&log_cache, start_pos))
break;
}
}
解释到这里,大概就清楚了吧。