My knowledge is limited; please forgive any mistakes.

This problem came from a friend, DBA Lao Zhuang. They run PXC in the following environment:

MySQL: 5.7.18-15
wsrep: 29.20
os: Red Hat Enterprise Linux Server release 6.5

I am actually not very familiar with PXC, but analyzing the pstack output still led me to the problem. I filed a bug and Percona confirmed it, although I was not the first person to find this issue.

1. Problem description

The database was completely hung: new connections failed, existing connections could not be killed, show engine innodb status could not run, and so on. From a connection that was already open, show processlist showed a large number of sessions like the following:

[image.png: show processlist output]

At the operating system level there was essentially no visible load:

[image.png] [image.png]

For a problem like this, pstack is the only way to confirm what is going on.

2. Why are there so many sessions in opening tables?

With pstack I found many sessions blocked in trx_allocate_for_mysql(), as follows:

    Thread 54 (Thread 0x7f9085cf6700 (LWP 17448)):
    #0 0x0000003715e0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1 0x00000000011059cb in os_event::wait_low(long) ()
    #2 0x00000000011b0449 in sync_array_wait_event(sync_array_t*, sync_cell_t*) ()
    #3 0x000000000108a8c4 in TTASEventMutex<GenericPolicy>::wait(char const*, unsigned int, unsigned int) ()
    #4 0x000000000108aa3b in PolicyMutex<TTASEventMutex<GenericPolicy> >::enter(unsigned int, unsigned int, char const*, unsigned int) ()
    #5 0x00000000011e5974 in trx_allocate_for_mysql() ()    # wait trx
    #6 0x000000000106fa9f in innobase_trx_allocate(THD*) ()
    #7 0x0000000001076d28 in ha_innobase::extra(ha_extra_function) ()
    #8 0x0000000000ce4229 in open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) ()
    #9 0x0000000000ce5912 in open_tables_for_query(THD*, TABLE_LIST*, unsigned int) ()

Put simply, any operation on an InnoDB table, even a select, needs a transaction object. If no transaction object is available in the transaction pool, this function is called to allocate one. The stack frames below come from a debug build:

    #0 trx_allocate_for_mysql () at /mysql/mysql-5.7.17/storage/innobase/trx/trx0trx.cc:538
    #1 0x0000000001913d62 in innobase_trx_allocate (thd=0x7fffc8000d30) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:2580
    #2 0x0000000001913e04 in check_trx_exists (thd=0x7fffc8000d30) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:2605
    #3 0x0000000001914482 in ha_innobase::update_thd (this=0x7fffc8009990, thd=0x7fffc8000d30) at
        /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:2825
    #4 0x00000000019296b4 in ha_innobase::info_low (this=0x7fffc8009990, flag=26, is_analyze=false) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:13805
    #5 0x000000000192a385 in ha_innobase::info (this=0x7fffc8009990, flag=26) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:14211
    #6 0x000000000191ad83 in ha_innobase::open (this=0x7fffc8009990, name=0x7fffcc1b4540 "./test/test1", mode=2, test_if_locked=2) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:6130
    #7 0x0000000000f48d09 in handler::ha_open (this=0x7fffc8009990, table_arg=0x7fffc8008fe0, name=0x7fffcc1b4540 "./test/test1", mode=2, test_if_locked=2) at /mysql/mysql-5.7.17/sql/handler.cc:2759
    #8 0x0000000001674fd1 in open_table_from_share (thd=0x7fffc8000d30, share=0x7fffcc1b4170, alias=0x7fffc80051d8 "test1", db_stat=39, prgflag=8, ha_open_flags=0, outparam=0x7fffc8008fe0, is_create_table=false) at /mysql/mysql-5.7.17/sql/table.cc:3336
    #9 0x00000000014f9577 in open_table (thd=0x7fffc8000d30, table_list=0x7fffc80051e0, ot_ctx=0x7ffff149fb80) at /mysql/mysql-5.7.17/sql/sql_base.cc:3522
    #10 0x00000000014fbf7f in open_and_process_table (thd=0x7fffc8000d30, lex=0x7fffc8003028, tables=0x7fffc80051e0, counter=0x7fffc80030e8, flags=0, prelocking_strategy=0x7ffff149fcb0, has_prelocking_list=false, ot_ctx=0x7ffff149fb80) at /mysql/mysql-5.7.17/sql/sql_base.cc:5108
    #11 0x00000000014fd06a in open_tables (thd=0x7fffc8000d30, start=0x7ffff149fc70, counter=0x7fffc80030e8, flags=0, prelocking_strategy=0x7ffff149fcb0) at /mysql/mysql-5.7.17/sql/sql_base.cc:5719

The function trx_allocate_for_mysql contains the following code:

    trx_sys_mutex_enter();                             /* acquire trx_sys->mutex */
    ut_d(trx->in_mysql_trx_list = TRUE);
    UT_LIST_ADD_FIRST(trx_sys->mysql_trx_list, trx);   /* add the transaction to the list in the global trx_sys structure */
    trx_sys_mutex_exit();

trx_sys is a global data structure, and every transaction hangs off it in linked lists. Modifying those lists therefore requires a mutex that protects the global structure from concurrent modification by multiple threads. The code here is one such list update.

From the stack frames, however, the thread is inside the open_table function. That function mainly sets up the table cache and instantiates the table, that is, it establishes the mapping between the MySQL layer and the InnoDB layer files. It also acquires the corresponding MDL
lock and opens the frm file.

To test this, I simply added a sleep(10) to the code to pause for 10 seconds, and saw the result below. It shows that the opening tables state here really is what a session reports while it waits in trx_allocate_for_mysql:

[image.png: show processlist output during the sleep(10) test]

So the state column of show processlist is only a stage label: it marks the stretch of code between one point and the next. Below is the stage transition sequence of a typical select. To pin down a problem, though, this alone is sometimes not enough.

    T2: | THD::enter_stage: starting /root/mysql5.7.14/percona-server-5.7.14-7/sql/conn_handler/socket_connection.cc:100
    T2: | | | | | THD::enter_stage: checking permissions /root/mysql5.7.14/percona-server-5.7.14-7/sql/auth/sql_authorization.cc:843
    T2: | | | | | | THD::enter_stage: Opening tables /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_base.cc:5719
    T2: | | | | | THD::enter_stage: init /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:121
    T2: | | | | | | | THD::enter_stage: System lock /root/mysql5.7.14/percona-server-5.7.14-7/sql/lock.cc:321
    T2: | | | | | | | THD::enter_stage: optimizing /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:151
    T2: | | | | | | | THD::enter_stage: statistics /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:386
    T2: | | | | | | | THD::enter_stage: preparing /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_optimizer.cc:494
    T2: | | | | | | THD::enter_stage: executing /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_executor.cc:119
    T2: | | | | | | THD::enter_stage: Sending data /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_executor.cc:195
    T2: | | | | | THD::enter_stage: end /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_select.cc:199
    T2: | | | | THD::enter_stage: query end /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5174
    T2: | | | | THD::enter_stage: closing tables /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5252
    T2: | | | THD::enter_stage: freeing items /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:5855
    T2: | | THD::enter_stage: cleaning up /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_parse.cc:1884

3. Detailed pstack analysis

The pstack log is too long to paste here. The full log is attached to the bug report linked at the beginning. Finding useful information and a reasonable explanation in a lengthy pstack is hard. Because my source code skills are very limited, sometimes I can only confirm a problem by searching for critical sections. Below is the result of my analysis, which is also what I wrote in the bug report:

I use
pstack to review stack discover Dead lock

Analyze pstack i find some problem:

Thread 56: lock: trx_sys (when parameter wsrep_log_conflicts=ON lock0lock.cc 2281 line) requisite: LOCK_wsrep_thd
Thread 9: lock: LOCK_thd_list (mysql_thread_manager.cc 339 line) requisite: LOCK_thd_data (sql_parse.h 175 line)
Thread 26: lock: LOCK_thd_data (in PFS_status_variable_cache::do_materialize_all after PFS_status_variable_cache::manifest release LOCK_thd_data, but hang) requisite: trx_sys->mutex (srv0srv.cc 1703 line)

a lot of Thread wait when call function trx_allocate_for_mysql at mutex trx_sys
a lot of Thread wait when call function THD::release_resources at mutex LOCK_thd_data
a lot of Thread wait when call function Global_THD_manager::add_thd at mutex LOCK_thd_list
and any other mutex wait!!

but I not find which thread hold LOCK_wsrep_thd mutex.

Now we do follow things hope to resolve this problem:
1. wsrep_log_conflicts=off
2. SET global optimizer_switch='materialization=off';
3. at high load time not execute sql show [global] status/select * from performance_schema.global_status

Put simply, I found several threads acquiring mutexes in what is nearly a cycle, but one link of the cycle was missing. Percona eventually replied as follows:

Your problem sounds quite similar to one mentioned here: https://jira.percona.com/browse/PXC-877
Said release fix the issue https://www.percona.com/blog/2018/01/26/percona-xtradb-cluster-5-7-20-29-24-is-now-available/
You may want to consider an upgrade to latest one though which has more fixes 5.7.21.

Although I was not the first person to find it, the reply at least confirms that my analysis had essentially identified the problem. Annoyingly, the answer is once again: upgrade, upgrade.
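
The near-cycle described in section 3 can be followed mechanically once each thread's held mutex and wanted mutex have been read off the pstack by hand. The following is only a minimal sketch in Python, not a tool used in the original analysis. The `holds` and `waits` dictionaries and the `follow_wait_chain` helper are hypothetical names introduced for illustration. The data is transcribed from the bug report text, with `trx_sys` and `trx_sys->mutex` treated as the same mutex.

```python
# Lock relationships transcribed from the bug report text (section 3).
# Assumption: "trx_sys" and "trx_sys->mutex" name the same mutex.
holds = {
    "Thread 56": "trx_sys->mutex",
    "Thread 9": "LOCK_thd_list",
    "Thread 26": "LOCK_thd_data",
}
waits = {
    "Thread 56": "LOCK_wsrep_thd",
    "Thread 9": "LOCK_thd_data",
    "Thread 26": "trx_sys->mutex",
}


def follow_wait_chain(start, holds, waits):
    """Follow thread -> wanted mutex -> holder of that mutex, starting at `start`.

    Returns (chain, verdict), where verdict is one of:
      "cycle"          - the chain came back to a thread already visited
      "holder unknown" - a wanted mutex has no known holder
      "not waiting"    - the last thread in the chain is not blocked
    """
    owner = {mutex: thread for thread, mutex in holds.items()}
    chain = [start]
    seen = {start}
    thread = start
    while True:
        wanted = waits.get(thread)
        if wanted is None:
            return chain, "not waiting"
        chain.append(wanted)
        holder = owner.get(wanted)
        if holder is None:
            return chain, "holder unknown"
        chain.append(holder)
        if holder in seen:
            return chain, "cycle"
        seen.add(holder)
        thread = holder


if __name__ == "__main__":
    chain, verdict = follow_wait_chain("Thread 9", holds, waits)
    print(" -> ".join(chain), "|", verdict)
```

Starting from Thread 9, the chain passes through Thread 26 and Thread 56 and then stops at LOCK_wsrep_thd with the verdict "holder unknown", which is the missing link mentioned in the bug report. If some thread were found to hold LOCK_wsrep_thd while waiting for LOCK_thd_list, the same function would report a full cycle. No such thread was identified in the pstack.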