设计案例分享网站,网站可以用PS设计吗,襄阳抖音seo找哪家,深圳网页设计制作网站问题描述
昨晚RepmgrPG14主备主库因wal日志撑爆磁盘#xff0c;删除主库过期wal文件重做备库后上午进行主备状态巡查#xff0c;主库向备库发送wal文件正常#xff0c;但是查主库状态时发现显示有1条归档失败的记录。 postgres: archiver failed on 000000010000006F000000…问题描述
昨晚RepmgrPG14主备主库因wal日志撑爆磁盘删除主库过期wal文件重做备库后上午进行主备状态巡查主库向备库发送wal文件正常但是查主库状态时发现显示有1条归档失败的记录。 postgres: archiver failed on 000000010000006F00000086
主库
walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10 walsender正常 archiver failed on 000000010000006F00000086 归档失败
备库
walreceiver streaming 77/9EB6A198 “” “” walreceiver正常
--查主库数据库状态
[rootpgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (codeexited, status0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 postgres: logger
├─ 3710992 postgres: checkpointer
├─ 3710993 postgres: background writer
├─ 3710994 postgres: walwriter
├─ 3710995 postgres: archiver failed on 000000010000006F00000086
├─ 3710996 postgres: logical replication launcher
├─ 3711001 postgres: top_portal top_portal 172.28.32.18(41438) idle
├─ 3711003 postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle
├─ 3711009 postgres: repmgr repmgr 172.28.32.22(64096) idle
├─ 3711468 postgres: top_portal top_portal 172.28.32.18(41720) idle
├─ 3713807 postgres: top_portal top_portal 172.28.32.20(44492) idle
├─ 3723017 postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10 #wal 发送正常--查备库状态
[rootpgslave ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2023-10-13 00:12:19 CST; 12h ago
Process: 1931221 ExecStart/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (codeexited, status0/SUCCESS)
Main PID: 1931223 (postgres)
Tasks: 7 (limit: 201967)
Memory: 23.2G
CGroup: /system.slice/postgres.service
├─ 1931223 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 1931224 postgres: logger
├─ 1931225 postgres: startup recovering 00000001000000770000009E
├─ 1931226 postgres: checkpointer
├─ 1931227 postgres: background writer
├─ 1931230 postgres: walreceiver streaming 77/9EB6A198 #wal接收
└─ 1931430 postgres: repmgr repmgr 172.28.32.23(22956) idle Oct 13 00:12:17 pgslave systemd[1]: Starting PostgreSQL database server...
Oct 13 00:12:17 pgslave pg_ctl[1931221]: waiting for server to start....
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] LOG: redirecting log output to logging collector process
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] HINT: Future log output will appear in directory log.
Oct 13 00:12:19 pgslave pg_ctl[1931221]: . done
Oct 13 00:12:19 pgslave pg_ctl[1931221]: server started
Oct 13 00:12:19 pgslave systemd[1]: Started PostgreSQL database server.问题分析
1.查看数据库日志 2.查看归档配置参数
参数配置正确归档目录权限也正确
postgres# show archive_command;archive_command
-----------------------------------------------------------/usr/bin/lz4 -q -z %p /server/data/pgdb/pg_archive/%f.lz4
(1 row)postgres# show archive_mode;archive_mode
--------------on
(1 row)--查看归档目录的权限
[postgrespgmaster ~]$ ls -ld /server/data/pgdb/pg_archive
drwxr-x--- 2 postgres postgres 4214784 Oct 13 13:14 /server/data/pgdb/pg_archive3.手动切日志
手工归档成功但是未解决查看状态依然时卡住归档失败的那条wal记录那里
--手工归档
top_portal# select pg_switch_wal();pg_switch_wal
---------------72/51C4CFD8
(1 row)--查主库数据库状态
[rootpgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (codeexited, status0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 postgres: logger
├─ 3710992 postgres: checkpointer
├─ 3710993 postgres: background writer
├─ 3710994 postgres: walwriter
├─ 3710995 postgres: archiver failed on 000000010000006F00000086
├─ 3710996 postgres: logical replication launcher
├─ 3711001 postgres: top_portal top_portal 172.28.32.18(41438) idle
├─ 3711003 postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle
├─ 3711009 postgres: repmgr repmgr 172.28.32.22(64096) idle
├─ 3711468 postgres: top_portal top_portal 172.28.32.18(41720) idle
├─ 3713807 postgres: top_portal top_portal 172.28.32.20(44492) idle
├─ 3723017 postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10 #wal 发送正常--查当前wal_lsn
top_portal# select pg_current_wal_lsn();pg_current_wal_lsn
--------------------72/52638F10
(1 row)--查当前wal_lsn对应的wal文件
top_portal# select pg_walfile_name(pg_current_wal_lsn());pg_walfile_name
--------------------------000000010000007200000052
(1 row)--查当前最新检查点,最新检查点之前的wal文件均可以删除
[postgrespgmaster ~]$ pg_controldata $PGDATA
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7268852449124462799
Database cluster state: in production
pg_control last modified: Fri 13 Oct 2023 10:07:35 AM CST
Latest checkpoint location: 71/CDD2FF28
Latest checkpoints REDO location: 71/CDD28F18
Latest checkpoints REDO WAL file: 0000000100000071000000CD--查报错中的wal文件
[postgrespgmaster pg_wal]$ ls -l 000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F00000086
[postgrespgmaster pg_wal]$ find /server/data/pgdb/pg_archive -name 000000010000006F00000086*
ls: cannot access 000000010000006F00000086: No such file or directory
[postgrespgmaster pg_wal]$ find /server -name 000000010000006F00000086*
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F000000864.检查$PGDATA/pg_wal/archive_status/目录下文件
[postgrespgmaster ~]$ cd /server/data/pgdb/data/pg_wal/archive_status/
[postgrespgmaster archive_status]$ ls -l *.ready
ls: cannot access *.ready: No such file or directory说明不存在需要归档但没归档的文件该目录下ready说明是需要归档但是没归档的done是归档完成了的
解决办法
1.将归档失败的wal文件备份到/home/postgres目录下(生产环境如果磁盘空间允许切记不要rm删除mv备份到目标位置) 2.手工归档select pg_switch_wal(); 3.再次查看主备库状态
--1.将归档失败的wal文件备份到/home/postgres目录下
[postgrespgmaster pg_wal]$ mv 000000010000006F00000086 /home/postgres/000000010000006F00000086
[postgrespgmaster pg_wal]$ ls -l /home/postgres/000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 /home/postgres/000000010000006F00000086--2.手工归档
postgres# select pg_switch_wal();pg_switch_wal
---------------73/7EF502E0
(1 row)--3.再次查看主库状态显示正常
[rootpgmaster data]# systemctl status postgres
● postgres.service - PostgreSQL database serverLoaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h agoProcess: 3710968 ExecStart/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (codeexited, status0/SUCCESS)Main PID: 3710970 (postgres)Tasks: 50 (limit: 201967)Memory: 26.6GCGroup: /system.slice/postgres.service├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data├─ 3710971 postgres: logger ├─ 3710992 postgres: checkpointer ├─ 3710993 postgres: background writer ├─ 3710994 postgres: walwriter ├─ 3710995 postgres: archiver archiving 000000010000007100000035 ├─ 3710996 postgres: logical replication launcher ├─ 3711001 postgres: top_portal top_portal 172.28.32.18(41438) idle ├─ 3711003 postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle ├─ 3711009 postgres: repmgr repmgr 172.28.32.22(64096) idle ├─ 3711468 postgres: top_portal top_portal 172.28.32.18(41720) idle ├─ 3713807 postgres: top_portal top_portal 172.28.32.20(44492) idle ├─ 3723017 postgres: walsender repmgr 172.28.32.23(36122) streaming 73/7F000BD0补充若$PGDATA/pg_wal/archive_status/目录下存在大量的*.ready文件
可能的原因分析如果数据库是突然断电那么可能arvchive命令没有完全完成归档目录会存在不完整的文件名称重启数据库后会出现归档失败的情况这个时候需要去归档目录删除相关归档失败文件那么归档就会重新归档。 还未遇到该场景的问题暂未实验。 参考链接