Contents

1 node rules
2 nginx rule
  2.1 Nginx 4xx error rate too high
  2.2 Nginx 5xx error rate too high
  2.3 Nginx high latency
3 mysql rule
  3.1 MySQL down
  3.2 Too many connections
  3.3 MySQL high threads running
  3.4 MySQL slave IO thread not running
  3.5 MySQL slave SQL thread not running
  3.6 MySQL replication lag
  3.7 Slow queries
  3.8 InnoDB log write stalls
  3.9 MySQL instance restarted within the last minute
  3.10 Complete configuration
4 redis rule
  4.1 Redis down
  4.2 Redis missing master
  4.3 Redis too many masters
  4.4 Redis disconnected slaves
  4.5 Redis replication broken
  4.6 Redis cluster flapping
  4.7 Redis missing backup
  4.8 Redis out of system memory
  4.9 Redis out of configured maxmemory
  4.10 Redis too many connections
  4.11 Redis not enough connections
  4.12 Redis rejected connections
  4.13 Complete rules file
5 rabbitmq rule
6 minio rule
7 postgresql
  7.1 Postgresql down
  7.2 Postgresql restarted
  7.3 Postgresql exporter error
  7.4 Postgresql table not auto vacuumed
  7.5 Postgresql table not auto analyzed
  7.6 Postgresql too many connections
  7.7 Postgresql not enough connections
  7.8 Postgresql dead locks
  7.9 Postgresql high rollback rate
  7.10 Postgresql commit rate low
  7.11 Postgresql low XID consumption
  7.12 Postgresql high rate statement timeout
  7.13 Postgresql high rate deadlock
  7.14 Postgresql unused replication slot
  7.15 Postgresql too many dead tuples
  7.16 Postgresql SSL compression active
  7.17 Postgresql too many locks acquired
  7.18 Postgresql bloat index high (> 80%)
  7.19 Postgresql bloat table high (> 80%)
  7.20 Complete rules
8 kafka rule
9 keepalived rule

1 node rules

groups:
- name: node
  rules:
  # Server node is unreachable
  - alert: NodeDown
    expr: up == 0
    for: 20s
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.instance }}: down"
      description: "{{ $labels.instance }} has been down for more than 20s"
      value: "{{ $value }}"
  # Node CPU usage over the last 5 minutes is too high (above 75%)
  - alert: NodeCPUHigh
    expr: (1 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m]))) * 100 > 75
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: High CPU usage"
      description: "{{$labels.instance}}: CPU usage is above 75%"
      value: "{{ $value }}"
  # Node CPU iowait over the last 5 minutes is too high (above 50%)
  - alert: NodeCPUIowaitHigh
    expr: avg by (instance) (irate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100 > 50
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: High CPU iowait usage"
      description: "{{$labels.instance}}: CPU iowait usage is above 50%"
      value: "{{ $value }}"
  # Node memory usage is too high (above 90%)
  # node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes gives the available ratio
  # 1 - available ratio gives the used ratio
  # (1 - available ratio) * 100 gives the used percentage
  - alert: NodeMemoryUsageHigh
    expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: High memory usage"
      description: "{{$labels.instance}}: Memory usage is above 90%"
      value: "{{ $value }}"
  # Available space on the root partition is too low (below 20%)
  # avail_bytes / size_bytes for the root mount point gives the available ratio of the root partition
  - alert: NodeDiskRootLow
    expr: node_filesystem_avail_bytes{fstype=~"ext.*|xfs",mountpoint="/"} / node_filesystem_size_bytes{fstype=~"ext.*|xfs",mountpoint="/"} * 100 < 20
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: Low disk (the / partition) space"
      description: "{{$labels.instance}}: available space on the root partition is below 20%, current value: {{ $value }}"
  # 5-minute load average exceeds twice the number of CPU cores
  - alert: NodeLoad5High
    expr: (node_load5) > (count by (instance) (node_cpu_seconds_total{mode="system"}) * 2)
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: Load(5m) High"
      description: "{{$labels.instance}}: Load(5m) is 2 times the number of CPU cores"
      value: "{{ $value }}"
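For Prometheus to evaluate a group like the one above, the rules file has to be listed under `rule_files` in the main configuration. A minimal sketch, assuming the group is saved as `rules/node.yml` next to `prometheus.yml`; the path and the evaluation interval are illustrative assumptions, not part of the original setup:

```yaml
# prometheus.yml (fragment) - path and interval are assumptions
global:
  evaluation_interval: 15s    # how often alert expressions are evaluated
rule_files:
  - "rules/*.yml"             # would pick up rules/node.yml and any other rule files
```

Running `promtool check rules rules/node.yml` before reloading Prometheus catches YAML and PromQL syntax errors early.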
2 nginx rule
Dependencies

2.1 Nginx 4xx error rate too high

- alert: NginxHighHttp4xxErrorRate
  expr: sum(rate(nginx_http_requests_total{status=~"^4.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Nginx high HTTP 4xx error rate (instance {{ $labels.instance }})"
    description: "Too many HTTP requests with status 4xx (> 5%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

2.2 Nginx 5xx error rate too high

- alert: NginxHighHttp5xxErrorRate
  expr: sum(rate(nginx_http_requests_total{status=~"^5.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Nginx high HTTP 5xx error rate (instance {{ $labels.instance }})"
    description: "Too many HTTP requests with status 5xx (> 5%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

2.3 Nginx high latency

- alert: NginxLatencyHigh
  expr: histogram_quantile(0.99, sum(rate(nginx_http_request_duration_seconds_bucket[2m])) by (host, node, le)) > 3
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Nginx latency high (instance {{ $labels.instance }})"
    description: "Nginx p99 latency is higher than 3 seconds\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
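One thing to keep in mind with the error-rate rules in this section: a bare `sum(...)` aggregates away every label, so `{{ $labels.instance }}` in the annotations would render empty. A hedged variant that keeps the ratio per instance, assuming per-instance alerting is what is wanted (this grouping is a suggestion, not the original rule):

```yaml
# Sketch: per-instance 4xx ratio, so $labels.instance is populated
- alert: NginxHighHttp4xxErrorRate
  expr: |
    sum by (instance) (rate(nginx_http_requests_total{status=~"^4.."}[1m]))
      /
    sum by (instance) (rate(nginx_http_requests_total[1m]))
      * 100 > 5
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Nginx high HTTP 4xx error rate (instance {{ $labels.instance }})"
```

The trade-off is one alert per instance instead of a single fleet-wide alert; which is preferable depends on how the alerts are routed.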
3 mysql rule
3.1 MySQL down

- alert: MysqlDown
  expr: mysql_up == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "MySQL ({{ $labels.instance }}) is down"
    description: "MySQL is down: {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3.2 Too many connections

More than 80% of MySQL connections are in use on {{ $labels.instance }}.

- alert: MysqlTooManyConnections(>80%)
  expr: max_over_time(mysql_global_status_threads_connected[1m]) / mysql_global_variables_max_connections * 100 > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "MySQL too many connections (> 80%) (instance {{ $labels.instance }})"
    description: "{{ $labels.proj }} MySQL is using more than 80% of the allowed connections on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3.3 MySQL high threads running

More than 60% of MySQL connections are in the running state on the instance.

- alert: MysqlHighThreadsRunning
  expr: max_over_time(mysql_global_status_threads_running[1m]) / mysql_global_variables_max_connections * 100 > 60
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "MySQL high threads running (instance {{ $labels.instance }})"
    description: "More than 60% of MySQL connections are in the running state on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3.4 MySQL slave IO thread not running

- alert: MysqlSlaveIoThreadNotRunning
  expr: ( mysql_slave_status_slave_io_running and ON (instance) mysql_slave_status_master_server_id > 0 ) == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "MySQL Slave IO thread not running (instance {{ $labels.instance }})"
    description: "MySQL Slave IO thread is not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3.5 MySQL slave SQL thread not running

- alert: MysqlSlaveSqlThreadNotRunning
  expr: ( mysql_slave_status_slave_sql_running and ON (instance) mysql_slave_status_master_server_id > 0 ) == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "MySQL Slave SQL thread not running (instance {{ $labels.instance }})"
    description: "MySQL Slave SQL thread is not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
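3.6 MySQL replication lag

Replication lag appears when the master produces binary log events faster than the slave can apply them. It also appears when a slave is seeded from an older backup of the master: the slave starts replicating from the binary log position recorded at import time, while the master's actual binary log position is already newer.

mysql_slave_status_seconds_behind_master is the Seconds_Behind_Master value returned by SHOW SLAVE STATUS\G, and mysql_slave_status_sql_delay is the SQL_Delay value. Subtracting the configured delay means an intentionally delayed slave does not trigger the alert.

- alert: MysqlSlaveReplicationLag
  expr: ( (mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) and ON (instance) mysql_slave_status_master_server_id > 0 ) > 30
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "MySQL Slave replication lag (instance {{ $labels.instance }})"
    description: "MySQL replication is lagging\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"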
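To make the subtraction in the lag expression concrete, here is a small promtool unit-test sketch. The instance names `db1` and `db2`, the sample values, and the file name are all hypothetical: `db1` lags 45 seconds with no configured delay, while `db2` reports 3610 seconds behind with an intentional `SQL_Delay` of 3600, so its effective lag is only 10 seconds.

```yaml
# replication_lag_test.yml (hypothetical file name and sample data)
tests:
  - interval: 1m
    input_series:
      - series: 'mysql_slave_status_seconds_behind_master{instance="db1"}'
        values: '45 45'
      - series: 'mysql_slave_status_sql_delay{instance="db1"}'
        values: '0 0'
      - series: 'mysql_slave_status_master_server_id{instance="db1"}'
        values: '1 1'
      - series: 'mysql_slave_status_seconds_behind_master{instance="db2"}'
        values: '3610 3610'
      - series: 'mysql_slave_status_sql_delay{instance="db2"}'
        values: '3600 3600'
      - series: 'mysql_slave_status_master_server_id{instance="db2"}'
        values: '1 1'
    promql_expr_test:
      # Only db1 should pass the > 30 filter (45 - 0 = 45); db2 gives 3610 - 3600 = 10
      - expr: ((mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) and ON (instance) mysql_slave_status_master_server_id > 0) > 30
        eval_time: 1m
        exp_samples:
          - labels: '{instance="db1"}'
            value: 45
```

A file like this would be run with `promtool test rules replication_lag_test.yml`.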
3.7 Slow queries

The MySQL server has new slow queries.

- alert: MysqlSlowQueries
  expr: increase(mysql_global_status_slow_queries[1m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "MySQL slow queries (instance {{ $labels.instance }})"
    description: "MySQL has some new slow queries.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

3.8 InnoDB log write stalls
MySQL InnoDB log writes are stalling.

- alert: MysqlInnodbLogWaits
  expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "MySQL InnoDB log waits (instance {{ $labels.instance }})"
    description: "MySQL InnoDB log writes are waiting to reach disk at a rate of {{ $value }}/second\n  LABELS = {{ $labels }}"
3.9 MySQL instance restarted within the last minute

MySQL on instance {{ $labels.instance }} was restarted less than one minute ago.

- alert: MysqlRestarted
  expr: mysql_global_status_uptime < 60
  for: 0m
  labels:
    severity: info
  annotations:
    summary: "MySQL restarted (instance: {{ $labels.instance }})"
    description: "MySQL instance {{ $labels.instance }} was restarted less than one minute ago.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
3.10 Complete configuration

groups:
- name: MySQLAlerts
  rules:
  - alert: MysqlDown
    expr: mysql_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "MySQL down (instance: {{ $labels.instance }})"
      description: "MySQL is down: {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # More than 80% of MySQL connections are in use on the instance
  - alert: MysqlTooManyConnections(>80%)
    expr: max_over_time(mysql_global_status_threads_connected[1m]) / mysql_global_variables_max_connections * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "MySQL too many connections (> 80%) (instance: {{ $labels.instance }})"
      description: "{{ $labels.proj }} MySQL is using more than 80% of the allowed connections on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # More than 60% of MySQL connections are in the running state
  - alert: MysqlHighThreadsRunning
    expr: max_over_time(mysql_global_status_threads_running[1m]) / mysql_global_variables_max_connections * 100 > 60
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "MySQL high threads running (instance: {{ $labels.instance }})"
      description: "More than 60% of MySQL connections are in the running state on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # MySQL slave IO thread not running
  - alert: MysqlSlaveIoThreadNotRunning
    expr: ( mysql_slave_status_slave_io_running and ON (instance) mysql_slave_status_master_server_id > 0 ) == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "MySQL Slave IO thread not running (instance: {{ $labels.instance }})"
      description: "MySQL Slave IO thread is not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # MySQL slave SQL thread not running
  - alert: MysqlSlaveSqlThreadNotRunning
    expr: ( mysql_slave_status_slave_sql_running and ON (instance) mysql_slave_status_master_server_id > 0 ) == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "MySQL Slave SQL thread not running (instance: {{ $labels.instance }})"
      description: "MySQL Slave SQL thread is not running on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # MySQL replication lag
  - alert: MysqlSlaveReplicationLag
    expr: ( (mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) and ON (instance) mysql_slave_status_master_server_id > 0 ) > 30
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "MySQL replication lag (instance: {{ $labels.instance }})"
      description: "MySQL replication is lagging\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # The MySQL server has new slow queries
  - alert: MysqlSlowQueries
    expr: increase(mysql_global_status_slow_queries[1m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "MySQL slow queries (instance: {{ $labels.instance }})"
      description: "MySQL has some new slow queries.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  # MySQL InnoDB log writes are stalling
  - alert: MysqlInnodbLogWaits
    expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "MySQL InnoDB log waits (instance: {{ $labels.instance }})"
      description: "MySQL InnoDB log writes are waiting to reach disk at a rate of {{ $value }}/second\n  LABELS = {{ $labels }}"
  # MySQL on the instance was restarted less than one minute ago
  - alert: MysqlRestarted
    expr: mysql_global_status_uptime < 60
    for: 0m
    labels:
      severity: info
    annotations:
      summary: "MySQL restarted (instance: {{ $labels.instance }})"
      description: "MySQL instance {{ $labels.instance }} was restarted less than one minute ago.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4 redis rule
4.1 Redis down
Redis instance is down.

- alert: RedisDown
  expr: redis_up == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis down (instance {{ $labels.instance }})"
    description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.2 Redis missing master
Redis cluster has no node marked as master.

- alert: RedisMissingMaster
  expr: (count(redis_instance_info{role="master"}) or vector(0)) < 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis missing master (instance {{ $labels.instance }})"
    description: "Redis cluster has no node marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.3 Redis too many masters
Redis cluster has too many nodes marked as master. In cluster mode, change (> 1) to the correct number of masters; for example, if the cluster normally has 3 masters, change it to (> 3).

- alert: RedisTooManyMasters
  expr: count(redis_instance_info{role="master"}) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis too many masters (instance {{ $labels.instance }})"
    description: "Redis cluster has too many nodes marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.4 Redis disconnected slaves
Redis is not replicating to all slaves. Consider reviewing the Redis replication status.

- alert: RedisDisconnectedSlaves
  expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis disconnected slaves (instance {{ $labels.instance }})"
    description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.5 Redis replication broken
Redis instance lost a slave.

- alert: RedisReplicationBroken
  expr: delta(redis_connected_slaves[1m]) < 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis replication broken (instance {{ $labels.instance }})"
    description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.6 Redis cluster flapping
Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a. flapping).

- alert: RedisClusterFlapping
  expr: changes(redis_connected_slaves[1m]) > 1
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Redis cluster flapping (instance {{ $labels.instance }})"
    description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.7 Redis missing backup
Redis has not been backed up for 24 hours.

- alert: RedisMissingBackup
  expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis missing backup (instance {{ $labels.instance }})"
    description: "Redis has not been backed up for 24 hours\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.8 Redis out of system memory
Redis is running out of system memory (> 90%). The exporter must be started with the --include-system-metrics flag or the REDIS_EXPORTER_INCL_SYSTEM_METRICS=true environment variable.

- alert: RedisOutOfSystemMemory
  expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Redis out of system memory (instance {{ $labels.instance }})"
    description: "Redis is running out of system memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.9 Redis out of configured maxmemory
Redis is running out of configured maxmemory (> 90%).

- alert: RedisOutOfConfiguredMaxmemory
  expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Redis out of configured maxmemory (instance {{ $labels.instance }})"
    description: "Redis is running out of configured maxmemory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.10 Redis too many connections
Redis is running out of connections (> 90% used).

- alert: RedisTooManyConnections
  expr: redis_connected_clients / redis_config_maxclients * 100 > 90
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Redis too many connections (instance {{ $labels.instance }})"
    description: "Redis is running out of connections (> 90% used)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.11 Redis not enough connections
Redis instance should have more connections (> 5).

- alert: RedisNotEnoughConnections
  expr: redis_connected_clients < 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Redis not enough connections (instance {{ $labels.instance }})"
    description: "Redis instance should have more connections (> 5)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.12 Redis rejected connections
Some connections to Redis have been rejected.

- alert: RedisRejectedConnections
  expr: increase(redis_rejected_connections_total[1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis rejected connections (instance {{ $labels.instance }})"
    description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.13 Complete rules file
groups:
- name: RedisAlerts
  rules:
  - alert: RedisDown
    expr: redis_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis down (instance {{ $labels.instance }})"
      description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisMissingMaster
    expr: (count(redis_instance_info{role="master"}) or vector(0)) < 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis missing master (instance {{ $labels.instance }})"
      description: "Redis cluster has no node marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisTooManyMasters
    expr: count(redis_instance_info{role="master"}) > 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis too many masters (instance {{ $labels.instance }})"
      description: "Redis cluster has too many nodes marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisDisconnectedSlaves
    expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis disconnected slaves (instance {{ $labels.instance }})"
      description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisReplicationBroken
    expr: delta(redis_connected_slaves[1m]) < 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis replication broken (instance {{ $labels.instance }})"
      description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisClusterFlapping
    expr: changes(redis_connected_slaves[1m]) > 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Redis cluster flapping (instance {{ $labels.instance }})"
      description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a flapping).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisMissingBackup
    expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis missing backup (instance {{ $labels.instance }})"
      description: "Redis has not been backed up for 24 hours\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisOutOfSystemMemory
    expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis out of system memory (instance {{ $labels.instance }})"
      description: "Redis is running out of system memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisOutOfConfiguredMaxmemory
    expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis out of configured maxmemory (instance {{ $labels.instance }})"
      description: "Redis is running out of configured maxmemory (> 90%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisTooManyConnections
    expr: redis_connected_clients / redis_config_maxclients * 100 > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis too many connections (instance {{ $labels.instance }})"
      description: "Redis is running out of connections (> 90% used)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisNotEnoughConnections
    expr: redis_connected_clients < 5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Redis not enough connections (instance {{ $labels.instance }})"
      description: "Redis instance should have more connections (> 5)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: RedisRejectedConnections
    expr: increase(redis_rejected_connections_total[1m]) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Redis rejected connections (instance {{ $labels.instance }})"
      description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
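Section 4.3 notes that in cluster mode the master threshold has to match the expected number of masters. As a sketch for a hypothetical cluster that normally runs 3 masters (the number 3 is only the example used in that section, not a recommendation), the rule in the file above would become:

```yaml
# Sketch: RedisTooManyMasters adjusted for a cluster expected to have 3 masters
- alert: RedisTooManyMasters
  expr: count(redis_instance_info{role="master"}) > 3   # 3 = expected master count (assumption)
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Redis too many masters (instance {{ $labels.instance }})"
    description: "Redis cluster has too many nodes marked as master.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

If several independent clusters are scraped by the same Prometheus, the `count(...)` would likely need a `by` clause on whatever label distinguishes the clusters; which label that is depends on the scrape configuration.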
5 rabbitmq rule
6 minio rule
7 postgresql
7.1 Postgresql down
Postgresql instance is down.

- alert: PostgresqlDown
  expr: pg_up == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql down (instance {{ $labels.instance }})"
    description: "Postgresql instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.2 Postgresql restarted
Postgresql restarted. (Note: this metric is not available.)

- alert: PostgresqlRestarted
  expr: time() - pg_postmaster_start_time_seconds < 60
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql restarted (instance {{ $labels.instance }})"
    description: "Postgresql restarted\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.3 Postgresql exporter error
Postgresql exporter is showing errors. A query may be buggy in query.yaml.

- alert: PostgresqlExporterError
  expr: pg_exporter_last_scrape_error > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql exporter error (instance {{ $labels.instance }})"
    description: "Postgresql exporter is showing errors. A query may be buggy in query.yaml\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.4 Postgresql table not auto vacuumed
Table {{ $labels.relname }} has not been auto vacuumed for 10 days. (Note: this metric is not available.)

- alert: PostgresqlTableNotAutoVacuumed
  expr: (pg_stat_user_tables_last_autovacuum > 0) and (time() - pg_stat_user_tables_last_autovacuum) > 60 * 60 * 24 * 10
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql table not auto vacuumed (instance {{ $labels.instance }})"
    description: "Table {{ $labels.relname }} has not been auto vacuumed for 10 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.5 Postgresql table not auto analyzed
Table {{ $labels.relname }} has not been auto analyzed for 10 days. (Note: this metric is not available.)

- alert: PostgresqlTableNotAutoAnalyzed
  expr: (pg_stat_user_tables_last_autoanalyze > 0) and (time() - pg_stat_user_tables_last_autoanalyze) > 24 * 60 * 60 * 10
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql table not auto analyzed (instance {{ $labels.instance }})"
    description: "Table {{ $labels.relname }} has not been auto analyzed for 10 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.6 Postgresql too many connections
PostgreSQL instance has too many connections (> 80%). The maximum number of connections must be set in the configuration file.

- alert: PostgresqlTooManyConnections
  expr: sum by (instance, job, server) (pg_stat_activity_count) > min by (instance, job, server) (pg_settings_max_connections * 0.8)
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql too many connections (instance {{ $labels.instance }})"
    description: "PostgreSQL instance has too many connections (> 80%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.7 Postgresql not enough connections
PostgreSQL instance currently has too few connections (< 5).

- alert: PostgresqlNotEnoughConnections
  expr: sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) < 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql not enough connections (instance {{ $labels.instance }})"
    description: "PostgreSQL instance should have more connections (> 5)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.8 Postgresql dead locks
PostgreSQL has dead-locks.

- alert: PostgresqlDeadLocks
  expr: increase(pg_stat_database_deadlocks{datname!~"template.*|postgres"}[1m]) > 5
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql dead locks (instance {{ $labels.instance }})"
    description: "PostgreSQL has dead-locks\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.9 Postgresql high rollback rate
Ratio of transactions being aborted compared to committed is > 2 %.

- alert: PostgresqlHighRollbackRate
  expr: sum by (namespace,datname) ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) / ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) + (rate(pg_stat_database_xact_commit{datname!~"template.*|postgres",datid!="0"}[3m])))) > 0.02
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql high rollback rate (instance {{ $labels.instance }})"
    description: "Ratio of transactions being aborted compared to committed is > 2 %\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.10 Postgresql commit rate low
Postgresql seems to be processing very few transactions.

- alert: PostgresqlCommitRateLow
  expr: rate(pg_stat_database_xact_commit[1m]) < 10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql commit rate low (instance {{ $labels.instance }})"
    description: "Postgresql seems to be processing very few transactions\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.11 Postgresql low XID consumption
Postgresql seems to be consuming transaction IDs very slowly. (Note: this metric is not available.)

- alert: PostgresqlLowXidConsumption
  expr: rate(pg_txid_current[1m]) < 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql low XID consumption (instance {{ $labels.instance }})"
    description: "Postgresql seems to be consuming transaction IDs very slowly\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.12 Postgresql high rate statement timeout
Postgres transactions showing high rate of statement timeouts. (Note: this metric is not available.)

- alert: PostgresqlHighRateStatementTimeout
  expr: rate(postgresql_errors_total{type="statement_timeout"}[1m]) > 3
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql high rate statement timeout (instance {{ $labels.instance }})"
    description: "Postgres transactions showing high rate of statement timeouts\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.13 Postgresql high rate deadlock
Postgres detected deadlocks. (Note: this metric is not available.)

- alert: PostgresqlHighRateDeadlock
  expr: increase(postgresql_errors_total{type="deadlock_detected"}[1m]) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql high rate deadlock (instance {{ $labels.instance }})"
    description: "Postgres detected deadlocks\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.14 Postgresql unused replication slot
Unused replication slots. (Note: this metric is not available.)

- alert: PostgresqlUnusedReplicationSlot
  expr: pg_replication_slots_active == 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql unused replication slot (instance {{ $labels.instance }})"
    description: "Unused Replication Slots\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.15 Postgresql too many dead tuples
The number of PostgreSQL dead tuples is too large. (Note: this metric is not available.)

- alert: PostgresqlTooManyDeadTuples
  expr: ((pg_stat_user_tables_n_dead_tup > 10000) / (pg_stat_user_tables_n_live_tup + pg_stat_user_tables_n_dead_tup)) >= 0.1
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Postgresql too many dead tuples (instance {{ $labels.instance }})"
    description: "PostgreSQL dead tuples is too large\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.16 Postgresql SSL compression active
Database connections have SSL compression enabled. This may add significant jitter in replication delay. Replicas should turn off SSL compression via sslcompression=0 in recovery.conf.

- alert: PostgresqlSslCompressionActive
  expr: sum(pg_stat_ssl_compression) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql SSL compression active (instance {{ $labels.instance }})"
    description: "Database connections with SSL compression enabled. This may add significant jitter in replication delay. Replicas should turn off SSL compression via `sslcompression=0` in `recovery.conf`.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.17 Postgresql too many locks acquired
Too many locks acquired on the database. If this alert fires frequently, the postgres setting max_locks_per_transaction may need to be increased. It must be set in the settings configuration file.

- alert: PostgresqlTooManyLocksAcquired
  expr: ((sum (pg_locks_count)) / (pg_settings_max_locks_per_transaction * pg_settings_max_connections)) > 0.20
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Postgresql too many locks acquired (instance {{ $labels.instance }})"
    description: "Too many locks acquired on the database. If this alert happens frequently, we may need to increase the postgres setting max_locks_per_transaction.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.18 Postgresql bloat index high (> 80%)
The index {{ $labels.idxname }} is bloated. You should execute REINDEX INDEX CONCURRENTLY {{ $labels.idxname }}; See https://github.com/samber/awesome-prometheus-alerts/issues/289#issuecomment-1164842737 (Note: this metric is not available.)

- alert: PostgresqlBloatIndexHigh(>80%)
  expr: pg_bloat_btree_bloat_pct > 80 and on (idxname) (pg_bloat_btree_real_size > 100000000)
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Postgresql bloat index high (> 80%) (instance {{ $labels.instance }})"
    description: "The index {{ $labels.idxname }} is bloated. You should execute `REINDEX INDEX CONCURRENTLY {{ $labels.idxname }};`\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.19 Postgresql bloat table high (> 80%)
The table {{ $labels.relname }} is bloated. You should execute VACUUM {{ $labels.relname }}; See https://github.com/samber/awesome-prometheus-alerts/issues/289#issuecomment-1164842737 (Note: this metric is not available.)

- alert: PostgresqlBloatTableHigh(>80%)
  expr: pg_bloat_table_bloat_pct > 80 and on (relname) (pg_bloat_table_real_size > 200000000)
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Postgresql bloat table high (> 80%) (instance {{ $labels.instance }})"
    description: "The table {{ $labels.relname }} is bloated. You should execute `VACUUM {{ $labels.relname }};`\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

7.20 Complete rules
groups:
- name: PostgresqlAlert
  rules:
  - alert: PostgresqlDown
    expr: pg_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql down (instance {{ $labels.instance }})"
      description: "Postgresql instance is down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlRestarted
    expr: time() - pg_postmaster_start_time_seconds < 60
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql restarted (instance {{ $labels.instance }})"
      description: "Postgresql restarted\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlExporterError
    expr: pg_exporter_last_scrape_error > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql exporter error (instance {{ $labels.instance }})"
      description: "Postgresql exporter is showing errors. A query may be buggy in query.yaml\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlTableNotAutoVacuumed
    expr: (pg_stat_user_tables_last_autovacuum > 0) and (time() - pg_stat_user_tables_last_autovacuum) > 60 * 60 * 24 * 10
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql table not auto vacuumed (instance {{ $labels.instance }})"
      description: "Table {{ $labels.relname }} has not been auto vacuumed for 10 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlTableNotAutoAnalyzed
    expr: (pg_stat_user_tables_last_autoanalyze > 0) and (time() - pg_stat_user_tables_last_autoanalyze) > 24 * 60 * 60 * 10
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql table not auto analyzed (instance {{ $labels.instance }})"
      description: "Table {{ $labels.relname }} has not been auto analyzed for 10 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlTooManyConnections
    expr: sum by (instance, job, server) (pg_stat_activity_count) > min by (instance, job, server) (pg_settings_max_connections * 0.8)
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql too many connections (instance {{ $labels.instance }})"
      description: "PostgreSQL instance has too many connections (> 80%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlNotEnoughConnections
    expr: sum by (datname) (pg_stat_activity_count{datname!~"template.*|postgres"}) < 5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql not enough connections (instance {{ $labels.instance }})"
      description: "PostgreSQL instance should have more connections (> 5)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlDeadLocks
    expr: increase(pg_stat_database_deadlocks{datname!~"template.*|postgres"}[1m]) > 5
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql dead locks (instance {{ $labels.instance }})"
      description: "PostgreSQL has dead-locks\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlHighRollbackRate
    expr: sum by (namespace,datname) ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) / ((rate(pg_stat_database_xact_rollback{datname!~"template.*|postgres",datid!="0"}[3m])) + (rate(pg_stat_database_xact_commit{datname!~"template.*|postgres",datid!="0"}[3m])))) > 0.02
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql high rollback rate (instance {{ $labels.instance }})"
      description: "Ratio of transactions being aborted compared to committed is > 2 %\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlCommitRateLow
    expr: rate(pg_stat_database_xact_commit[1m]) < 10
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql commit rate low (instance {{ $labels.instance }})"
      description: "Postgresql seems to be processing very few transactions\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlLowXidConsumption
    expr: rate(pg_txid_current[1m]) < 5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql low XID consumption (instance {{ $labels.instance }})"
      description: "Postgresql seems to be consuming transaction IDs very slowly\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlHighRateStatementTimeout
    expr: rate(postgresql_errors_total{type="statement_timeout"}[1m]) > 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql high rate statement timeout (instance {{ $labels.instance }})"
      description: "Postgres transactions showing high rate of statement timeouts\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlHighRateDeadlock
    expr: increase(postgresql_errors_total{type="deadlock_detected"}[1m]) > 1
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql high rate deadlock (instance {{ $labels.instance }})"
      description: "Postgres detected deadlocks\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlUnusedReplicationSlot
    expr: pg_replication_slots_active == 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql unused replication slot (instance {{ $labels.instance }})"
      description: "Unused Replication Slots\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlTooManyDeadTuples
    expr: ((pg_stat_user_tables_n_dead_tup > 10000) / (pg_stat_user_tables_n_live_tup + pg_stat_user_tables_n_dead_tup)) >= 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Postgresql too many dead tuples (instance {{ $labels.instance }})"
      description: "PostgreSQL dead tuples is too large\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlConfigurationChanged
    expr: '{__name__=~"pg_settings_.*"} != ON(__name__) {__name__=~"pg_settings_([^t]|t[^r]|tr[^a]|tra[^n]|tran[^s]|trans[^a]|transa[^c]|transac[^t]|transact[^i]|transacti[^o]|transactio[^n]|transaction[^_]|transaction_[^r]|transaction_r[^e]|transaction_re[^a]|transaction_rea[^d]|transaction_read[^_]|transaction_read_[^o]|transaction_read_o[^n]|transaction_read_on[^l]|transaction_read_onl[^y]).*"} OFFSET 5m'
    for: 0m
    labels:
      severity: info
    annotations:
      summary: "Postgresql configuration changed (instance {{ $labels.instance }})"
      description: "Postgres Database configuration change has occurred\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlSslCompressionActive
    expr: sum(pg_stat_ssl_compression) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql SSL compression active (instance {{ $labels.instance }})"
      description: "Database connections with SSL compression enabled. This may add significant jitter in replication delay. Replicas should turn off SSL compression via `sslcompression=0` in `recovery.conf`.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlTooManyLocksAcquired
    expr: ((sum (pg_locks_count)) / (pg_settings_max_locks_per_transaction * pg_settings_max_connections)) > 0.20
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Postgresql too many locks acquired (instance {{ $labels.instance }})"
      description: "Too many locks acquired on the database. If this alert happens frequently, we may need to increase the postgres setting max_locks_per_transaction.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlBloatIndexHigh(>80%)
    expr: pg_bloat_btree_bloat_pct > 80 and on (idxname) (pg_bloat_btree_real_size > 100000000)
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "Postgresql bloat index high (> 80%) (instance {{ $labels.instance }})"
      description: "The index {{ $labels.idxname }} is bloated. You should execute `REINDEX INDEX CONCURRENTLY {{ $labels.idxname }};`\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: PostgresqlBloatTableHigh(>80%)
    expr: pg_bloat_table_bloat_pct > 80 and on (relname) (pg_bloat_table_real_size > 200000000)
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "Postgresql bloat table high (> 80%) (instance {{ $labels.instance }})"
      description: "The table {{ $labels.relname }} is bloated. You should execute `VACUUM {{ $labels.relname }};`\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

8 kafka rule
9 keepalived rule

Reference: https://samber.github.io/awesome-prometheus-alerts/