Preface:
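I ran into this problem recently while deploying Prometheus. It is a fairly classic one, so it is worth writing down.
When I deployed the main Prometheus service, no pods appeared; only the Deployment was visible. In a panic I assumed the cluster itself was broken and rebuilt it. The situation looked like this:
Note: an UP-TO-DATE value of 0 means no replica has been rolled out, and an AVAILABLE value of 0 means no pod is available.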
[root@node4 yaml]# k get deployments.apps -n monitor-sa
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
prometheus-server   0/2     0            0           2m5s
[root@node4 yaml]# k get po -n monitor-sa
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-6ttbl   1/1     Running   0          23h
node-exporter-7ls5t   1/1     Running   0          23h
node-exporter-r287q   1/1     Running   0          23h
node-exporter-z85dm   1/1     Running   0          23h
The deployment manifest is shown below.
Note carefully that it references a ServiceAccount: serviceAccountName: monitor
[root@node4 yaml]# cat prometheus-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node4
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data
          type: Directory
Solution
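So what should we do in this situation? First of all, do not panic. Second, the Deployment controller has a feature that is easy to overlook: opening the Deployment for editing shows its current state in great detail, in the status field.
The command is kubectl edit deployment -n <namespace> <deployment-name>. (kubectl get deployment -n <namespace> <deployment-name> -o yaml prints the same status without opening an editor.) In this example the result looks like this: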
...... (omitted)
            path: prometheus.yml
          name: prometheus-config
        name: prometheus-config
      - hostPath:
          path: /data
          type: Directory
        name: prometheus-storage-volume
status:
  conditions:
  - lastTransitionTime: "2023-11-22T15:21:06Z"
    lastUpdateTime: "2023-11-22T15:21:06Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-11-22T15:21:06Z"
    lastUpdateTime: "2023-11-22T15:21:06Z"
    message: 'pods "prometheus-server-78bbb77dd7-" is forbidden: error looking up
      service account monitor-sa/monitor: serviceaccount "monitor" not found'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  - lastTransitionTime: "2023-11-22T15:31:07Z"
    lastUpdateTime: "2023-11-22T15:31:07Z"
    message: ReplicaSet "prometheus-server-78bbb77dd7" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  unavailableReplicas: 2
There are three messages here. The first is the error from the title; when something goes wrong, the dashboard shows this message first. The second explains where the problem actually lies: in this case, a ServiceAccount named monitor was not found. The third says that the ReplicaSet controlled by this Deployment timed out while rolling out, which does not tell us much. The second message is the important one, and it is the key to solving the problem.
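The same condition fields can be read without opening an editor. The following is a minimal sketch, assuming kubectl is on the PATH; the helper names deploy_conditions and deploy_replicasets are hypothetical and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not from the article).

# Print one line per Deployment condition: type, status, reason, message.
# Usage: deploy_conditions <namespace> <deployment>
deploy_conditions() {
  kubectl get deployment "$2" -n "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}

# Describe the ReplicaSets matching a label selector. FailedCreate events,
# such as a missing ServiceAccount, appear in the Events section.
# Usage: deploy_replicasets <namespace> <label-selector>
deploy_replicasets() {
  kubectl describe replicaset -n "$1" -l "$2"
}
```

For the case in this article the calls would be deploy_conditions monitor-sa prometheus-server and deploy_replicasets monitor-sa app=prometheus.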
For comparison, here is the status of a healthy Deployment.
This status shows a Deployment whose single replica was deployed successfully, so the first message is "Deployment has minimum availability."
      serviceAccount: kube-state-metrics
      serviceAccountName: kube-state-metrics
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-11-21T14:56:14Z"
    lastUpdateTime: "2023-11-21T14:56:14Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-11-21T14:56:13Z"
    lastUpdateTime: "2023-11-21T14:56:14Z"
    message: ReplicaSet "kube-state-metrics-57794dcf65" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
The concrete fix
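Based on the error above, we need to create the ServiceAccount. If you do not want to grant such broad permissions, you have to write your own RBAC rules; here I took the lazy route and bound cluster-admin. The commands are:
[root@node4 yaml]# k create sa -n monitor-sa monitor
serviceaccount/monitor created
[root@node4 yaml]# k create clusterrolebinding monitor-clusterrolebinding -n monitor-sa --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor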
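For anyone who would rather not bind cluster-admin, a narrower binding might look like the sketch below. The ClusterRole name monitor-read, the binding name, and the resource list are all assumptions; a real Prometheus scrape configuration may need more (for example nodes/metrics or the /metrics non-resource URL), so treat this as a starting point only.

```shell
#!/bin/sh
# Hypothetical helpers; "monitor-read" and the resource list are assumptions.

# Create a read-only ClusterRole and bind it to the ServiceAccount.
# Usage: bind_readonly <namespace> <serviceaccount>
bind_readonly() {
  kubectl create clusterrole monitor-read \
    --verb=get,list,watch \
    --resource=nodes,pods,services,endpoints &&
  kubectl create clusterrolebinding monitor-read-binding \
    --clusterrole=monitor-read \
    --serviceaccount="$1:$2"
}

# Ask the API server whether the ServiceAccount may perform an action.
# Usage: sa_can <namespace> <serviceaccount> <verb> <resource>
sa_can() {
  kubectl auth can-i "$3" "$4" --as="system:serviceaccount:$1:$2"
}
```

After bind_readonly monitor-sa monitor, a call such as sa_can monitor-sa monitor list pods should answer yes, while sa_can monitor-sa monitor delete pods should answer no.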
After deploying again, it succeeded:
[root@node4 yaml]# k get po -n monitor-sa -owide
NAME                                 READY   STATUS    RESTARTS        AGE   IP               NODE    NOMINATED NODE   READINESS GATES
node-exporter-6ttbl                  1/1     Running   0               24h   192.168.123.12   node2   <none>           <none>
node-exporter-7ls5t                  1/1     Running   0               24h   192.168.123.11   node1   <none>           <none>
node-exporter-r287q                  1/1     Running   1 (2m57s ago)   24h   192.168.123.14   node4   <none>           <none>
node-exporter-z85dm                  1/1     Running   0               24h   192.168.123.13   node3   <none>           <none>
prometheus-server-78bbb77dd7-6smlt   1/1     Running   0               20s   10.244.41.19     node4   <none>           <none>
prometheus-server-78bbb77dd7-fhf5k   1/1     Running   0               20s   10.244.41.18     node4   <none>           <none>
Summary
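So a missing ServiceAccount can leave a Deployment with no pods at all (the pods are not hidden; they are never created). In other words, the SA is a required but implicit dependency of this Deployment. A ConfigMap that the manifest references but that was not created beforehand causes a similar failure: the Deployment is created, but its pods cannot start (in that case the pod objects do exist and stay in ContainerCreating until the ConfigMap appears). This is unlike a namespace that was not created beforehand: the namespace is a required, explicit dependency, and without it the Deployment is rejected outright.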
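Because these dependencies are implicit, it can help to check for them before applying a manifest. The following is a minimal sketch of such a pre-flight check; the function name check_deps is hypothetical, and the list of objects to check has to be supplied by hand.

```shell
#!/bin/sh
# Hypothetical pre-flight check (the function name is an assumption).

# Report which namespaced objects are missing.
# Usage: check_deps <namespace> <kind/name>...
# Returns non-zero if at least one object is missing.
check_deps() {
  ns=$1
  shift
  missing=0
  for obj in "$@"; do
    if ! kubectl get "$obj" -n "$ns" >/dev/null 2>&1; then
      echo "missing in $ns: $obj"
      missing=1
    fi
  done
  return "$missing"
}
```

For the Prometheus manifest in this article, one possible use is check_deps monitor-sa serviceaccount/monitor configmap/prometheus-config && kubectl apply -f prometheus-deploy.yaml, so that the apply only runs when both objects exist.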
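Quota settings are the same kind of required, implicit dependency as the SA.
For example, let us create a quota for the default namespace, defined in the file below.
It limits the default namespace to at most 4 pods and 20 services, and to at most 10Gi of memory and 5.5 CPU cores (for both requests and limits).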
[root@node4 yaml]# cat quota-nginx.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
  namespace: default
spec:
  hard:
    requests.cpu: 5.5
    limits.cpu: 5.5
    requests.memory: 10Gi
    limits.memory: 10Gi
    pods: 4
    services: 20
Next, create an nginx Deployment with 6 replicas:
[root@node4 yaml]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2023-11-22T16:13:33Z"
  generation: 1
  labels:
    app: nginx
  name: nginx
  namespace: default
  resourceVersion: "16411"
  uid: e9a5cdc5-c6f0-45fb-a001-fcdd695eb925
spec:
  progressDeadlineSeconds: 600
  replicas: 6
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.18
        imagePullPolicy: IfNotPresent
        name: nginx
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        # The original file also carried an empty "resources: {}" above this
        # key; the duplicate is dropped here so that only one definition remains.
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 512Mi
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
After creating it, only four pods exist, so the quota is in effect:
[root@node4 yaml]# k get po
NAME                     READY   STATUS    RESTARTS   AGE
nginx-54f9858f64-g65pk   1/1     Running   0          4m50s
nginx-54f9858f64-h42vf   1/1     Running   0          4m50s
nginx-54f9858f64-s776t   1/1     Running   0          4m50s
nginx-54f9858f64-wl7wz   1/1     Running   0          4m50s
So where are the other two pods?
[root@node4 yaml]# k get deployments.apps nginx -oyaml |grep message
    message: Deployment does not have minimum availability.
    message: 'pods "nginx-54f9858f64-p8rxf" is forbidden: exceeded quota: quota, requested:
    message: ReplicaSet "nginx-54f9858f64" is progressing.
The fix is simple: adjust the quota. How to adjust it is not covered here.
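To see which quota entry is the binding one, the arithmetic can be sketched in a few lines of shell. The helper max_replicas is hypothetical; it takes plain integers (CPU in millicores, memory in Mi) copied by hand from the quota and from the per-pod limits, and it ignores any other workloads already running in the namespace.

```shell
#!/bin/sh
# Hypothetical helper: how many identical pods fit under a ResourceQuota.
# All arguments are plain integers: CPU in millicores, memory in Mi.
# Usage: max_replicas <pods_quota> <cpu_quota_m> <cpu_per_pod_m> <mem_quota_Mi> <mem_per_pod_Mi>
max_replicas() {
  by_cpu=$(( $2 / $3 ))   # pods allowed by the CPU entry
  by_mem=$(( $4 / $5 ))   # pods allowed by the memory entry
  n=$1                    # pods allowed by the pod-count entry
  if [ "$by_cpu" -lt "$n" ]; then n=$by_cpu; fi
  if [ "$by_mem" -lt "$n" ]; then n=$by_mem; fi
  echo "$n"
}
```

With the limits from this article (pods: 4, limits.cpu: 5.5, limits.memory: 10Gi, and 1 CPU / 1Gi per pod), max_replicas 4 5500 1000 10240 1024 gives 4. Raising only the pod count to 6 is not enough: max_replicas 6 5500 1000 10240 1024 gives 5, because six pods with a 1 CPU limit each exceed limits.cpu of 5.5. The current usage can be compared against the hard limits with kubectl describe resourcequota quota -n default.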