Architecture overview:
Prometheus is the de facto monitoring standard for cloud-native systems, so it is natural to deploy a Prometheus service inside the Kubernetes cluster itself.
There are many ways to deploy prometheus-server: the KubeSphere integration, a Helm chart, plain YAML manifests, or an all-in-one bundle. This walkthrough uses plain YAML manifests.
Before deploying, one question has to be settled: where the prometheus-server time-series database stores its data. This example uses a local directory mounted from the host, that is, a hostPath mount, with /data as the mount directory.
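Two pieces of groundwork are assumed but never created by the manifests below: the monitor-sa namespace and the /data directory on the node that will run prometheus-server (node4 here). A minimal preparation sketch:

kubectl create namespace monitor-sa
# on node4: create the hostPath directory and make it writable for the
# Prometheus container (the official image may run as a non-root user)
mkdir -p /data && chmod 777 /data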
The Kubernetes cluster is v1.23.16, with three masters and one worker node, deployed with KubeKey:
[root@node4 yaml]# k get no -owide
NAME    STATUS   ROLES                  AGE   VERSION    INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
node1   Ready    control-plane,master   10d   v1.23.16   192.168.123.11   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64       docker://20.10.8
node2   Ready    control-plane,master   10d   v1.23.16   192.168.123.12   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64       docker://20.10.8
node3   Ready    control-plane,master   10d   v1.23.16   192.168.123.13   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64       docker://20.10.8
node4   Ready    worker                 10d   v1.23.16   192.168.123.14   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64       docker://20.10.8
The prometheus-server version is v2.2.1:
[root@node4 yaml]# k get deployments.apps -n monitor-sa -owide
NAME                READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                   SELECTOR
prometheus-server   2/2     2            2           9d    prometheus   prom/prometheus:v2.2.1   app=prometheus,component=server
Grafana is 9.4.3, installed from the RPM package:
[root@node4 yaml]# rpm -qa |grep grafana
grafana-enterprise-9.4.3-1.x86_64
node-exporter is v0.16, run by a DaemonSet controller:
[root@node4 yaml]# k get ds -n monitor-sa -owide
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS      IMAGES                       SELECTOR
node-exporter   4         4         4       4            4           <none>          10d   node-exporter   prom/node-exporter:v0.16.0   name=node-exporter
The Pod status after a successful deployment:
[root@node4 yaml]# k get po -n monitor-sa
NAME READY STATUS RESTARTS AGE
node-exporter-6ttbl 1/1 Running 1 (77m ago) 10d
node-exporter-7ls5t 1/1 Running 1 (76m ago) 10d
node-exporter-r287q 1/1 Running 3 (77m ago) 10d
node-exporter-z85dm 1/1 Running 1 (77m ago) 10d
prometheus-server-fb59774d6-bgmn7 1/1 Running 0 62m
prometheus-server-fb59774d6-wrq27   1/1     Running   0             62m
The rest of this post walks through how to deploy Prometheus inside a Kubernetes cluster.
I. Deploying node-exporter
A note on node-exporter: it does the data-collection work, so how data is collected, which data is needed and which data can be dropped are all worth thinking about. The exporter only collects metrics and never pushes them; Prometheus scrapes them itself, so the exporter needs no storage configuration. But if node-exporter collects everything indiscriminately, it will without question become a burden on Prometheus.
In this example, the relevant setting excludes filesystem metrics for the mount points below:
- --collector.filesystem.ignored-mount-points
- ^/(sys|proc|dev|host|etc)($|/)
Tuning node-exporter is a matter of the collector flags below; each collector can be enabled or disabled individually:
--collector.arp: enable the arp collector (default: enabled).
--collector.bcache: enable the bcache collector (default: enabled).
--collector.bonding: enable the bonding collector (default: enabled).
--collector.btrfs: enable the btrfs collector (default: enabled).
--collector.buddyinfo: enable the buddyinfo collector (default: disabled).
--collector.conntrack: enable the conntrack collector (default: enabled).
--collector.cpu: enable the CPU collector (default: enabled).
--collector.cpufreq: enable the cpufreq collector (default: enabled).
--collector.diskstats: enable the diskstats collector (default: enabled).
--collector.drbd: enable the drbd collector (default: disabled).
--collector.edac: enable the edac collector (default: enabled).
--collector.entropy: enable the entropy collector (default: enabled).
--collector.ethtool: enable the ethtool collector (default: disabled).
--collector.fiberchannel: enable the fibre channel collector (default: enabled).
--collector.filefd: enable the filefd collector (default: enabled).
--collector.filesystem: enable the filesystem collector (default: enabled).
--collector.hwmon: enable the hwmon collector (default: enabled).
--collector.infiniband: enable the infiniband collector (default: enabled).
--collector.interrupts: enable the interrupts collector (default: disabled).
--collector.ipvs: enable the ipvs collector (default: enabled).
--collector.ksmd: enable the ksmd collector (default: disabled).
--collector.loadavg: enable the loadavg collector (default: enabled).
--collector.logind: enable the logind collector (default: disabled).
--collector.mdadm: enable the mdadm collector (default: enabled).
--collector.meminfo: enable the meminfo collector (default: enabled).
--collector.meminfo_numa: enable the meminfo_numa collector (default: disabled).
--collector.mountstats: enable the mountstats collector (default: disabled).
--collector.netclass: enable the netclass collector (default: enabled).
--collector.netdev: enable the netdev collector (default: enabled).
--collector.netstat: enable the netstat collector (default: enabled).
--collector.network_route: enable the network_route collector (default: disabled).
--collector.nfs: enable the nfs collector (default: enabled).
--collector.nfsd: enable the nfsd collector (default: enabled).
--collector.ntp: enable the ntp collector (default: disabled).
--collector.nvme: enable the nvme collector (default: enabled).
--collector.perf: enable the perf collector (default: disabled).
--collector.powersupplyclass: enable the powersupplyclass collector (default: enabled).
--collector.pressure: enable the pressure collector (default: enabled).
--collector.processes: enable the processes collector (default: disabled).
--collector.qdisc: enable the qdisc collector (default: disabled).
--collector.rapl: enable the rapl collector (default: enabled).
--collector.runit: enable the runit collector (default: disabled).
--collector.schedstat: enable the schedstat collector (default: enabled).
--collector.sockstat: enable the sockstat collector (default: enabled).
--collector.softnet: enable the softnet collector (default: enabled).
--collector.stat: enable the stat collector (default: enabled).
--collector.supervisord: enable the supervisord collector (default: disabled).
--collector.systemd: enable the systemd collector (default: disabled).
--collector.tapestats: enable the tapestats collector (default: enabled).
--collector.tcpstat: enable the tcpstat collector (default: disabled).
--collector.textfile: enable the textfile collector (default: enabled).
--collector.thermal_zone: enable the thermal_zone collector (default: enabled).
--collector.time: enable the time collector (default: enabled).
--collector.timex: enable the timex collector (default: enabled).
--collector.udp_queues: enable the udp_queues collector (default: enabled).
--collector.uname: enable the uname collector (default: enabled).
--collector.vmstat: enable the vmstat collector (default: enabled).
--collector.wifi: enable the wifi collector (default: disabled).
--collector.xfs: enable the xfs collector (default: enabled).
--collector.zfs: enable the zfs collector (default: enabled).
--collector.zoneinfo: enable the zoneinfo collector (default: disabled).
Example: --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
Per-collector include/exclude flags (N/A means the flag does not exist):
arp (device): include --collector.arp.device-include, exclude --collector.arp.device-exclude
cpu (bugs): include --collector.cpu.info.bugs-include, exclude N/A
cpu (flags): include --collector.cpu.info.flags-include, exclude N/A
diskstats (device): include --collector.diskstats.device-include, exclude --collector.diskstats.device-exclude
ethtool (device): include --collector.ethtool.device-include, exclude --collector.ethtool.device-exclude
ethtool (metrics): include --collector.ethtool.metrics-include, exclude N/A
filesystem (fs-types): include N/A, exclude --collector.filesystem.fs-types-exclude
filesystem (mount-points): include N/A, exclude --collector.filesystem.mount-points-exclude
hwmon (chip): include --collector.hwmon.chip-include, exclude --collector.hwmon.chip-exclude
netdev (device): include --collector.netdev.device-include, exclude --collector.netdev.device-exclude
qdisk (device): include --collector.qdisk.device-include, exclude --collector.qdisk.device-exclude
sysctl (all): include --collector.sysctl.include, exclude N/A
systemd (unit): include --collector.systemd.unit-include, exclude --collector.systemd.unit-exclude
Collectors enabled by default (name: description [supported OS]):
arp: Exposes ARP statistics from /proc/net/arp. [Linux]
bcache: Exposes bcache statistics from /sys/fs/bcache/. [Linux]
bonding: Exposes the number of configured and active slaves of Linux bonding interfaces. [Linux]
btrfs: Exposes btrfs statistics. [Linux]
boottime: Exposes system boot time derived from the kern.boottime sysctl. [Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris]
conntrack: Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). [Linux]
cpu: Exposes CPU statistics. [Darwin, Dragonfly, FreeBSD, Linux, Solaris, OpenBSD]
cpufreq: Exposes CPU frequency statistics. [Linux, Solaris]
diskstats: Exposes disk I/O statistics. [Darwin, Linux, OpenBSD]
dmi: Exposes Desktop Management Interface (DMI) info from /sys/class/dmi/id/. [Linux]
edac: Exposes error detection and correction statistics. [Linux]
entropy: Exposes available entropy. [Linux]
exec: Exposes execution statistics. [Dragonfly, FreeBSD]
fibrechannel: Exposes fibre channel information and statistics from /sys/class/fc_host/. [Linux]
filefd: Exposes file descriptor statistics from /proc/sys/fs/file-nr. [Linux]
filesystem: Exposes filesystem statistics, such as disk space used. [Darwin, Dragonfly, FreeBSD, Linux, OpenBSD]
hwmon: Exposes hardware monitoring and sensor data from /sys/class/hwmon/. [Linux]
infiniband: Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. [Linux]
ipvs: Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. [Linux]
loadavg: Exposes load average. [Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris]
mdadm: Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). [Linux]
meminfo: Exposes memory statistics. [Darwin, Dragonfly, FreeBSD, Linux, OpenBSD]
netclass: Exposes network interface info from /sys/class/net/. [Linux]
netdev: Exposes network interface statistics such as bytes transferred. [Darwin, Dragonfly, FreeBSD, Linux, OpenBSD]
netisr: Exposes netisr statistics. [FreeBSD]
netstat: Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. [Linux]
nfs: Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. [Linux]
nfsd: Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. [Linux]
nvme: Exposes NVMe info from /sys/class/nvme/. [Linux]
os: Exposes OS release info from /etc/os-release or /usr/lib/os-release. [any]
powersupplyclass: Exposes power supply statistics from /sys/class/power_supply. [Linux]
pressure: Exposes pressure stall statistics from /proc/pressure/. [Linux (kernel 4.20+ and/or CONFIG_PSI)]
rapl: Exposes various statistics from /sys/class/powercap. [Linux]
schedstat: Exposes task scheduler statistics from /proc/schedstat. [Linux]
selinux: Exposes SELinux statistics. [Linux]
sockstat: Exposes various statistics from /proc/net/sockstat. [Linux]
softnet: Exposes statistics from /proc/net/softnet_stat. [Linux]
stat: Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. [Linux]
tapestats: Exposes statistics from /sys/class/scsi_tape. [Linux]
textfile: Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. [any]
thermal: Exposes thermal statistics like pmset -g therm. [Darwin]
thermal_zone: Exposes thermal zone and cooling device statistics from /sys/class/thermal. [Linux]
time: Exposes the current system time. [any]
timex: Exposes selected adjtimex(2) system call stats. [Linux]
udp_queues: Exposes UDP total lengths of the rx_queue and tx_queue from /proc/net/udp and /proc/net/udp6. [Linux]
uname: Exposes system information as provided by the uname system call. [Darwin, FreeBSD, Linux, OpenBSD]
vmstat: Exposes statistics from /proc/vmstat. [Linux]
xfs: Exposes XFS runtime statistics. [Linux (kernel 4.4+)]
zfs: Exposes ZFS performance statistics. [FreeBSD, Linux, Solaris]
The node-exporter deployment manifest:
cat > node-export.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor-sa
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.16.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - ^/(sys|proc|dev|host|etc)($|/)
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
EOF
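Applying the manifest and spot-checking the exporter is straightforward. Because the DaemonSet uses hostNetwork, any node IP from the listing above answers on port 9100; this is a quick sketch, not part of the original manifests:

kubectl apply -f node-export.yaml
kubectl -n monitor-sa get pods -o wide
# fetch a few raw metrics directly from one node
curl -s http://192.168.123.11:9100/metrics | grep -m 3 node_cpu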
II. Deploying the kube-state-metrics collector
kube-state-metrics is a collector dedicated to exposing the state of Kubernetes resources such as Pods, Deployments, DaemonSets and StatefulSets; the data it exposes is scraped by the prometheus-server service itself.
For example, looking at this service's logs we can see that some resources are not collected because the ServiceAccount lacks permission. There is no need to worry: just as with node-exporter, some data we simply do not need to collect.
E1202 13:10:33.591335 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "secrets" in API group "" at the cluster scope
E1202 13:10:33.592118 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.MutatingWebhookConfiguration: mutatingwebhookconfigurations.admissionregistration.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "mutatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
E1202 13:10:33.593079 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Namespace: networkpolicies.networking.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "networkpolicies" in API group "networking.k8s.io" at the cluster scope
E1202 13:10:33.597030 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "replicasets" in API group "apps" at the cluster scope
E1202 13:10:33.599890 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.ValidatingWebhookConfiguration: validatingwebhookconfigurations.admissionregistration.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "validatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
E1202 13:10:34.580372 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.StorageClass: storageclasses.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
E1202 13:10:34.580373 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "configmaps" in API group "" at the cluster scope
E1202 13:10:34.586583 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E1202 13:10:34.586669 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "deployments" in API group "apps" at the cluster scope
E1202 13:10:34.587055 1 reflector.go:156] pkg/mod/k8s.io/client-go@v0.0.0-20191109102209-3c0d1af94be5/tools/cache/reflector.go:108: Failed to list *v1beta1.VolumeAttachment: volumeattachments.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:kube-state-metrics" cannot list resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope
The kube-state-metrics RBAC manifest
The permission to list ConfigMaps that was missing in the log above has already been added here:
cat > kube-state-metrics-rbac.yaml <<'EOF'
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets", "daemonsets", "replicasets", "deployments"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
EOF
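A quick way to confirm the ClusterRole grants what the error log above complained about is kubectl auth can-i, impersonating the ServiceAccount; a small verification sketch:

kubectl apply -f kube-state-metrics-rbac.yaml
kubectl auth can-i list configmaps --as=system:serviceaccount:kube-system:kube-state-metrics
kubectl auth can-i list secrets --as=system:serviceaccount:kube-system:kube-state-metrics
# both commands should print "yes" once the ClusterRole above is bound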
The kube-state-metrics Service
Note the annotation prometheus.io/scrape: "true" here, which marks the Service as one that Prometheus is allowed to scrape:
cat > kube-state-metrics-svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
EOF
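One caveat: the Prometheus ConfigMap in section III only defines node, cadvisor and apiserver jobs, so the annotation by itself is not honored. If you want annotation-driven scraping of Services, a commonly used endpoints-based job (not part of the manifests in this post, add it under scrape_configs at your discretion) looks roughly like this:

    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name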
The kube-state-metrics Deployment
cat > kube-state-metrics-deploy.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
#       image: gcr.io/google_containers/kube-state-metrics-amd64:v1.3.1
        image: quay.io/coreos/kube-state-metrics:v1.9.0
        ports:
        - containerPort: 8080
EOF
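Once the RBAC, Service and Deployment are applied, the metrics endpoint can be checked through a temporary port-forward; kube_pod_status_phase is one of the object-state series kube-state-metrics exposes. A quick sketch:

kubectl apply -f kube-state-metrics-deploy.yaml -f kube-state-metrics-svc.yaml
kubectl -n kube-system get pods -l app=kube-state-metrics
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
sleep 2
curl -s http://127.0.0.1:8080/metrics | grep -m 5 kube_pod_status_phase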
III. Deploying prometheus-server
1
prometheus-cfg
cat > prometheus-cfg.yaml <<'EOF'
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitor-sa
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: kubernetes-node
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: (.*):10250
        replacement: ${1}:9100
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: kubernetes-node-cadvisor
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: kubernetes-apiserver
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
EOF
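One operational note: the prometheus.yml above is mounted into the Pod via subPath, so later edits to the ConfigMap are not propagated into running Pods automatically. A simple, if blunt, way to pick up changes once the Deployment from step 3 exists:

kubectl apply -f prometheus-cfg.yaml
# after editing the ConfigMap later on:
kubectl -n monitor-sa rollout restart deployment prometheus-server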
2,
prometheus-svc
cat > prometheus-svc.yaml <<'EOF'
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    protocol: TCP
  selector:
    app: prometheus
    component: server
EOF
3,
prometheus-deploy
cat > prometheus-deploy.yaml <<'EOF'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: "false"
    spec:
      nodeName: node4
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data
          type: Directory
EOF
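Note that the Deployment references serviceAccountName: monitor, which none of the manifests in this post create. A minimal sketch of the ServiceAccount plus the read permissions the three scrape jobs plausibly need (the file name and the ClusterRole name prometheus-monitor are my own choices; adjust to your own policy):

cat > prometheus-rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitor
  namespace: monitor-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-monitor
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-monitor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-monitor
subjects:
- kind: ServiceAccount
  name: monitor
  namespace: monitor-sa
EOF
kubectl apply -f prometheus-rbac.yaml -f prometheus-svc.yaml -f prometheus-deploy.yaml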
After all of the above manifests have been applied, check the prometheus-server Service:
[root@node4 yaml]# k get svc -n monitor-sa
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus   NodePort   10.96.0.120   <none>        9090:32661/TCP   10d
Open a browser at this NodePort to reach the Prometheus web UI. At this point the prometheus-server service inside the Kubernetes cluster is fully installed.
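Besides the browser, the HTTP API gives a quick health check of the newly discovered targets; the node IP and NodePort below are the ones shown above and will differ in other clusters:

curl -s http://192.168.123.14:32661/api/v1/targets | grep -o '"health":"up"' | wc -l
curl -s 'http://192.168.123.14:32661/api/v1/query?query=up'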
Grafana can simply be installed with its defaults; there is nothing special about the RPM installation. The main thing is the data source setting: point a Prometheus data source at the NodePort exposed above.
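As a sketch, the same data source can also be declared through Grafana's provisioning directory instead of the UI; the URL below assumes the node IP and NodePort shown earlier:

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  url: http://192.168.123.14:32661
  isDefault: true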