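Kubernetes Monitoring in Practice, Supplement (1): Deploying Metrics Server for kubectl top and HPA Support
Table of Contents
- Kubernetes Monitoring in Practice, Supplement (1): Deploying Metrics Server for kubectl top and HPA Support
- I. Introduction to Metrics Server
- II. Deploying Metrics Server
- 1. Create the RBAC resources (metrics-server-rbac.yaml)
- 2. Create the Service (metrics-server-svc.yaml)
- 3. Create the Deployment (metrics-server-deploy.yaml)
- 4. Create the APIService (metrics-server-apiservice.yaml)
- 5. Deploy all resources
- III. Configuring Prometheus to Scrape Resource Metrics
- Summary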
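As containerization and microservice architectures keep evolving, systems grow more complex by the day, and a complete monitoring and resource management stack has become key to keeping them running reliably. In the previous articles we covered how to deploy Prometheus, Node Exporter, Grafana, and Alertmanager, and closed the alerting loop with a DingTalk webhook.
In this supplementary article we deploy Metrics Server, the Kubernetes-native component for collecting resource metrics. It is the foundation for the kubectl top command and for the Horizontal Pod Autoscaler (HPA), and it gives the cluster better resource observability and a basis for automated scaling decisions.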
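I. Introduction to Metrics Server
Metrics Server is the resource metrics aggregator maintained by the Kubernetes project. It collects CPU and memory usage for every node and Pod by calling each Kubelet's Summary API, keeps only the most recent values in memory (nothing is persisted), and serves them to the API Server.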
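The usage values served by Metrics Server are Kubernetes quantity strings, typically nanocores for CPU and binary-suffixed bytes for memory. The following is a minimal Python sketch of how such strings could be converted to plain numbers. The function name `parse_quantity` and the sample values are illustrative assumptions, not part of Metrics Server, and only the common suffixes are handled.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes most often seen in metrics output.
# Assumption: this table is deliberately incomplete (no P/E suffixes).
_SUFFIXES = {
    "n": Decimal("1e-9"),   # nano, e.g. CPU nanocores
    "u": Decimal("1e-6"),   # micro
    "m": Decimal("1e-3"),   # milli, e.g. CPU millicores
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity string such as '250m' or '1024Ki' to a base-unit number.

    CPU results are in cores, memory results are in bytes.
    """
    # Try the two-character suffixes before the one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)  # no suffix: already in base units


if __name__ == "__main__":
    # Hypothetical sample values, not real cluster output.
    print(parse_quantity("156340000n"))       # CPU usage in cores
    print(parse_quantity("1024Ki") / 2**20)   # memory usage in MiB
```

`Decimal` is used instead of `float` so that nanocore values do not pick up rounding noise when they are summed or compared.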
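Once Metrics Server is deployed, you can:
- Use the kubectl top command to view the current resource usage of nodes and Pods
- Give the HPA (Horizontal Pod Autoscaler) the basic metrics it needs to scale workloads automatically based on resource usage
- Show resource usage in some Kubernetes dashboards (such as Kubernetes Dashboard)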
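To make the HPA item above more concrete, here is a simplified Python sketch of the replica calculation described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function name is an assumption for illustration, and the sketch leaves out what the real controller also handles, such as min/max replica bounds, Pods that are not ready, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    tolerance: float = 0.1,  # assumed to mirror the controller's default tolerance
) -> int:
    """Simplified HPA calculation based on average resource utilization."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    ratio = current_utilization / target_utilization
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 replicas at 90% average CPU with a 60% target: scale out to 6.
    print(desired_replicas(4, 90, 60))
```

The current utilization fed into this calculation is exactly the kind of value that Metrics Server provides, which is why HPA on CPU or memory does not work until it is deployed.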
⚠️ Note that Metrics Server does not persist data and does not support the Prometheus query language. It only suits scenarios that need current values and no historical data.
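II. Deploying Metrics Server
1. Create the RBAC resources (metrics-server-rbac.yaml)
Grant Metrics Server the access it needs, including reading nodes, Pods, and node stats, and create the ServiceAccount, ClusterRoles, and bindings that go with it.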
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
2. Create the Service (metrics-server-svc.yaml)
Expose the HTTPS port of Metrics Server so that the Kubernetes API Server can reach the metrics service once it is registered.
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
3. Create the Deployment (metrics-server-deploy.yaml)
Deploy Metrics Server and configure its startup arguments, TLS port, probes, ServiceAccount, and temporary directory.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        # Serving certificates are generated into the writable /tmp volume
        - --cert-dir=/tmp
        # Must match the containerPort named "https" below
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        # Skips verification of the Kubelet serving certificates
        - --kubelet-insecure-tls
        # Image pulled from a private Harbor registry; adjust to your own registry
        image: harbor.local/k8s/metrics-server:0.4.3
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          periodSeconds: 10
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
4. Create the APIService (metrics-server-apiservice.yaml)
Register version v1beta1 of the metrics.k8s.io API group so that Kubernetes can query the live metrics served by Metrics Server through the standard API.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
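Once this APIService is registered, the metrics are reachable under the /apis/metrics.k8s.io/v1beta1 prefix of the API Server. The Python sketch below only builds the request paths for node and Pod metrics; the helper names are assumptions made for illustration, and the paths can be tried with `kubectl get --raw <path>`.

```python
from typing import Optional

# Prefix derived from the group and version registered in the APIService above.
METRICS_API = "/apis/metrics.k8s.io/v1beta1"


def node_metrics_path(node: Optional[str] = None) -> str:
    """Path for all node metrics, or for a single node when a name is given."""
    return f"{METRICS_API}/nodes/{node}" if node else f"{METRICS_API}/nodes"


def pod_metrics_path(namespace: Optional[str] = None, pod: Optional[str] = None) -> str:
    """Path for Pod metrics: cluster-wide, per namespace, or for one Pod."""
    if pod and not namespace:
        raise ValueError("a pod name requires a namespace")
    if namespace is None:
        return f"{METRICS_API}/pods"
    base = f"{METRICS_API}/namespaces/{namespace}/pods"
    return f"{base}/{pod}" if pod else base


if __name__ == "__main__":
    print(node_metrics_path())
    print(pod_metrics_path("kube-system"))
```

These are the same endpoints that kubectl top and the HPA controller read from, so a successful raw request is a good sign that the registration works.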
5. Deploy all resources
kubectl apply -f metrics-server-rbac.yaml
kubectl apply -f metrics-server-svc.yaml
kubectl apply -f metrics-server-deploy.yaml
kubectl apply -f metrics-server-apiservice.yaml
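After applying the manifests, it helps to confirm that the aggregated API is actually available before relying on kubectl top. The following Python sketch checks the Available condition of the APIService object. The function names are assumptions for illustration, and `fetch_apiservice` simply shells out to kubectl, so it needs a working kubeconfig.

```python
import json
import subprocess

# Name registered in metrics-server-apiservice.yaml
APISERVICE_NAME = "v1beta1.metrics.k8s.io"


def is_available(apiservice: dict) -> bool:
    """Return True when the APIService reports an Available=True condition."""
    conditions = apiservice.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "Available" and cond.get("status") == "True"
        for cond in conditions
    )


def fetch_apiservice(name: str = APISERVICE_NAME) -> dict:
    """Read the APIService object through kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "apiservice", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `is_available(fetch_apiservice())` against the cluster should return True once the Metrics Server Pod is ready. After that, `kubectl top nodes` and `kubectl top pods -n kube-system` should return data, although the first values may take a short while to appear.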
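III. Configuring Prometheus to Scrape Resource Metrics
⚠️ Note: Prometheus does not scrape metrics from Metrics Server directly, but it can collect node and container resource usage from the Kubelet's cAdvisor endpoint. The job below reaches that endpoint through the API Server proxy, so the ServiceAccount used by Prometheus needs permission to access nodes/proxy.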
- job_name: 'kubernetes-node-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
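The three relabel rules in this job can be hard to read at first, so here is a small Python sketch that imitates their effect on one discovered node. It is only a model of the behavior under simplifying assumptions, not how Prometheus is implemented; the function names and the sample node are hypothetical.

```python
import re
from typing import Dict

LABELMAP_RE = re.compile(r"__meta_kubernetes_node_label_(.+)")


def apply_relabeling(discovered: Dict[str, str]) -> Dict[str, str]:
    """Imitate the relabel_configs of the kubernetes-node-cadvisor job."""
    labels = dict(discovered)

    # Rule 1 (labelmap): copy every node label to a plain target label.
    for name, value in discovered.items():
        match = LABELMAP_RE.fullmatch(name)
        if match:
            labels[match.group(1)] = value

    # Rule 2: always send the request to the API Server instead of the node.
    labels["__address__"] = "kubernetes.default.svc:443"

    # Rule 3: build the proxy path from the node name.
    match = re.fullmatch(r"(.+)", labels.get("__meta_kubernetes_node_name", ""))
    if match:
        labels["__metrics_path__"] = (
            f"/api/v1/nodes/{match.group(1)}/proxy/metrics/cadvisor"
        )
    return labels


def scrape_url(labels: Dict[str, str], scheme: str = "https") -> str:
    """URL that ends up being scraped for this target."""
    return f"{scheme}://{labels['__address__']}{labels['__metrics_path__']}"


if __name__ == "__main__":
    # Hypothetical node as service discovery might present it.
    node = {
        "__address__": "10.0.0.11:10250",
        "__meta_kubernetes_node_name": "node-1",
        "__meta_kubernetes_node_label_kubernetes_io_os": "linux",
    }
    print(scrape_url(apply_relabeling(node)))
```

The design choice worth noticing is that every target shares the same address, the API Server, and only the metrics path differs per node. Prometheus therefore needs network access to the API Server only, not to each Kubelet.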
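Summary
🚀 This article covered the deployment of Metrics Server, a key component of Kubernetes' native monitoring capabilities. It fixes the problem of kubectl top not working and provides the resource metrics that HPA needs for automatic scaling.
✅ The next supplementary article continues to round out the monitoring stack by deploying kube-state-metrics, which collects key metrics about the state of Kubernetes objects (such as Deployments, Pods, and Nodes) and gives Prometheus structured data about cluster state.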