# Building a K8s Cluster + Prometheus Monitoring + Harbor Private Registry + DingTalk Alerting from Scratch (Hands-On with RHEL 9)
This post records the complete process of building, from scratch on RHEL 9, a K8s cluster, a Prometheus monitoring stack, and a Harbor private image registry. I hit plenty of pitfalls along the way and have written them all down.

## Environment

| Hostname | IP | Role | OS |
|---|---|---|---|
| Ubuntu | 192.168.137.128 | Prometheus / Grafana / Alertmanager | Ubuntu 22.04 |
| k8s-master | 192.168.137.100 | K8s master | RHEL 9.5 |
| k8s-node1 | 192.168.137.101 | K8s node | RHEL 9.5 |
| k8s-node2 | 192.168.137.102 | K8s node | RHEL 9.5 |
| k8s-node3 | 192.168.137.103 | K8s node | RHEL 9.5 |
| k8s-devops | 192.168.137.104 | Harbor image registry | RHEL 9.5 |

## Overall architecture

```
Ubuntu monitoring host
  Prometheus(:9090) ──→ Grafana(:3000)
  Alertmanager(:9093) ──→ DingTalk group
        │
        │ scrape metrics
        ▼
K8s cluster (1 master + 3 nodes)
  nginx microservice (3 replicas), Flannel networking
        │
        │ pull images
        ▼
Harbor private registry (k8s-devops)
```

## 1. Preparing the RHEL 9 base environment

Without a registered subscription, RHEL 9.5 ships with empty yum repos, so the first step is to fix that.

### 1.1 Configure the CentOS Stream 9 Aliyun mirror

```bash
rm -f /etc/yum.repos.d/*.repo
cat > /etc/yum.repos.d/centos.repo <<EOF
[baseos]
name=CentOS Stream 9 - BaseOS
baseurl=https://mirrors.aliyun.com/centos-stream/9-stream/BaseOS/x86_64/os/
gpgcheck=0
enabled=1

[appstream]
name=CentOS Stream 9 - AppStream
baseurl=https://mirrors.aliyun.com/centos-stream/9-stream/AppStream/x86_64/os/
gpgcheck=0
enabled=1
EOF
yum clean all
yum makecache
```

### 1.2 Disable firewalld, SELinux, and swap

```bash
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
swapoff -a
sed -i '/swap/d' /etc/fstab
```

### 1.3 Configure /etc/hosts

```bash
cat >> /etc/hosts <<EOF
192.168.137.100 k8s-master
192.168.137.101 k8s-node1
192.168.137.102 k8s-node2
192.168.137.103 k8s-node3
192.168.137.104 k8s-devops
EOF
```

### 1.4 Load kernel modules and sysctl parameters

```bash
cat > /etc/modules-load.d/k8s.conf <<EOF
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

cat > /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
```

### 1.5 Install Docker

```bash
yum install -y yum-utils
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io

mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["https://mirror.ccs.tencentyun.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "storage-driver": "overlay2"
}
EOF
systemctl daemon-reload
systemctl enable docker
systemctl restart docker
```

*(Screenshot: `docker version` output)*

All of the steps above must run on every K8s node. I wrote a shell script that executes them over SSH, handling all 5 machines in one go.

## 2. Building the K8s cluster

### 2.1 Install kubeadm (master and all 3 nodes)

```bash
cat > /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.30/rpm/
gpgcheck=0
enabled=1
EOF
yum makecache
yum install -y kubelet kubeadm kubectl
```

### 2.2 Configure containerd

This step is critical: without it, `kubeadm init` fails with "required cgroups disabled".

```bash
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sed -i 's|sandbox_image = "registry.k8s.io/pause:3.10.1"|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.10"|' /etc/containerd/config.toml
systemctl restart containerd
systemctl enable kubelet
```

Pre-pull the pause image and retag it:

```bash
crictl pull registry.aliyuncs.com/google_containers/pause:3.10
ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/pause:3.10 registry.k8s.io/pause:3.10.1
```

### 2.3 Initialize the master

```bash
kubeadm init \
  --apiserver-advertise-address=192.168.137.100 \
  --image-repository=registry.aliyuncs.com/google_containers \
  --kubernetes-version=v1.30.14 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
```

Configure kubectl:

```bash
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
```

### 2.4 Install the Flannel network plugin

```bash
kubectl apply -f https://cdn.jsdelivr.net/gh/flannel-io/flannel@master/Documentation/kube-flannel.yml
```

*(Screenshot: `kubectl get nodes` shows the master Ready)*

### 2.5 Join the nodes to the cluster

On each node, first configure containerd and the pause image as in 2.2, then run:

```bash
kubeadm join 192.168.137.100:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```

*(Screenshot: `kubectl get nodes` shows all 4 nodes Ready)*

## 3. Deploying the microservice application

```bash
kubectl create namespace demo-app
kubectl create deployment nginx-web --image=192.168.137.104/library/nginx:1.25 --replicas=3 -n demo-app
kubectl expose deployment nginx-web --type=NodePort --port=80 --target-port=80 -n demo-app
kubectl get pods -n demo-app -o wide
kubectl get svc -n demo-app
```

*(Screenshots: all 3 Pods Running; browser shows the Nginx welcome page)*

## 4. Building the Harbor private image registry

### 4.1 Install Docker Compose

```bash
curl -L https://ghfast.top/https://github.com/docker/compose/releases/download/v2.27.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
```

### 4.2 Download and install Harbor

```bash
cd /opt
wget https://ghfast.top/https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-offline-installer-v2.11.0.tgz
tar xf harbor-offline-installer-v2.11.0.tgz
cd harbor
cp harbor.yml.tmpl harbor.yml
sed -i 's/hostname: reg.mydomain.com/hostname: 192.168.137.104/' harbor.yml
sed -i 's/^https:/#https:/' harbor.yml
sed -i 's/^  port: 443/#  port: 443/' harbor.yml
sed -i 's/^  certificate:/#  certificate:/' harbor.yml
sed -i 's/^  private_key:/#  private_key:/' harbor.yml
./install.sh
```

*(Screenshot: install.sh completes with all containers Started)*

### 4.3 Push an image to Harbor

```bash
docker login 192.168.137.104 -u admin -p Harbor12345
docker tag nginx:1.25 192.168.137.104/library/nginx:1.25
docker push 192.168.137.104/library/nginx:1.25
```

*(Screenshot: the Harbor web UI shows the nginx image)*

### 4.4 Point K8s at Harbor

Configure containerd on every K8s node to trust the Harbor registry over plain HTTP:

```bash
mkdir -p /etc/containerd/certs.d/192.168.137.104
cat > /etc/containerd/certs.d/192.168.137.104/hosts.toml <<EOF
server = "http://192.168.137.104"

[host."http://192.168.137.104"]
  capabilities = ["pull", "resolve"]
  skip_verify = true
EOF
systemctl restart containerd
```

Update the Deployment:

```bash
kubectl set image deployment/nginx-web nginx=192.168.137.104/library/nginx:1.25 -n demo-app
```

## 5. The Prometheus monitoring stack

### 5.1 Core config: prometheus.yml

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: ubuntu-monitor
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: redhat-servers
    static_configs:
      - targets:
          - 192.168.137.100:9100
          - 192.168.137.101:9100
          - 192.168.137.102:9100
          - 192.168.137.103:9100
          - 192.168.137.104:9100

  - job_name: k8s-nodes
    static_configs:
      - targets:
          - 192.168.137.100:10250
          - 192.168.137.101:10250
          - 192.168.137.102:10250
          - 192.168.137.103:10250
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/k8s-token

  - job_name: k8s-apiserver
    static_configs:
      - targets: ["192.168.137.100:6443"]
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /etc/prometheus/k8s-token
```

### 5.2 Alert rules

```yaml
groups:
  - name: host_alerts
    rules:
      - alert: HostDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} is down"
      - alert: HighCPU
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} CPU above 85%"
      - alert: HighMemory
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} memory above 90%"
      - alert: DiskFull
        expr: (1 - node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} disk above 85%"
```

### 5.3 Grafana setup

```bash
apt-get install -y musl
wget https://mirrors.tuna.tsinghua.edu.cn/grafana/apt/pool/main/g/grafana/grafana_11.1.0_amd64.deb
dpkg -i grafana_11.1.0_amd64.deb
systemctl enable grafana-server
systemctl start grafana-server
```

Open http://192.168.137.128:3000 in a browser and log in as admin/admin.
Connections → Data Sources → Add → Prometheus → set the URL to http://localhost:9090 → Save & Test.
Dashboards → Import → enter 1860 → Load → Import.

*(Screenshots: Prometheus Targets page with all nodes UP; Grafana dashboard showing K8s node metrics)*

### 5.4 DingTalk alerting

Alertmanager configuration:

```yaml
route:
  group_by: ["alertname"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: dingtalk

receivers:
  - name: dingtalk
    webhook_configs:
      - url: http://localhost:8060/dingtalk/webhook1/send
        send_resolved: true
```

*(Screenshot: the DingTalk group receives the HostDown alert)*

## 6. Pitfalls

| Problem | Cause | Fix |
|---|---|---|
| RHEL 9 has no yum repos | No subscription registered | Configure the CentOS Stream 9 Aliyun mirror |
| `kubeadm init` reports "cgroups disabled" | containerd missing SystemdCgroup | Set `SystemdCgroup = true` in config.toml |
| pause image pull fails | registry.k8s.io unreachable | Pull from Aliyun, rename with `ctr images tag` |
| Docker Hub blocked | Mainland-China network restrictions | Pull via mirror on Ubuntu → `docker save` → scp → `ctr` import |
| Harbor services down after reboot | Restarting Docker stops all containers | `docker-compose up -d` |
| DNS resolution failure | Misconfigured resolv.conf | Use Aliyun DNS 223.5.5.5 |
| Windows-edited script errors | CRLF (`\r\n`) line endings | Convert with dos2unix |

## 7. Command cheat sheet

```bash
kubectl get nodes
kubectl get pods -n <ns> -o wide
kubectl get svc -n <ns>
kubectl create namespace <name>
kubectl create deployment <name> --image=<img> --replicas=<n> -n <ns>
kubectl expose deployment <name> --type=NodePort --port=80 -n <ns>
kubectl set image deployment/<name> <container>=<new-image> -n <ns>
kubectl delete pods --all -n <ns>
kubectl logs <pod> -n <ns>
kubectl describe pod <pod> -n <ns>

docker pull / tag / push / save / load
docker login <harbor-ip> -u admin -p Harbor12345

crictl pull <image>
ctr -n k8s.io images import <file.tar>
ctr -n k8s.io images tag <old> <new>

systemctl start/stop/restart/status/enable <service>
promtool check config /etc/prometheus/prometheus.yml
```
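Section 1 mentions that all of the base-environment steps were pushed to the five machines with a batch SSH script. That script isn't shown; below is a minimal sketch of what it might look like. The host list, root login, key-based auth, and the dry-run default are all assumptions, not the author's actual script.

```shell
#!/bin/sh
# batch-run.sh -- hypothetical sketch of the "run the same steps on all 5
# machines over SSH" helper mentioned in section 1.
HOSTS="192.168.137.100 192.168.137.101 192.168.137.102 192.168.137.103 192.168.137.104"
DRY_RUN=${DRY_RUN:-1}   # default to printing commands; set DRY_RUN=0 to execute

# Run one command on every host in HOSTS, in order.
run_all() {
    for h in $HOSTS; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "[dry-run] $h: $1"
        else
            # Assumes SSH key-based auth as root on each node.
            ssh -o StrictHostKeyChecking=no "root@$h" "$1"
        fi
    done
}

# Example: step 1.2, disable swap on every node
run_all "swapoff -a && sed -i '/swap/d' /etc/fstab"
```

Calling `run_all` once per step (rather than one giant command string) keeps the output grouped per host, so a failure on a single machine is easy to spot.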