RKE2 cluster provisioning fails: kube-apiserver health checks fail because localhost cannot be resolved
Environment

Rancher v2.6
A Rancher-provisioned RKE2 cluster

Situation

Cluster component Pods in the affected downstream RKE2 cluster show a high number of restarts:

NAMESPACE            NAME                                                READY   STATUS    RESTARTS
cattle-fleet-system  fleet-agent-cc8c97f97-bvx78                         1/1     Running   185
cattle-system        cattle-cluster-agent-b1460cbd-8ct5c                 1/1     Running   115
cattle-system        cattle-cluster-agent-b1460cbd-l2l8l                 1/1     Running   168
kube-system          kube-apiserver-cluster-suse-cp-f777105c-2qgvh       0/1     Running   314
kube-system          kube-controller-manager-cluster-suse-cp-5c-2qgvh    1/1     Running   491
kube-system          cloud-controller-manager-cluster-suse-cp-5c-2qgvh   1/1     Running   501

The kube-apiserver Pod flaps between a ready and not ready status:

NAMESPACE    NAME                                            READY   STATUS    RESTARTS
kube-system  kube-apiserver-cluster-suse-cp-f777105c-2qgvh   0/1     Running   314

The kubelet logs register failing probes against the kube-apiserver.

Resolution

Enable kubelet debug logging:

1. Navigate to Cluster Management.
2. Click "Edit Config" for the affected downstream RKE2 cluster.
3. Click the "Advanced" tab in the "Cluster Configuration" form.
4. Under "Additional Kubelet Args", click "Add Global Argument".
5. In the new argument field, enter v=9.
6. Click "Save".

Replicate the liveness probe and check the kubelet logs:

1. Open an SSH session to a control plane (master) node in the affected downstream RKE2 cluster.

2. Check the kubelet log for failing kube-apiserver liveness probes:

   tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log | grep kube-apiserver

3. Execute the following command to simulate the liveness probe for the kube-apiserver Pod; if the issue is present, it should fail:

   /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock exec $(/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps | grep kube-apiserver | awk '{print $1}') kubectl get --server https://localhost:6443/ --client-certificate /var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt --client-key /var/lib/rancher/rke2/server/tls/client-kube-apiserver.key --certificate-authority /var/lib/rancher/rke2/server/tls/server-ca.crt --raw /livez

4. Perform the simulated liveness probe for the kube-apiserver again, replacing localhost with 127.0.0.1; this should succeed:

   /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock exec $(/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps | grep kube-apiserver | awk '{print $1}') kubectl get --server https://127.0.0.1:6443/ --client-certificate /var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt --client-key /var/lib/rancher/rke2/server/tls/client-kube-apiserver.key --certificate-authority /var/lib/rancher/rke2/server/tls/server-ca.crt --raw /livez

5. Fix the host or host template to ensure a valid /etc/hosts file is present, with an entry mapping localhost to 127.0.0.1, as expected.

Cause

The /etc/hosts file on the node was empty and did not contain any localhost entries, so name resolution of localhost failed for the kube-apiserver liveness probes.
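The root cause can be confirmed directly on a node before editing the host template. The sketch below (assuming a glibc-based Linux host, where getent consults /etc/hosts through NSS just as the probe's name lookup does) tests whether localhost resolves and prints the minimal /etc/hosts entries the probe expects:

```shell
# Check whether localhost resolves on this node; on an affected host with an
# empty /etc/hosts this lookup fails.
if getent hosts localhost >/dev/null 2>&1; then
  echo "localhost resolves"
else
  echo "localhost does NOT resolve - repair /etc/hosts"
fi

# Minimal /etc/hosts entries mapping localhost to the loopback addresses:
printf '127.0.0.1 localhost\n::1 localhost\n'
```

On a healthy node the first command reports that localhost resolves; appending (or merging) the printed entries into /etc/hosts on the affected host, or into the host template used to build the node, restores the mapping the liveness probe depends on.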