Symptom

Requests to the cluster's demo entry point hang, and the corresponding Pod produces no new log output.

    root@ce-demo-1:~# kubectl get pods -n deepflow-otel-spring-demo -o wide
    NAME                         READY   STATUS    RESTARTS   AGE    IP               NODE        NOMINATED NODE   READINESS GATES
    db-demo-0                    1/1     Running   0          2d1h   10.244.81.203    ce-demo-3
    nacos-0                      1/1     Running   0          2d1h   10.244.142.37    ce-demo-1
    svc-item-588b4cfcc9-zkvpd    1/1     Running   0          2d1h   10.244.142.36    ce-demo-1
    svc-order-5f67c67555-ds7bj   1/1     Running   0          2d1h   10.244.228.140   ce-demo-2
    svc-stock-b9df64d4b-bsxs5    1/1     Running   0          2d1h   10.244.142.38    ce-demo-1
    svc-user-7c5c7b488f-4zjdc    1/1     Running   0          2d1h   10.244.81.204    ce-demo-3
    web-shop-5d495d8cbc-lnpxq    1/1     Running   0          2d1h   10.244.228.139   ce-demo-2

    root@ce-demo-1:~# curl 10.244.228.139:8090/shop/full-test
    ## after a long wait, the request fails
    curl: (28) Failed to connect to 10.244.228.139 port 8090 after 133345 ms: Could not connect to server

    ## open a new terminal to follow the logs, then send the request again: no new log lines appear
    root@ce-demo-1:~# kubectl logs -f -n deepflow-otel-spring-demo web-shop-5d495d8cbc-lnpxq --tail 20

Troubleshooting

A packet capture on the client shows that, after the request is sent, the connection is never established. The client (source) IP in the capture shows that the server IP was recognized as a Pod IP, so the traffic was forwarded directly through the Calico interface.

    tcpdump -v -i any dst 10.244.228.139 -w pod.pcap

    root@ce-demo-1:~# ip a s vxlan.calico
    30499: vxlan.calico: mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 66:e0:bb:93:52:4f brd ff:ff:ff:ff:ff:ff
        inet 10.244.142.0/32 scope global vxlan.calico
           valid_lft forever preferred_lft forever

This data points to a Calico problem on ce-demo-2, the node that hosts the target Pod. Checking confirms that calico-node on that node is unhealthy (0/1 READY). For the role of this component, see the calico/node configuration reference and the Calico component architecture page in the official documentation.

    root@ce-demo-1:~# kubectl get pods -n calico-system -o wide
    NAME                                       READY   STATUS    RESTARTS   AGE    IP               NODE        NOMINATED NODE   READINESS GATES
    calico-kube-controllers-86596856c4-w7nsl   1/1     Running   0          2d9h   10.244.228.132   ce-demo-2
    calico-node-n2b4b                          1/1     Running   0          2d9h   10.51.0.102      ce-demo-3
    calico-node-p4k7s                          1/1     Running   0          2d9h   10.51.0.100      ce-demo-1
    calico-node-sbxrk                          0/1     Running   0          44h    10.51.0.101      ce-demo-2
    calico-typha-75c74d6ffd-6xq2j              1/1     Running   0          2d9h   10.51.0.100      ce-demo-1
    calico-typha-75c74d6ffd-fkf6b              1/1     Running   0          2d9h   10.51.0.102      ce-demo-3
    csi-node-driver-72pkg                      2/2     Running   0          2d9h   10.244.142.1     ce-demo-1
    csi-node-driver-8sjvc                      2/2     Running   0          2d9h   10.244.81.195    ce-demo-3
    csi-node-driver-skj6v                      2/2     Running   0          2d9h   10.244.228.130   ce-demo-2

When calico-node starts, Calico picks an IP address from the host's network interfaces and records it as the node's IPv4Address. By default this is done by autodetection (autodetect). Checking the value on all three nodes shows that one of them is not the address of that node's ens160.

    root@ce-demo-1:~# kubectl get node -o yaml | grep IPv4Address
    projectcalico.org/IPv4Address: 10.51.0.100/24
    projectcalico.org/IPv4Address: 10.4.0.1/24
    projectcalico.org/IPv4Address: 10.51.0.102/24

On ce-demo-2, inspecting the network device that holds 10.4.0.1 shows that it is a bridge device, used for container-to-container communication and external access.

    root@ce-demo-2:~# ip address show nerdctl0
    686: nerdctl0: mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether b2:90:e4:5b:f4:d1 brd ff:ff:ff:ff:ff:ff
        inet 10.4.0.1/24 brd 10.4.0.255 scope global nerdctl0
           valid_lft forever preferred_lft forever

    root@ce-demo-2:~# ip -d link show nerdctl0
    686: nerdctl0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        link/ether b2:90:e4:5b:f4:d1 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
        bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.b2:90:e4:5b:f4:d1 designated_root 8000.b2:90:e4:5b:f4:d1 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 240.02 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3125 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536

    root@ce-demo-2:~# brctl show nerdctl0
    bridge name     bridge id               STP enabled     interfaces
    nerdctl0        8000.b290e45bf4d1       no              veth76f5219b

Calico's default configuration uses first-found mode, which takes the address from the first interface that has an IP. The timestamps in the calico-node log on ce-demo-2 show that nerdctl0 was the first interface it picked up. The order in which interfaces are walked is determined by the order the host kernel/Netlink returns them, which usually correlates with the interface ifindex (creation order, i.e. the number registered in the system). If nerdctl0 sorts earlier in that order, for example because it was created earlier or has a lower ifindex, first-found sees it first.

    root@ce-demo-1:~# kubectl describe daemonset calico-node -n calico-system | grep IP_AUTODETECTION_METHOD
    IP_AUTODETECTION_METHOD: first-found

    root@ce-demo-1:~# kubectl logs -n calico-system calico-node-sbxrk -c calico-node | grep -i nerdctl0
    2025-07-15 07:51:58.079 [INFO][9] startup/autodetection_methods.go 103: Using autodetected IPv4 address on interface nerdctl0: 10.4.0.1/24
    2025-07-15 07:52:02.351 [INFO][87] felix/int_dataplane.go 1431: Linux interface state changed. ifIndex=686 ifaceName=nerdctl0 state=up
    2025-07-15 07:52:02.351 [INFO][87] felix/int_dataplane.go 1475: Linux interface addrs changed. addrs=set.Set{10.4.0.1} ifaceName=nerdctl0
    2025-07-15 07:52:02.351 [INFO][87] felix/int_dataplane.go 2098: Received interface update msg=intdataplane.ifaceStateUpdate{Name:nerdctl0, State:up, Index:686}
    2025-07-15 07:52:02.351 [INFO][87] felix/int_dataplane.go 2125: Received interface addresses update msg=intdataplane.ifaceAddrsUpdate{Name:nerdctl0, Addrs:set.Typed[string]{10.4.0.1:set.v{}}}
    2025-07-15 07:52:02.351 [INFO][87] felix/hostip_mgr.go 84: Interface addrs changed. update=intdataplane.ifaceAddrsUpdate{Name:nerdctl0, Addrs:set.Typed[string]{10.4.0.1:set.v{}}}
    2025-07-15 07:52:02.413 [INFO][87] felix/vxlan_mgr.go 597: VXLAN device parent changed from to nerdctl0 ipVersion=0x4

    root@ce-demo-1:~# kubectl logs -n calico-system calico-node-sbxrk -c calico-node | grep -i ens160
    2025-07-15 07:52:02.348 [INFO][87] felix/int_dataplane.go 1431: Linux interface state changed. ifIndex=2 ifaceName=ens160 state=up
    2025-07-15 07:52:02.348 [INFO][87] felix/int_dataplane.go 1475: Linux interface addrs changed. addrs=set.Set{10.51.0.101} ifaceName=ens160
    2025-07-15 07:52:02.349 [INFO][87] felix/int_dataplane.go 2098: Received interface update msg=intdataplane.ifaceStateUpdate{Name:ens160, State:up, Index:2}
    2025-07-15 07:52:02.349 [INFO][87] felix/int_dataplane.go 2125: Received interface addresses update msg=intdataplane.ifaceAddrsUpdate{Name:ens160, Addrs:set.Typed[string]{10.51.0.101:set.v{}}}
    2025-07-15 07:52:02.349 [INFO][87] felix/hostip_mgr.go 84: Interface addrs changed. update=intdataplane.ifaceAddrsUpdate{Name:ens160, Addrs:set.Typed[string]{10.51.0.101:set.v{}}}
    2025-07-16 06:17:42.649 [INFO][87] felix/int_dataplane.go 1475: Linux interface addrs changed. addrs=set.Set{10.51.0.101,fe80::20c:29ff:febb:1bdc} ifaceName=ens160
    2025-07-16 06:17:42.649 [INFO][87] felix/int_dataplane.go 2125: Received interface addresses update msg=intdataplane.ifaceAddrsUpdate{Name:ens160, Addrs:set.Typed[string]{10.51.0.101:set.v{}, fe80::20c:29ff:febb:1bdc:set.v{}}}
    2025-07-16 06:17:42.649 [INFO][87] felix/hostip_mgr.go 84: Interface addrs changed. update=intdataplane.ifaceAddrsUpdate{Name:ens160, Addrs:set.Typed[string]{10.51.0.101:set.v{}, fe80::20c:29ff:febb:1bdc:set.v{}}}

The official Calico documentation also notes that first-found mode uses the first valid IP address on the first interface (excluding local interfaces such as the Docker bridge), and recommends choosing the detection method that fits your needs. This article pins the interface explicitly.

One more point needs explaining: the phrase "the first valid IP address on the first interface (excluding local interfaces such as the docker bridge)" in the first-found documentation is only an example. The default exclusion list does not include nerdctl0, so it is treated as a valid candidate interface.

    ## Note: this depends on how Calico was deployed. The current environment was deployed
    ## through Calico Tigera Operator custom resources, so the DaemonSet cannot be changed directly.
    root@ce-demo-1:~# kubectl edit daemonset calico-node -n calico-system
    ## find or add this variable
    - name: IP_AUTODETECTION_METHOD
      ## the NIC is named ens160 on every machine, so either the NIC name or a regex works here
      value: interface=ens.*

Instead, the parameter has to be changed in Calico's custom resource. The official documentation lists the available parameters (search for nodeAddressAutodetectionV4).

    root@ce-demo-1:~# kubectl get Installation
    NAME      AGE
    default   2d10h

    root@ce-demo-1:~# kubectl edit installation default
    nodeAddressAutodetectionV4:
      ## this is the default setting
      ##firstFound: true
      ## remove it and specify the interface instead:
      interface: ens160

    ## check the result of the update
    root@ce-demo-1:~# kubectl get pods -n calico-system -o wide | grep calico-node
    calico-node-4fndv   1/1   Running   0   104s   10.51.0.101   ce-demo-2
    calico-node-8n5qr   1/1   Running   0   39s    10.51.0.100   ce-demo-1
    calico-node-rxhsf   1/1   Running   0   72s    10.51.0.102   ce-demo-3

    root@ce-demo-1:~# kubectl describe daemonset -n calico-system calico-node | grep IP_AUTODETECTION_METHOD
    IP_AUTODETECTION_METHOD: interface=ens160

Verify the result after the update:

    root@ce-demo-1:~# curl 10.244.228.139:8090/shop/full-test ; echo
    {count:1,elapsed:209,elapsedAvg:209,startAt:2025-07-17 13:33:55.116,stopAt:2025-07-17 13:33:55.325,success...
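The manual comparison of projectcalico.org/IPv4Address against each node's real address could be scripted so that a wrongly autodetected address is caught early. The following is only a minimal sketch. It assumes that each node's Kubernetes InternalIP is the address Calico should have detected, which holds in this environment but is not guaranteed elsewhere. The function name check_calico_ips and its three-column input format are hypothetical, introduced purely for illustration.

```shell
#!/bin/sh
# check_calico_ips (hypothetical helper): reads lines of the form
#   <node> <Calico IPv4Address in CIDR form> <node InternalIP>
# from stdin and prints one MISMATCH line for every node whose Calico
# address differs from its InternalIP.
# Assumption: the InternalIP is the address Calico is expected to use.
check_calico_ips() {
  while read -r node calico_cidr internal_ip || [ -n "$node" ]; do
    [ -z "$node" ] && continue          # skip blank lines
    calico_ip="${calico_cidr%%/*}"      # strip the /prefix length
    if [ "$calico_ip" != "$internal_ip" ]; then
      echo "MISMATCH $node calico=$calico_ip internal=$internal_ip"
    fi
  done
}

# One possible way to feed it from a live cluster (not executed here):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.projectcalico\.org/IPv4Address}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' | check_calico_ips
```

Fed with the values from this incident, the sketch would flag only ce-demo-2, whose annotation held 10.4.0.1 instead of 10.51.0.101. Keeping the comparison in a function that reads plain text makes it easy to test without cluster access.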