Random "Failed to create pod sandbox" errors for all pods on some nodes #250

Open
munjalpatel opened this issue Dec 10, 2022 · 4 comments

Comments

@munjalpatel

munjalpatel commented Dec 10, 2022

Bug Description

Pods fail to start on certain nodes with the following error. I could not find any obvious pattern explaining why pods work on some nodes and fail on others. These symptoms may indicate a race condition.

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3c0ce05181c3c2383ac85856ac0c0ad5bbab69fd16e123eb0268dc602b28c28b": failed to find network info for sandbox "3c0ce05181c3c2383ac85856ac0c0ad5bbab69fd16e123eb0268dc602b28c28b"

Helm values

namespace: linkerd
mode: linkerd
cniMode: true
resources:
  container:
    limit:
      memory: 200Mi
    request:
      cpu: 100m
      memory: 200Mi

1st failing node details
OS: linux (arm64)
OS Image: Bottlerocket OS 1.11.1 (aws-k8s-1.24)
Kernel version: 5.15.59
Container runtime: containerd://1.6.8+bottlerocket
Kubelet version: v1.24.6-eks-4360b32
AWS EC2 instance type: t4g.small

Merbridge logs

2022-12-10T17:21:55.083303Z	warn	OS CA Cert could not be found for agent
[ -f bpf/mb_connect.c ] && make -C bpf load || make -C bpf load-from-obj
make[1]: Entering directory '/app/bpf'
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_connect.c -o mb_connect.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_get_sockopts.c -o mb_get_sockopts.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_redir.c -o mb_redir.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_sockops.c -o mb_sockops.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_bind.c -o mb_bind.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_sendmsg.c -o mb_sendmsg.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_recvmsg.c -o mb_recvmsg.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_tc.c -o mb_tc.o
sudo mount -t bpf bpf /sys/fs/bpf
sudo mkdir -p /sys/fs/bpf/tc/globals
[ -f /sys/fs/bpf/cookie_original_dst ] || sudo bpftool map create /sys/fs/bpf/cookie_original_dst type lru_hash key 8 value 24 entries 65535 name cookie_original_dst
[ -f /sys/fs/bpf/local_pod_ips ] || sudo bpftool map create /sys/fs/bpf/local_pod_ips type hash key 16 value 244 entries 1024 name local_pod_ips
[ -f /sys/fs/bpf/process_ip ] || sudo bpftool map create /sys/fs/bpf/process_ip type lru_hash key 4 value 4 entries 1024 name process_ip
[ -f /sys/fs/bpf/cgroup_info_map ] || sudo bpftool map create /sys/fs/bpf/cgroup_info_map type lru_hash key 8 value 32 entries 1024 name cgroup_info_map
[ -f /sys/fs/bpf/mark_pod_ips_map ] || sudo bpftool map create /sys/fs/bpf/mark_pod_ips_map type hash key 4 value 16 entries 65535 name mark_pod_ips_map
sudo bpftool -m prog loadall mb_connect.o /sys/fs/bpf/connect \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name local_pod_ips pinned /sys/fs/bpf/local_pod_ips \
	map name mark_pod_ips_map pinned /sys/fs/bpf/mark_pod_ips_map \
	map name process_ip pinned /sys/fs/bpf/process_ip \
	map name cgroup_info_map pinned /sys/fs/bpf/cgroup_info_map
[ -f /sys/fs/bpf/pair_original_dst ] || sudo bpftool map create /sys/fs/bpf/pair_original_dst type lru_hash key 36 value 24 entries 65535 name pair_original_dst
[ -f /sys/fs/bpf/sock_pair_map ] || sudo bpftool map create /sys/fs/bpf/sock_pair_map type sockhash key 36 value 4 entries 65535 name sock_pair_map
sudo bpftool -m prog load mb_sockops.o /sys/fs/bpf/sockops \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name process_ip pinned /sys/fs/bpf/process_ip \
	map name pair_original_dst pinned /sys/fs/bpf/pair_original_dst \
	map name sock_pair_map pinned /sys/fs/bpf/sock_pair_map
sudo bpftool -m prog load mb_get_sockopts.o /sys/fs/bpf/get_sockopts \
	map name pair_original_dst pinned /sys/fs/bpf/pair_original_dst
sudo bpftool -m prog load mb_redir.o /sys/fs/bpf/redir \
	map name sock_pair_map pinned /sys/fs/bpf/sock_pair_map
sudo bpftool -m prog load mb_bind.o /sys/fs/bpf/bind
sudo bpftool -m prog loadall mb_sendmsg.o /sys/fs/bpf/sendmsg \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name mark_pod_ips_map pinned /sys/fs/bpf/mark_pod_ips_map \
	map name cgroup_info_map pinned /sys/fs/bpf/cgroup_info_map
sudo bpftool -m prog loadall mb_recvmsg.o /sys/fs/bpf/recvmsg \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name mark_pod_ips_map pinned /sys/fs/bpf/mark_pod_ips_map \
	map name cgroup_info_map pinned /sys/fs/bpf/cgroup_info_map
make[1]: Leaving directory '/app/bpf'
time="2022-12-10T17:21:57Z" level=info msg="Copied /app/merbridge-cni to /host/opt/cni/bin." func="cni-server.copyBinaries()" file="install.go:363"
time="2022-12-10T17:21:57Z" level=info msg="write kubeconfig file /host/etc/cni/net.d/ZZZ-merbridge-cni-kubeconfig with: \n# Kubeconfig file for Merbridge CNI plugin.\napiVersion: v1\nkind: Config\nclusters:\n- name: local\n  cluster:\n    server: https://[10.100.0.1]:443\n    insecure-skip-tls-verify: true\nusers:\n- name: merbridge-cni\n  user:\n    token: \"<redacted>\"\ncontexts:\n- name: merbridge-cni-context\n  context:\n    cluster: local\n    user: merbridge-cni\ncurrent-context: merbridge-cni-context\n" func="cni-server.createKubeconfigFile()" file="install.go:453"
time="2022-12-10T17:21:57Z" level=info msg="CNI config file /host/etc/cni/net.d/01-linkerd-cni.conf exists. Proceeding." func="cni-server.getCNIConfigFilepath()" file="install.go:310"
time="2022-12-10T17:21:57Z" level=info msg="Renaming /host/etc/cni/net.d/01-linkerd-cni.conf extension to .conflist" func="cni-server.writeCNIConfig()" file="install.go:259"
time="2022-12-10T17:21:57Z" level=info msg="Created CNI config /host/etc/cni/net.d/01-linkerd-cni.conflist" func="cni-server.writeCNIConfig()" file="install.go:267"
time="2022-12-10T17:21:57Z" level=info msg="Pod Watcher Ready" func="controller.RunLocalPodController()" file="pod.go:53"
make -C bpf attach
make[1]: Entering directory '/app/bpf'
sudo bpftool cgroup attach /sys/fs/cgroup/unified connect4 pinned /sys/fs/bpf/connect/cgroup_connect4
sudo bpftool cgroup attach /sys/fs/cgroup/unified sock_ops pinned /sys/fs/bpf/sockops
sudo bpftool cgroup attach /sys/fs/cgroup/unified getsockopt pinned /sys/fs/bpf/get_sockopts
sudo bpftool prog attach pinned /sys/fs/bpf/redir msg_verdict pinned /sys/fs/bpf/sock_pair_map
sudo bpftool cgroup attach /sys/fs/cgroup/unified bind4 pinned /sys/fs/bpf/bind
sudo bpftool cgroup attach /sys/fs/cgroup/unified sendmsg4 pinned /sys/fs/bpf/sendmsg/cgroup_sendmsg4
sudo bpftool cgroup attach /sys/fs/cgroup/unified recvmsg4 pinned /sys/fs/bpf/recvmsg/cgroup_recvmsg4
make[1]: Leaving directory '/app/bpf'
time="2022-12-10T17:22:03Z" level=info msg="cni called delete with args: {ContainerID:ce89bae7ebf0b4a62c08aad3db9eb1a913d490b4eea208dd5019ffe4522cdbc2 Netns:/var/run/netns/cni-417e5717-d563-9332-ae6e-1e5c267177f1 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=ebs-csi-node-qpr8p;K8S_POD_INFRA_CONTAINER_ID=ce89bae7ebf0b4a62c08aad3db9eb1a913d490b4eea208dd5019ffe4522cdbc2;K8S_POD_UID=abb80a31-18a5-464a-a4e4-5b754950f121 Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:03.426050Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:04Z" level=info msg="cni called delete with args: {ContainerID:4cbd7e03237e1c4152472178f734b43502ffde00e145e28e1d7fc190b6ee1728 Netns:/var/run/netns/cni-e449f8ef-c745-700e-e8f3-e9a0728c49b5 IfName:eth0 Args:K8S_POD_NAMESPACE=lens-metrics;K8S_POD_NAME=node-exporter-pvsxx;K8S_POD_INFRA_CONTAINER_ID=4cbd7e03237e1c4152472178f734b43502ffde00e145e28e1d7fc190b6ee1728;K8S_POD_UID=697d4eac-374f-4c62-9feb-914d2f036acc;IgnoreUnknown=1 Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:04.408446Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:09Z" level=info msg="cni called delete with args: {ContainerID:0204449b6017e1efa1ab5e3acf91192a73b99f7cceb60bf53ab47b1fa0df4cfc Netns:/var/run/netns/cni-c316697d-789f-8019-0c2b-1ffec07156da IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=linkerd-viz;K8S_POD_NAME=tap-5f45486f68-mhdm7;K8S_POD_INFRA_CONTAINER_ID=0204449b6017e1efa1ab5e3acf91192a73b99f7cceb60bf53ab47b1fa0df4cfc;K8S_POD_UID=daa34ff7-4616-456b-a405-505fb4f09d02 Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:09.431593Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
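The `StdinData` field in each `cni called delete` line is the CNI network config that the runtime passed to the plugin, printed as a Go byte slice in decimal, which makes it hard to read. Below is a minimal sketch that turns it back into JSON. It assumes you copy the `StdinData:[...]` field verbatim from the log into a file. The helper `decodeStdinData` is hypothetical and is not part of Merbridge.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// decodeStdinData turns a copied log field such as "StdinData:[123 34 ...]"
// (or just the space-separated decimal numbers) back into the raw bytes.
// Hypothetical helper for reading these logs; not Merbridge code.
func decodeStdinData(field string) ([]byte, error) {
	field = strings.TrimSpace(field)
	field = strings.TrimPrefix(field, "StdinData:")
	field = strings.Trim(field, "[]{}") // drop the brackets and a trailing "}" from the log
	var out []byte
	for _, tok := range strings.Fields(field) {
		n, err := strconv.ParseUint(tok, 10, 8) // each token must fit in one byte
		if err != nil {
			return nil, fmt.Errorf("invalid byte %q: %w", tok, err)
		}
		out = append(out, byte(n))
	}
	return out, nil
}

func main() {
	// Usage: go run decode.go < stdin.txt
	in, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	raw, err := decodeStdinData(string(in))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Pretty-print if the payload is valid JSON; otherwise print it unchanged.
	var pretty bytes.Buffer
	if json.Indent(&pretty, raw, "", "  ") == nil {
		fmt.Println(pretty.String())
	} else {
		fmt.Println(string(raw))
	}
}
```

Decoding the array from the first delete call above yields `{"args":{"serviceMeshMode":"linkerd"},"cniVersion":"0.3.1","kubernetes":{"kubeconfig":"/etc/cni/net.d/ZZZ-merbridge-cni-kubeconfig"},"name":"k8s-pod-network","type":"merbridge-cni"}`.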

2nd failing node details
OS: linux (arm64)
OS Image: Bottlerocket OS 1.11.1 (aws-k8s-1.24)
Kernel version: 5.15.59
Container runtime: containerd://1.6.8+bottlerocket
Kubelet version: v1.24.6-eks-4360b32
AWS EC2 instance type: m6gd.medium

Merbridge logs

2022-12-10T17:22:13.221412Z	warn	OS CA Cert could not be found for agent
[ -f bpf/mb_connect.c ] && make -C bpf load || make -C bpf load-from-obj
make[1]: Entering directory '/app/bpf'
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_connect.c -o mb_connect.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_get_sockopts.c -o mb_get_sockopts.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_redir.c -o mb_redir.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_sockops.c -o mb_sockops.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_bind.c -o mb_bind.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_sendmsg.c -o mb_sendmsg.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_recvmsg.c -o mb_recvmsg.o
clang -O2 -g  -Wall -target bpf -I/usr/include/aarch64-linux-gnu  -DMESH=2 -DENABLE_IPV4=1 -DENABLE_IPV6=0 -c mb_tc.c -o mb_tc.o
sudo mount -t bpf bpf /sys/fs/bpf
sudo mkdir -p /sys/fs/bpf/tc/globals
[ -f /sys/fs/bpf/cookie_original_dst ] || sudo bpftool map create /sys/fs/bpf/cookie_original_dst type lru_hash key 8 value 24 entries 65535 name cookie_original_dst
[ -f /sys/fs/bpf/local_pod_ips ] || sudo bpftool map create /sys/fs/bpf/local_pod_ips type hash key 16 value 244 entries 1024 name local_pod_ips
[ -f /sys/fs/bpf/process_ip ] || sudo bpftool map create /sys/fs/bpf/process_ip type lru_hash key 4 value 4 entries 1024 name process_ip
[ -f /sys/fs/bpf/cgroup_info_map ] || sudo bpftool map create /sys/fs/bpf/cgroup_info_map type lru_hash key 8 value 32 entries 1024 name cgroup_info_map
[ -f /sys/fs/bpf/mark_pod_ips_map ] || sudo bpftool map create /sys/fs/bpf/mark_pod_ips_map type hash key 4 value 16 entries 65535 name mark_pod_ips_map
sudo bpftool -m prog loadall mb_connect.o /sys/fs/bpf/connect \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name local_pod_ips pinned /sys/fs/bpf/local_pod_ips \
	map name mark_pod_ips_map pinned /sys/fs/bpf/mark_pod_ips_map \
	map name process_ip pinned /sys/fs/bpf/process_ip \
	map name cgroup_info_map pinned /sys/fs/bpf/cgroup_info_map
[ -f /sys/fs/bpf/pair_original_dst ] || sudo bpftool map create /sys/fs/bpf/pair_original_dst type lru_hash key 36 value 24 entries 65535 name pair_original_dst
[ -f /sys/fs/bpf/sock_pair_map ] || sudo bpftool map create /sys/fs/bpf/sock_pair_map type sockhash key 36 value 4 entries 65535 name sock_pair_map
sudo bpftool -m prog load mb_sockops.o /sys/fs/bpf/sockops \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name process_ip pinned /sys/fs/bpf/process_ip \
	map name pair_original_dst pinned /sys/fs/bpf/pair_original_dst \
	map name sock_pair_map pinned /sys/fs/bpf/sock_pair_map
sudo bpftool -m prog load mb_get_sockopts.o /sys/fs/bpf/get_sockopts \
	map name pair_original_dst pinned /sys/fs/bpf/pair_original_dst
sudo bpftool -m prog load mb_redir.o /sys/fs/bpf/redir \
	map name sock_pair_map pinned /sys/fs/bpf/sock_pair_map
sudo bpftool -m prog load mb_bind.o /sys/fs/bpf/bind
sudo bpftool -m prog loadall mb_sendmsg.o /sys/fs/bpf/sendmsg \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name mark_pod_ips_map pinned /sys/fs/bpf/mark_pod_ips_map \
	map name cgroup_info_map pinned /sys/fs/bpf/cgroup_info_map
sudo bpftool -m prog loadall mb_recvmsg.o /sys/fs/bpf/recvmsg \
	map name cookie_original_dst pinned /sys/fs/bpf/cookie_original_dst \
	map name mark_pod_ips_map pinned /sys/fs/bpf/mark_pod_ips_map \
	map name cgroup_info_map pinned /sys/fs/bpf/cgroup_info_map
make[1]: Leaving directory '/app/bpf'
time="2022-12-10T17:22:15Z" level=info msg="Copied /app/merbridge-cni to /host/opt/cni/bin." func="cni-server.copyBinaries()" file="install.go:363"
time="2022-12-10T17:22:15Z" level=info msg="write kubeconfig file /host/etc/cni/net.d/ZZZ-merbridge-cni-kubeconfig with: \n# Kubeconfig file for Merbridge CNI plugin.\napiVersion: v1\nkind: Config\nclusters:\n- name: local\n  cluster:\n    server: https://[10.100.0.1]:443\n    insecure-skip-tls-verify: true\nusers:\n- name: merbridge-cni\n  user:\n    token: \"<redacted>\"\ncontexts:\n- name: merbridge-cni-context\n  context:\n    cluster: local\n    user: merbridge-cni\ncurrent-context: merbridge-cni-context\n" func="cni-server.createKubeconfigFile()" file="install.go:453"
time="2022-12-10T17:22:15Z" level=info msg="CNI config file /host/etc/cni/net.d/01-linkerd-cni.conf exists. Proceeding." func="cni-server.getCNIConfigFilepath()" file="install.go:310"
time="2022-12-10T17:22:15Z" level=info msg="Renaming /host/etc/cni/net.d/01-linkerd-cni.conf extension to .conflist" func="cni-server.writeCNIConfig()" file="install.go:259"
time="2022-12-10T17:22:15Z" level=info msg="Created CNI config /host/etc/cni/net.d/01-linkerd-cni.conflist" func="cni-server.writeCNIConfig()" file="install.go:267"
time="2022-12-10T17:22:15Z" level=info msg="Pod Watcher Ready" func="controller.RunLocalPodController()" file="pod.go:53"
make -C bpf attach
make[1]: Entering directory '/app/bpf'
sudo bpftool cgroup attach /sys/fs/cgroup/unified connect4 pinned /sys/fs/bpf/connect/cgroup_connect4
sudo bpftool cgroup attach /sys/fs/cgroup/unified sock_ops pinned /sys/fs/bpf/sockops
sudo bpftool cgroup attach /sys/fs/cgroup/unified getsockopt pinned /sys/fs/bpf/get_sockopts
sudo bpftool prog attach pinned /sys/fs/bpf/redir msg_verdict pinned /sys/fs/bpf/sock_pair_map
sudo bpftool cgroup attach /sys/fs/cgroup/unified bind4 pinned /sys/fs/bpf/bind
sudo bpftool cgroup attach /sys/fs/cgroup/unified sendmsg4 pinned /sys/fs/bpf/sendmsg/cgroup_sendmsg4
sudo bpftool cgroup attach /sys/fs/cgroup/unified recvmsg4 pinned /sys/fs/bpf/recvmsg/cgroup_recvmsg4
make[1]: Leaving directory '/app/bpf'
time="2022-12-10T17:22:16Z" level=info msg="cni called delete with args: {ContainerID:47ef94afa54e33404bf90eef472a8c08dd5c9c99edca255ce75802d422d34122 Netns:/var/run/netns/cni-acf182b9-6e9d-9d40-37ad-dfab01b134f8 IfName:eth0 Args:K8S_POD_INFRA_CONTAINER_ID=47ef94afa54e33404bf90eef472a8c08dd5c9c99edca255ce75802d422d34122;K8S_POD_UID=45ae086f-34bf-4fad-82e6-c6f21d93b302;IgnoreUnknown=1;K8S_POD_NAMESPACE=lens-metrics;K8S_POD_NAME=node-exporter-b2qn7 Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:16.196130Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:16Z" level=info msg="cni called delete with args: {ContainerID:312ded126900f5a479b72d5a27e5940cf8f47ddd249def0c508b740480c68d2d Netns:/var/run/netns/cni-124c5911-ae72-6fc8-e21b-0a94cb7aa830 IfName:eth0 Args:K8S_POD_INFRA_CONTAINER_ID=312ded126900f5a479b72d5a27e5940cf8f47ddd249def0c508b740480c68d2d;K8S_POD_UID=267ff840-b883-40e9-bdfc-0bfb8d732f9c;IgnoreUnknown=1;K8S_POD_NAMESPACE=linkerd-viz;K8S_POD_NAME=tap-5f45486f68-dvcww Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:16.278767Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:18Z" level=info msg="cni called delete with args: {ContainerID:0c101962bb6b3609cd5e2c6b3a2738593e02cac04bb690a9a4a20d80f332b532 Netns:/var/run/netns/cni-84b74e60-7ba3-8897-9460-4fcea9976adb IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=ebs-csi-node-rc66g;K8S_POD_INFRA_CONTAINER_ID=0c101962bb6b3609cd5e2c6b3a2738593e02cac04bb690a9a4a20d80f332b532;K8S_POD_UID=89ac567a-cd56-4fda-8db4-e0c2bb71343d Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:18.179878Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:18Z" level=info msg="cni called delete with args: {ContainerID:b045e0a7ac6c237065baf41c2179fb5e6bc4579297d31d388c8dc58b1a8864d2 Netns:/var/run/netns/cni-5e7e2cb4-1377-915e-d391-c65c9a454888 IfName:eth0 Args:K8S_POD_NAMESPACE=linkerd-viz;K8S_POD_NAME=prometheus-74dd4ffb74-c7chh;K8S_POD_INFRA_CONTAINER_ID=b045e0a7ac6c237065baf41c2179fb5e6bc4579297d31d388c8dc58b1a8864d2;K8S_POD_UID=509cb2fb-0ffb-495b-8224-9f2140ff8a04;IgnoreUnknown=1 Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:18.285592Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:18Z" level=info msg="Restarting Merbridge CNI installer..." func="cni-server.(*Installer).Run()" file="install.go:113"
time="2022-12-10T17:22:19Z" level=info msg="Copied /app/merbridge-cni to /host/opt/cni/bin." func="cni-server.copyBinaries()" file="install.go:363"
time="2022-12-10T17:22:19Z" level=info msg="write kubeconfig file /host/etc/cni/net.d/ZZZ-merbridge-cni-kubeconfig with: \n# Kubeconfig file for Merbridge CNI plugin.\napiVersion: v1\nkind: Config\nclusters:\n- name: local\n  cluster:\n    server: https://[10.100.0.1]:443\n    insecure-skip-tls-verify: true\nusers:\n- name: merbridge-cni\n  user:\n    token: \"<redacted>\"\ncontexts:\n- name: merbridge-cni-context\n  context:\n    cluster: local\n    user: merbridge-cni\ncurrent-context: merbridge-cni-context\n" func="cni-server.createKubeconfigFile()" file="install.go:453"
time="2022-12-10T17:22:19Z" level=info msg="CNI config file /host/etc/cni/net.d/01-linkerd-cni.conflist exists. Proceeding." func="cni-server.getCNIConfigFilepath()" file="install.go:310"
time="2022-12-10T17:22:19Z" level=info msg="Created CNI config /host/etc/cni/net.d/01-linkerd-cni.conflist" func="cni-server.writeCNIConfig()" file="install.go:267"
time="2022-12-10T17:22:29Z" level=info msg="cni called delete with args: {ContainerID:7c211e35501935b10edbc48477aef2fc3cba653c4d112d08032a9b34a607decf Netns:/var/run/netns/cni-68cf0c92-feb9-625c-5d8b-e6dd047542c2 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=linkerd-viz;K8S_POD_NAME=tap-5f45486f68-dvcww;K8S_POD_INFRA_CONTAINER_ID=7c211e35501935b10edbc48477aef2fc3cba653c4d112d08032a9b34a607decf;K8S_POD_UID=267ff840-b883-40e9-bdfc-0bfb8d732f9c Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:29.548978Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:30Z" level=info msg="cni called delete with args: {ContainerID:b082ca4a684f5381ea0a7f510b7a5ef3e401bd26cec3185d37e62f04e3ca674f Netns:/var/run/netns/cni-e32ec8c1-ec2d-491d-fa87-387ed183593c IfName:eth0 Args:K8S_POD_UID=89ac567a-cd56-4fda-8db4-e0c2bb71343d;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=ebs-csi-node-rc66g;K8S_POD_INFRA_CONTAINER_ID=b082ca4a684f5381ea0a7f510b7a5ef3e401bd26cec3185d37e62f04e3ca674f Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:30.203695Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)
time="2022-12-10T17:22:30Z" level=info msg="cni called delete with args: {ContainerID:54527144c307e5f3d3038396d90519b21adb42c5b2662c38dd94098a4cba0b45 Netns:/var/run/netns/cni-7787beec-0b39-027c-b787-3b6a6ea624d9 IfName:eth0 Args:K8S_POD_NAME=prometheus-74dd4ffb74-c7chh;K8S_POD_INFRA_CONTAINER_ID=54527144c307e5f3d3038396d90519b21adb42c5b2662c38dd94098a4cba0b45;K8S_POD_UID=509cb2fb-0ffb-495b-8224-9f2140ff8a04;IgnoreUnknown=1;K8S_POD_NAMESPACE=linkerd-viz Path:/opt/cni/bin StdinData:[123 34 97 114 103 115 34 58 123 34 115 101 114 118 105 99 101 77 101 115 104 77 111 100 101 34 58 34 108 105 110 107 101 114 100 34 125 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 107 117 98 101 114 110 101 116 101 115 34 58 123 34 107 117 98 101 99 111 110 102 105 103 34 58 34 47 101 116 99 47 99 110 105 47 110 101 116 46 100 47 90 90 90 45 109 101 114 98 114 105 100 103 101 45 99 110 105 45 107 117 98 101 99 111 110 102 105 103 34 125 44 34 110 97 109 101 34 58 34 107 56 115 45 112 111 100 45 110 101 116 119 111 114 107 34 44 34 116 121 112 101 34 58 34 109 101 114 98 114 105 100 103 101 45 99 110 105 34 125]}" func="cni-server.(*server).PodDeleted()" file="handlers.go:58"
2022-12-10T17:22:30.285416Z	info	http: superfluous response.WriteHeader call from github.com/merbridge/merbridge/internal/cni-server.(*server).PodDeleted (handlers.go:68)

Version

OS: `linux (arm64)`
OS Image: `Bottlerocket OS 1.11.1 (aws-k8s-1.24)`
Kernel version: `5.15.59`

Probably related to #218

@kebe7jun
Member

I can't see much useful information in the logs, but I suspect the problem may be caused by a conflict between Merbridge's CNI mode and the AWS VPC CNI plugin. I may need to investigate further in a real environment.

@munjalpatel
Author

@kebe7jun Given the randomness, a race with the AWS VPC CNI is possible.
If that is indeed the issue, a startup taint like the one Cilium uses may be an option.
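A startup taint works roughly like this. Nodes join with a `NoSchedule` (or `NoExecute`) taint, so ordinary pods without a matching toleration stay off the node. The node agent removes the taint once its setup is complete, so pods are only placed after the CNI configuration is in place. Below is a self-contained sketch of the agent-side step. It uses a local `Taint` struct that mirrors the key/value/effect fields of a Kubernetes node taint. The key `merbridge.io/agent-not-ready` and the helper `removeTaint` are hypothetical and are not an existing Merbridge option. A real agent would fetch and update the Node object through the Kubernetes API (for example, with client-go).

```go
package main

import "fmt"

// Taint mirrors the key/value/effect fields of a Kubernetes node taint.
type Taint struct {
	Key    string
	Value  string
	Effect string // "NoSchedule", "PreferNoSchedule" or "NoExecute"
}

// Hypothetical startup taint key; not an existing Merbridge option.
const startupTaintKey = "merbridge.io/agent-not-ready"

// removeTaint returns the taints without any entry matching key, and reports
// whether anything was removed so the caller can skip a no-op Node update.
func removeTaint(taints []Taint, key string) ([]Taint, bool) {
	out := make([]Taint, 0, len(taints))
	removed := false
	for _, t := range taints {
		if t.Key == key {
			removed = true
			continue
		}
		out = append(out, t)
	}
	return out, removed
}

func main() {
	// The node registers with the startup taint, so regular pods are not scheduled yet.
	taints := []Taint{{Key: startupTaintKey, Value: "true", Effect: "NoSchedule"}}
	// At this point, the agent has loaded its eBPF programs and written the CNI config.
	taints, changed := removeTaint(taints, startupTaintKey)
	fmt.Println(changed, len(taints)) // true 0
}
```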

@kebe7jun
Member

OK, we will try to add this option.

@sasokolov

sasokolov commented Oct 2, 2023

We have reproduced a similar problem. Immediately after the Merbridge DaemonSet is launched, everything works fine. After a while, however, new pods lose network access, and any pods that have init containers or otherwise use the network at startup go into CrashLoopBackOff. Restarting the Merbridge pod on an affected node helps for a while.
Meanwhile, the logs show nothing noteworthy.

Environment:
orchestrator: EKS 1.24
AMI: ubuntu EKS (kernel 5.15.0-1045-aws)
CNI: calico v3.26.1
mesh: istio 1.18.3
container-runtime: containerd://1.7.2

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants