No measurable performance improvement with Merbridge #291
Comments
What tool did you use to test it?
@kebe7jun Thanks for your reply! I use k6 as the load test tool and httpbin as the server-side counterpart. I also tried it with the CNI mode, which did not show any improvements... The averages of the three tests I did were avg=26.87ms, avg=35.81ms, and avg=16.85ms.
Can you provide more information? For example, your k6 configuration, test case information, deployment architecture, etc., to help us review and troubleshoot. Theoretically this should not be the result, and I need to figure out why. It would be best if the information you provide can help me reproduce the environment.
@kebe7jun Okay, I hope this comment will help you reproduce everything you need:

### General

The aim of my load tests is to verify whether eBPF can improve service mesh networking when using Istio or not. Therefore I test east-west traffic between two pods. In order to "flood" the iptables rules and slow down the general Istio deployment, I apply pods and services. Here is a chart which might help:

### Cluster Setup and Installation

The cluster consists of 3 VMs with 4 vCPUs, 10 GB disk and 8 GB RAM each. As OS they all run Ubuntu 22.04.2 LTS. The nodes are used for the following:
On each VM I install:
In order to init Kubernetes I execute the following commands:

```bash
echo "overlay" >> /etc/modules-load.d/k8s.conf
echo "br_netfilter" >> /etc/modules-load.d/k8s.conf
modprobe br_netfilter
modprobe overlay
echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.d/k8s.conf
echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.d/k8s.conf
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/k8s.conf
sudo sysctl --system
```

The Kubernetes cluster is initialized with the help of `kubeadm`, which receives the following configuration:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "mydomain.example.com"
networking:
podSubnet: 10.244.0.0/16
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 1000
```

When these installation steps on the VMs are completed, I join the worker nodes and install Calico:

```bash
curl https://raw.githubusercontent.com/projectcalico/calico/v3.24.5/manifests/calico.yaml | kubectl apply -f -
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=kubernetes-internal-ip
```

In the next step I install Istio:

```bash
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
kubectl create namespace istio-system
helm install istio-base istio/base -n istio-system --version 1.16.2
helm install istiod istio/istiod -n istio-system --wait --version 1.16.2 -f ./istiod_helm_values.yaml
```

Here is the configuration file (`istiod_helm_values.yaml`) I use:

```yaml
global:
  proxy:
    resources: null
```

Afterwards the Istio mTLS mode is set to strict by applying this configuration:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: "mtls-mode-strict"
namespace: "istio-system"
spec:
mtls:
    mode: STRICT
```

Then I install Merbridge (in the tests without Merbridge, this step is, of course, skipped ;) ):

```bash
kubectl apply -f https://raw.githubusercontent.com/merbridge/merbridge/main/deploy/all-in-one.yaml
```

Afterwards the monitoring stack is deployed, but I don't think it has anything to do with the described problem, so I will skip it here. In the end I apply an ingress and configure it accordingly:

```bash
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.5.1/deploy/static/provider/baremetal/deploy.yaml
MASTER_INTERNAL_IP=$(kubectl get node main-node -o jsonpath="{.status.addresses[?(@.type=='InternalIP')].address}")
kubectl -n ingress-nginx patch svc ingress-nginx-controller -p $(echo \"$MASTER_INTERNAL_IP\" | jq -c -n '.spec.externalIPs |= [inputs]')
```

### Load Test Setup

#### Cluster flooder

As mentioned before, I try to fill the iptables rules. This is accomplished by a simple Helm setup. Depending on the configuration, X pods and Y services will be created, which will run on the worker1 node. The services all point to all pods of this cluster-flooder component. In my tests I created 250 pods and 250 services. I uploaded the Helm files here: https://github.com/MerzMax/cluster-flooder

#### Server (httpbin)

This is the configuration and setup I use for httpbin:

```yaml
apiVersion: v1
kind: Namespace
metadata:
name: server
labels:
istio-injection: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: httpbin
namespace: server
spec:
replicas: 1
selector:
matchLabels:
app: httpbin
version: v1
template:
metadata:
labels:
app: httpbin
version: v1
spec:
containers:
- image: docker.io/kennethreitz/httpbin
name: httpbin
ports:
- containerPort: 80
nodeSelector:
server: "true"
---
apiVersion: v1
kind: Service
metadata:
namespace: server
name: httpbin
labels:
app: httpbin
service: httpbin
spec:
ports:
- name: http
port: 8000
targetPort: 80
selector:
    app: httpbin
```

In order to have some Istio configuration for the service, I apply the following rules:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: allow-nothing
namespace: server
spec:
{}
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: httpbin-allow-get
namespace: server
spec:
action: ALLOW
selector:
matchLabels:
app: httpbin
rules:
- from:
- source:
namespaces: ["client"]
to:
- operation:
methods: ["GET"]
paths: ["/get"]
ports: ["80"]
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: httpbin
namespace: server
spec:
host: httpbin
subsets:
- name: v1
labels:
version: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: httpbin
namespace: server
spec:
hosts:
- httpbin
http:
- route:
- destination:
host: httpbin
subset: v1
    timeout: 2s
```

#### Client (k6)

K6 is deployed the following way:

```yaml
apiVersion: v1
kind: Namespace
metadata:
labels:
istio-injection: enabled
name: client
---
apiVersion: v1
kind: ConfigMap
metadata:
name: k6-script
namespace: client
data:
script.js: |-
import http from 'k6/http';
export const options = {
scenarios: {
szenario_load_test: {
executor: "ramping-arrival-rate",
// It should preallocate 2 VUs before starting the test.
preAllocatedVUs: 3,
// It is allowed to spin up to `maxVUs` VUs in order to sustain the constant arrival rate
maxVUs: 100,
// It should start `startRate` iterations per `timeUnit`
timeUnit: "1s",
stages: [
// Number of `target` iterations per `timeUnit` for `duration`
{ target: 200, duration: '5m' },
],
},
},
};
export default function () {
http.get('http://httpbin.server:8000/get');
}
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-nfs-k6
namespace: client
labels:
type: nfs
spec:
storageClassName: nfs
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
nfs:
server: mypvdomain.example.com
path: "/nfs/k6"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: k6-results-persistent-volume-claim
namespace: client
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
storageClassName: nfs
---
apiVersion: batch/v1
kind: Job
metadata:
name: k6
namespace: client
spec:
template:
spec:
containers:
- name: k6
image: grafana/k6:0.43.1
env:
- name: K6_PROMETHEUS_RW_SERVER_URL
value: "http://prometheus-kube-prometheus-prometheus.monitoring:9090/api/v1/write"
command: [ "/bin/sh", "-c", "--" ]
args: [ "sleep 20 && k6 run /scripts/script.js -o csv=/k6-results/$(date +%Y-%m-%d_%H-%M-%S)_test_results.csv -o experimental-prometheus-rw --summary-export /k6-results/$(date +%Y-%m-%d_%H-%M-%S)_test_results_summary.json | tee /k6-results/$(date +%Y-%m-%d_%H-%M-%S)_test_results_summary.txt && wget --post-data '' http://127.0.0.1:15020/quitquitquit" ]
volumeMounts:
- name: k6-script
mountPath: /scripts/script.js
subPath: script.js
- name: k6-results
mountPath: /k6-results
restartPolicy: Never
volumes:
- name: k6-script
configMap:
name: k6-script
items:
- key: script.js
path: script.js
- name: k6-results
persistentVolumeClaim:
claimName: k6-results-persistent-volume-claim
nodeSelector:
        client: "true"
```
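One way to sanity-check that the cluster flooder actually inflates the iptables rule set the traffic has to traverse is to count the rules on the worker node (a minimal sketch, assuming kube-proxy runs in iptables mode and root access on worker1; exact chain names depend on the kube-proxy version):

```bash
# Run on worker1: count all NAT rules and the Kubernetes service chains.
# Both numbers should grow roughly in proportion to the number of flooder services.
sudo iptables-save -t nat | wc -l
sudo iptables-save -t nat | grep -c 'KUBE-SVC'
```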
After my testing, it is true that the max duration does increase in CNI mode, which may require further analysis. However, overall latency is optimized to some extent. I also tested the ambient mode, which optimizes latency further. Of course, the optimization effect of Merbridge for same-node traffic is more pronounced at present.
ambient mode:
Note that the ambient profile enables debug mode by default; you should manually disable debug mode by modifying the Merbridge DaemonSet.
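A minimal sketch of how that could be done (the namespace, DaemonSet name, and the exact debug-related argument are assumptions based on the all-in-one manifest; verify them against your deployment before editing):

```bash
# Look for a debug-related argument or environment variable in the Merbridge DaemonSet,
# then edit the DaemonSet to disable it. Names here are assumptions; check your manifest.
kubectl -n istio-system get daemonset merbridge -o yaml | grep -i debug
kubectl -n istio-system edit daemonset merbridge
```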
Hey @kebe7jun. However, two questions have arisen for me:
The ambient mode adds support for Istio's ambient mode; I have not yet merged it into the main branch because it is an experimental feature.
CNI mode optimizes connection performance.
I used the CNI mode in the past as well, but the results did not show any improvement compared to the "standard" Merbridge mode. Test1:
Test2:
Test3:
I will check whether I will see improvements with the experimental mode or not.
@kebe7jun I tested the ambient implementation and these are my results: Test1:
Test2:
Test 3:
From my point of view this does not look like an improvement either :/
What does Test* stand for?
@kebe7jun I just executed k6 three times to get several test results.
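In case it helps reproduce the numbers, an average request duration can be pulled out of one of the CSV result files roughly like this (a sketch, assuming k6's default CSV layout where the first column is the metric name and the third the metric value; replace the file name with one written by the k6 job):

```bash
# Compute the mean http_req_duration (in ms) from a k6 CSV result file.
# Assumes the default CSV columns: metric_name,timestamp,metric_value,...
awk -F, '$1 == "http_req_duration" { sum += $3; n++ } END { if (n) printf "avg=%.2fms over %d samples\n", sum / n, n }' test_results.csv
```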
It looks like there is a large error in your results. I am not sure about the reliability of this test; I conducted mine in an environment with few other applications running. Perhaps other running applications could affect the performance? Alternatively, could you provide a comparison between the results with and without Merbridge in ambient mode? Could you please run the test three times?
It could be that other applications affect it, but I have no idea which ones. On the other hand, the results measured with "plain" Calico and without Merbridge display high latency as well. If I understand eBPF and Merbridge correctly, the latency and CPU usage should decrease when using Merbridge. As a result, the request duration should be lower when using eBPF, which my tests with Merbridge cannot verify. This is the case for both the standard and the ambient configuration. The test results from the standard Merbridge configuration can be found in the second chart of the initial issue. The results from using merbridge not in the am
Yes, Merbridge does optimize network latency, but the effect is likely to be around 5-12% in a typical test scenario, which can easily be submerged in measurement error because the percentage difference is not large.
@kebe7jun You are right that the underlying infrastructure is not the best for load tests. Additionally, I am surprised that your setup's latency is less than 20% of my measured latency. Did you deploy the "cluster-flooder" to your setup, and how did you set up your Calico / other CNI beneath Merbridge?
No cluster-flooder installed...
@kebe7jun This should make a huge difference...
My point is that the improvement from Merbridge is not significant (only ~10%), and when we really need to test performance metrics, there is no need to introduce other variables that increase the error and prevent us from achieving the desired results of the performance test. Therefore, I did not deploy the workload.
Refer to #172.
@MerzMax you can try https://github.com/istio/tools/tree/master/perf/benchmark to test Merbridge.
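A minimal sketch of how to get started with that suite (only the clone step is shown; the actual setup and run steps are described in the repository's README):

```bash
# Fetch the Istio performance tooling and switch to the benchmark directory;
# follow the README there for deploying the workloads and running the benchmark.
git clone https://github.com/istio/tools.git
cd tools/perf/benchmark
```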
@kebe7jun
Hi,
I am currently evaluating Istio together with Merbridge. Apparently I don't see any improvements when using Merbridge compared to using plain Calico…
Apart from the quite loud cloud noise, Merbridge seems to perform worse than Calico. If I understand it correctly, Merbridge should be at least as fast as Calico, and even more performant under higher loads.
Can you explain this behavior?
Here are some versions that might help explain the problem:
Istio: 1.16.2 (Installed with helm)
Calico: 3.24.5 (Installed with the Manifest yaml)
Merbridge: installed with
kubectl apply -f https://raw.githubusercontent.com/merbridge/merbridge/main/deploy/all-in-one.yaml
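One thing worth confirming before each run is that Merbridge is actually active on every node. A minimal sketch (the namespace and resource name are assumptions based on the all-in-one manifest; adjust if your deployment differs):

```bash
# Check that the Merbridge DaemonSet pods are scheduled on all nodes and look healthy,
# then skim their logs for errors. Namespace/name are assumptions from the all-in-one manifest.
kubectl -n istio-system get pods -o wide | grep -i merbridge
kubectl -n istio-system logs daemonset/merbridge --tail=50
```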