Multiple connections hang at times #146
Comments
While this isn't an exact match for your fact pattern: often when you see packets leaving one machine and not arriving at another, the issue is that some link on the path has a low MTU and fragmentation is not happening. If changing the port number fixes the issue, that could suggest that multiple routing paths are in use and only one path has the low MTU. The part that doesn't fit your facts is that switching back to the old port doesn't cause the problem to recur. Can you make the problem go away by lowering the MTU of the wireguard interface on B? Do pings with large packets (equal to your current MTU) from B to A or C reliably get responses?
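For context on why lowering the interface MTU is a plausible fix: WireGuard adds a fixed per-packet overhead on the underlay (outer IP header + UDP header + WireGuard data header + auth tag), so the tunnel MTU must sit at least that far below the smallest underlay MTU or the encrypted packets get fragmented (or dropped). A minimal sketch of the arithmetic, using WireGuard's standard overhead figures:

```python
# WireGuard per-packet overhead on the underlay:
#   outer IPv4 header 20 (or IPv6 40) + UDP 8
#   + WireGuard data header 16 (type 4 + receiver index 4 + counter 8)
#   + Poly1305 auth tag 16
#   => 60 bytes over IPv4, 80 over IPv6.
WG_OVERHEAD_V4 = 20 + 8 + 16 + 16   # 60
WG_OVERHEAD_V6 = 40 + 8 + 16 + 16   # 80

def tunnel_mtu(underlay_mtu: int, ipv6_underlay: bool = False) -> int:
    """Largest tunnel MTU that avoids fragmenting the encrypted packets."""
    overhead = WG_OVERHEAD_V6 if ipv6_underlay else WG_OVERHEAD_V4
    return underlay_mtu - overhead

print(tunnel_mtu(1500))        # 1440 over a 1500-byte IPv4 underlay
print(tunnel_mtu(1500, True))  # 1420 -- wg-quick's conservative default
```

This is why wg-quick defaults the interface MTU to 1420: it leaves room for the IPv6 worst case on a 1500-byte underlay. If a link on the path has an even lower MTU, the tunnel MTU has to drop accordingly.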
Unfortunately the problem only starts after a long while of operating fine, so it'll be hard for me to test right away. I will try next time it happens. I can tell you, though, that when it happens pings don't normally work, as the handshake doesn't seem to occur.
I see. When I mentioned pings, I meant pings to the tunnel endpoint (in the "underlay network"), not pings inside the tunnel. If you are diagnosing a potential path MTU issue -- and I don't know that's what this is, but I'm suspicious -- you should characterize the path when it's working and then again when it's not, and look for differences. So do a traceroute (outside the tunnel) to show the path in the underlay network which the encrypted packets traverse. Use ping -s to find the largest packet that will pass. Record this info. Then when it's not working, try the traceroute again and the ping -s again, and see whether it's the same path and the same ping size, or not. Good luck.
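The ping -s search described above can be automated as a binary search for the largest payload that still gets a reply. This is an illustrative sketch, not a tool from any library: the probe callable is a stand-in for running something like ping -s SIZE -M do HOST and checking whether a reply came back.

```python
from typing import Callable

def max_passing_payload(probe: Callable[[int], bool],
                        lo: int = 0, hi: int = 1472) -> int:
    """Binary-search the largest ICMP payload size for which probe(size)
    succeeds.  1472 = 1500 MTU - 20 IP header - 8 ICMP header."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):   # reply received at this size
            best = mid
            lo = mid + 1
        else:            # packet too big for some link on the path
            hi = mid - 1
    return best

# Example with a simulated path whose MTU is 1400 (payload limit 1372):
print(max_passing_payload(lambda size: size <= 1372))  # 1372
```

Record the result while the tunnel is healthy, then repeat when it hangs; a smaller number the second time points at a path change with a lower MTU.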
I have made some progress in gaining insight into this problem. I still don't fully grasp it, but it's not an MTU issue. It appears that when conntrack opens a translation on the same ports as the listening ports on both ends, the problem manifests itself. Concrete example: Machine A (some Linux distro) listens on 56018, set to persist the connection, and hits the endpoint Machine-B:56019. If Machine B's conntrack opened the following translation, it all works fine: If Machine B's conntrack opened the following translation, it hangs every now and then, and once it hangs it does not recover:
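To illustrate the kind of collision being described, here is a toy model (not conntrack's actual data structures or policy): a NAT translation is keyed on the UDP flow, and if the translated source port on B happens to equal a port that a local WireGuard socket is already listening on, inbound packets to that port become ambiguous between the translated flow and the local listener. The allocator below is purely hypothetical and only shows the idea of avoiding such a clash:

```python
# Ports that WireGuard interfaces on machine B are already bound to.
local_listen_ports = {56018, 56019}

def allocate_src_port(preferred: int) -> int:
    """Hypothetical NAT source-port allocator: keep the flow's original
    source port unless it collides with a local listening socket, in
    which case remap to a free high port.  Real conntrack has its own
    allocation policy; this only illustrates the collision."""
    if preferred not in local_listen_ports:
        return preferred
    port = 61000  # arbitrary start of a "safe" high range for this sketch
    while port in local_listen_ports:
        port += 1
    return port

print(allocate_src_port(50000))  # 50000 -- no collision, kept as-is
print(allocate_src_port(56018))  # 61000 -- collided with a listener, remapped
```

In the reported behavior, the translation that reuses the listening ports is the one that eventually hangs, which is consistent with this sort of ambiguity.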
Package version
1.0.20220627
Firmware version
2.0.9-hotfix.4
Device
EdgeRouter Lite / PoE - e100
Issue description
I have two connections going to a host. Occasionally my connection to that host will stop working. With tcpdump I can see the packets leaving eth0 outbound from that host toward my machine, but I never receive them. Normally you'd think the problem is with my machine... but when I delete the wireguard interfaces and use a new port on the remote machine, it starts working again; then I delete the interfaces again, reload my old configuration, and everything works as it should.
Also of interest are the following connections shared by these two machines:
My machine = A
Remote Machine = B
Some other machine not mentioned above = C
Wireguard tunnels setup:
A->B
A->C
B->A
B->C
B is the problematic machine with the peculiar behavior described above. I am not sure whether the fact that they share a tunnel to C plays a role here, but that is the only distinguishing feature of this setup compared with my other A->* wireguard tunnels, which use similar endpoint equipment on the remote side and do not exhibit this problem.
Configuration and log output
No response