Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peer branch of wireguard configuration deleted at reboot if "allowed-ips = true" and interface is disabled #103

Open
srwsolutions opened this issue Sep 20, 2021 · 3 comments
Labels
bug/possible A possible bug that has not yet been confirmed

Comments

@srwsolutions
Copy link

Package version

1.0.20210424

Firmware version

2.0.9 hotfix 2

Device

EdgeRouter X (SFP) - e50

Issue description

If you set allowed-ips to true, and subsequently disable the interface, if the device is reboots in that state, the peer branch of the wireguard configuration is deleted during the subsequent startup process, and you must re-enter all the deleted information. This may also related to the cause of bug #99 (Config fails at reboot). Here is a copy of a post I put on the ubnt website that goes into the details of the problem:

============================================================================

OK, after playing around with this for several hours I found out why it's happening. The reason the "peer" branch is being deleted when the interface is disabled is because "route-allowed-ips" is set to "true". This setting automatically adds routes for all of the subnets specified in the "allowed-ips" settings when the interface comes up. I had a list of subnets in my peer configuration and I had "route-allowed-ips" set to "true". When I changed it to "false" the "peer" branch was not deleted when the router was rebooted while the interface was disabled.

I think what's happening at startup is that it's attempting to set "route-allowed-ips" to true while the interface is disabled and that fails, as you get an error if you attempt to change that setting from "false" to "true" via the cli while the interface is disabled. However, you can change it from "true" to "false" from the cli while the interface is disabled. There must be some failsafe logic in the startup process to delete any branch from the config tree whose settings cause an error to occur while the commands are being executed, assuming that either corruption has occurred or something has gotten past the cli validity checking when it was put in, which is the case here. The problem is that the cli allows the interface to be placed in disabled status at runtime while "route-allowed-ips" is set to "true", but does not allow "route-allowed-ips" to actually be set to that value while the interface is already in disabled status, thus causing the startup to fail when it attempts to do so as it processes the config file. This triggers the failsafe action of deleting the "peer" branch.

The fix would have to be to either disallow the interface to be disabled while "route-allowed-ips" has the value of "true", or to allow "route-allowed-ips" to be set to "true" via the cli while the interface is in the disabled status. There is already some logic to prevent "route-allowed-ips" from being set to "true" if the only entry in "allowed-ips" is "0.0.0.0/0", as I encountered that during my testing, so care would have to be taken to ensure that check remained even if the interface was in disabled status.

In the short term simply not ever setting "route-allowed-ips" to "true" and applying the routes manually via the cli would ensure that you don't inadvertently lose part of the configuration if you disable the interface and you reboot or the power fails while it's in that state.

Configuration and log output

No response

@srwsolutions srwsolutions added the bug/possible A possible bug that has not yet been confirmed label Sep 20, 2021
@srwsolutions srwsolutions changed the title Peer branch of wireguard configuration deleted at reboot if allowed-ips = true and interface is disabled Peer branch of wireguard configuration deleted at reboot if "allowed-ips = true" and interface is disabled Sep 20, 2021
@dulitz
Copy link

dulitz commented Sep 20, 2021

Thanks for the excellent bug report.

If EdgeOS were a community project, I'm pretty sure the right solution would be to allow route-allowed-ips to be true when the interface is disabled, but to postpone any routing configuration until the interface is enabled again. My initial impression is that that's too heavy a lift for this kind of post-release patch.

I don't know the consequences of disallowing the interface to be disabled under some circumstances but it sounds scary. Is there any other condition when an interface is not allowed to be disabled?

@srwsolutions
Copy link
Author

srwsolutions commented Sep 20, 2021

Yes, I agree, as I don't know what problem would occur if "route-allowed-ips" is set to true when the interface is disabled. In fact, I had no issues with having things in that state when I disabled the WG interface from the gui. In my test I was using a configuration that had OpenVPN in it as well, with the same routes, and I was testing to see if there was any perceptible performance differences or issues when using Wireguard versus OpenVPN via UDP, and of course I could only have one of the interfaces enabled at a time. I switched back and forth several times while the router was up without any issues or the routes getting messed up. It was only after I rebooted with the WG interface was disabled that I had issues. At first I thought I was a klutz and forgot to save the configuration, but after it happened several times I finally figured it out.

I guess I'm a bit confused as to where what you folks are doing ends and what the ubnt developers do, as in who put the the wireguard support into EdgeOS, created the fields that go into the config tree, and what validation those data items undergo. Based on bug report #99 it looks like this issue is a bit more complicated than what I reported, but it smells like it's involved with the same issues, when do routes become available on the network versus when you allow processes to manipulate those routes, and what happens if a route is unavailable at that time. Whatever the right answer is I don't think deleting a whole branch of the config tree at startup is it though.

@dulitz
Copy link

dulitz commented Sep 21, 2021

I don't know either but I suspect that one of two things would happen if, at boot, we let route-allowed-ips be true when the interface is disabled:

  • those IPs would be blackholed, routed to a down interface, even if they were reachable by the default route, or
  • the routes would fail to install, so when the interface comes up the allowed IPs would not be routed.

I agree the current behavior is bad. Not sure what to replace it with. I will put it on my list to investigate but am not confident I'll get to it.

I'm just a random guy on the Internet. The folks doing this are the WG developers. They put the WG support into EdgeOS, created the fields, and set up the validation scripts. The ubnt guys forked VyOS years ago and then released EdgeOS with the Vyatta validation framework. But EdgeOS hasn't had any significant releases in a long time. :(

"The right answer" to the question of what to do when the interface is down, I'm pretty sure, is to install the routes when the interface comes up. That might be possible to do in this patch, not sure. But some things like that require deeper changes across the Vyatta scripts, which are too fragile (= effortful to maintain) to persist in a patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug/possible A possible bug that has not yet been confirmed
Development

No branches or pull requests

2 participants