Route leak between a VRF and default VRF points to dev lo and packets are time exceeded in-transit #15909
Comments
After researching a little, I found the issue.
So, on a normal Linux system you will have:
The lookup for local is the first rule. Before kernel 6.6, or maybe even earlier (I have no clue which version), we should theoretically make this change. I still find this VRF preparation in Linux kernels:
You can check the kernel sources and still find many calls to these functions in the tests. But it seems that this problem was fixed starting with some specific kernel version, and if you now move the lookup for local after the VRFs you end up in this situation. Does anybody have any idea why we should or should not use this trick with kernel rules?
The reason you'd want the local rule after the l3mdev rule is so that routed packets ingressing a VRF slave won't be terminated if they target an address assigned only to the default VRF. If you do I'm not sure I understand why a loop is occurring just from this output, but hopefully that gives a little context around the ip rule behavior. Maybe it would be helpful to see the output of
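To make the rule-order argument concrete, here is a minimal Python sketch of policy-rule lookup. It is a toy model, not kernel code: the `lookup` helper, the table contents, the VRF name `red` used as a table name, and the address 192.0.2.1 are all assumptions made up for illustration, and 32765 is just one possible priority that places the local rule after the l3mdev rule.

```python
from ipaddress import ip_address, ip_network

def lookup(rules, tables, dst, in_vrf):
    """Walk rules in priority order; return (table, action) of the first table that matches."""
    for _pref, table in sorted(rules):
        # The l3mdev rule resolves to the table of the VRF the packet arrived in.
        # For packets that did not arrive on a VRF port it is skipped.
        name = in_vrf if table == "l3mdev" else table
        if name is None:
            continue
        matches = [(net, action) for net, action in tables.get(name, []) if dst in net]
        if matches:
            # Longest-prefix match inside the selected table.
            _net, action = max(matches, key=lambda m: m[0].prefixlen)
            return name, action
    return None, "unreachable"

# Hypothetical tables: 192.0.2.1 is assigned only in the default VRF,
# so it appears in the local table; VRF "red" has its own default route.
tables = {
    "local": [(ip_network("192.0.2.1/32"), "deliver locally")],
    "red": [(ip_network("0.0.0.0/0"), "forward via VRF red")],
    "main": [(ip_network("0.0.0.0/0"), "forward via main")],
}

default_order = [(0, "local"), (1000, "l3mdev"), (32766, "main")]
vrf_order = [(1000, "l3mdev"), (32765, "local"), (32766, "main")]

dst = ip_address("192.0.2.1")
print(lookup(default_order, tables, dst, in_vrf="red"))  # ('local', 'deliver locally')
print(lookup(vrf_order, tables, dst, in_vrf="red"))      # ('red', 'forward via VRF red')
```

With the local rule first, the packet that entered through a VRF port is terminated on the router; with the l3mdev rule first, the VRF table decides what happens to it.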
Hi @taspelund, thanks for the explanation. What you explain makes sense, indeed. Now I have a better idea why we need to move rule 0 after
I created a testlab: R01:
FRRouting config:
Output on R01:
So the route leak itself looks fine. R01 nexthop tables:
R01: routing tables:
C01:
C02:
But I can't ping the endpoints:
C02 is receiving from R01:
This behavior is extremely weird when I'm trying to route-leak between the GRT and VRFs. If I move rule 0 after l3mdev, I am not able to ping the R01 loopback addresses from the GRT. It runs into the same issue.
What I remember is that previous versions of FRRouting inserted the leaked routes using the source interface, like:
instead
VRF red:
instead:
Like FRR shows in the In the past, when the VRF feature was introduced in Linux, I remember that I tried to route via the VRF interface and I ended up in similar situations. I have no idea if it is just a
Yes, as I suspected. After downgrading to version 9.0.2 stable, ping works and I have this:
All routes point to the interface via the VRF, not to the VRF interface itself.
I've also tested with 10.0 stable and I have the same issue.
I've tested 9.1 and it still works. I presume something changed between 9.1 and 10.0 and caused a regression.
I suspect the issue is with this route using
I know using I think we need to understand what the expected kernel behavior is here (should routing to |
I'll have a look at this issue
That is exactly what I was thinking. I was trying to build a multi-loopback interface driver for Linux and I faced some similar issues. Here is my project, where I'm trying to achieve multiple loopback interfaces: https://github.com/EasyNetDev/linux-multi-loopback
Try this patch (for the latest master)
Ok. I will try this evening.
When I was zapping through the PR in 10.0
Good catch. This is the code I am patching
Yep, it is working:
@taspelund will you create a PR?
You go ahead. I didn't do any of the coding, just have my 2 cents on what I thought was wrong :-)
Ah, sorry @taspelund. I tagged you by mistake. I wanted to ask @louis-6wind.
I will do the pull-request. Let me check a few things first
You are correct. If you add a route to the local table, you can solve the routing loop issue.
Leaked route from the l3VRF are installed with the loopback as the nexthop interface instead of the real interface.

> B>* 10.0.0.0/30 [20/0] is directly connected, lo (vrf default), weight 1, 00:21:01

Routing of packet from a L3VRF to the default L3VRF destined to a leak prefix fails because of the default routing rules on Linux.

> 0: from all lookup local
> 1000: from all lookup [l3mdev-table]
> 32766: from all lookup main
> 32767: from all lookup default

When the packet is received in the loopback interface, the local rules are checked without match, then the l3mdev-table says to route to the loopback. A routing loop occurs (TTL is decreasing).

> 12:26:27.928748 ens37 In IP (tos 0x0, ttl 64, id 26402, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.0.0.2 > 10.0.1.2: ICMP echo request, id 47463, seq 1, length 64
> 12:26:27.928784 red Out IP (tos 0x0, ttl 63, id 26402, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.0.0.2 > 10.0.1.2: ICMP echo request, id 47463, seq 1, length 64
> 12:26:27.928797 ens38 Out IP (tos 0x0, ttl 63, id 26402, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.0.0.2 > 10.0.1.2: ICMP echo request, id 47463, seq 1, length 64

Do not set the lo interface as a nexthop interface. Keep the real interface where possible.

Fixes: db7cf73 ("bgpd: fix interface on leaks from redistribute connected")
Fixes: 067fbab ("bgpd: fix interface on leaks from network statement")
Fixes: 8a02d9f ("bgpd: Set nh ifindex to VRF's interface, not the real")
Fixes: FRRouting#15909
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Leaked route from the l3VRF are installed with the loopback as the nexthop interface instead of the real interface.

> B>* 10.0.0.0/30 [20/0] is directly connected, lo (vrf default), weight 1, 00:21:01

Routing of packet from a L3VRF to the default L3VRF destined to a leak prefix fails because of the default routing rules on Linux.

> 0: from all lookup local
> 1000: from all lookup [l3mdev-table]
> 32766: from all lookup main
> 32767: from all lookup default

When the packet is received in the loopback interface, the local rules are checked without match, then the l3mdev-table says to route to the loopback. A routing loop occurs (TTL is decreasing).

> 12:26:27.928748 ens37 In IP (tos 0x0, ttl 64, id 26402, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.0.0.2 > 10.0.1.2: ICMP echo request, id 47463, seq 1, length 64
> 12:26:27.928784 red Out IP (tos 0x0, ttl 63, id 26402, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.0.0.2 > 10.0.1.2: ICMP echo request, id 47463, seq 1, length 64
> 12:26:27.928797 ens38 Out IP (tos 0x0, ttl 63, id 26402, offset 0, flags [DF], proto ICMP (1), length 84)
> 10.0.0.2 > 10.0.1.2: ICMP echo request, id 47463, seq 1, length 64

Do not set the lo interface as a nexthop interface. Keep the real interface where possible.

Fixes: db7cf73 ("bgpd: fix interface on leaks from redistribute connected")
Fixes: 067fbab ("bgpd: fix interface on leaks from network statement")
Fixes: 8a02d9f ("bgpd: Set nh ifindex to VRF's interface, not the real")
Fixes: #15909
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit 31fc89b)
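The TTL behavior described in the commit message can be sketched with a few lines of Python. This is only a toy model under stated assumptions: the `forward` helper is hypothetical, every forwarding step is assumed to decrement the TTL by one, and a route whose nexthop device is `lo` is assumed to hand the packet back to the same lookup with the same result. The interface name `ens38` is taken from the capture in the commit message.

```python
def forward(route_dev, ttl=64):
    """Follow the leaked route for one packet; return (outcome, forwarding steps used)."""
    hops = 0
    # A router only forwards while the TTL is greater than 1;
    # otherwise it drops the packet and reports "time exceeded".
    while ttl > 1:
        ttl -= 1
        hops += 1
        if route_dev != "lo":
            # A real egress interface: the packet leaves the router.
            return f"sent out {route_dev}", hops
        # Nexthop device is lo: the packet re-enters the stack and the
        # same lookup (in this model) returns the same route again.
    return "time exceeded in-transit", hops

print(forward("lo"))     # ('time exceeded in-transit', 63)
print(forward("ens38"))  # ('sent out ens38', 1)
```

In this model the only difference between the failing and the working case is the nexthop device of the leaked route, which is what the fix changes.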
Description
Trying a simple local BGP route leak via VPN between a VRF and the default VRF (GRT) leads to time exceeded in-transit, because from the VRF to the GRT all routes point to dev lo.
Version
How to reproduce
Using BGP route import between the GRT and a VRF, we get something like this:
And in mgmt:
Expected behavior
I should have connectivity between the VRF and the default VRF (GRT), not an L3 loop.
Actual behavior
A Layer 3 loop appears on dev lo.
A simple ping from the default network to mgmt leads to this:
Additional context
No response
Checklist