-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipcache: Introduce the ability to inherit CIDR prefixes #32578
ipcache: Introduce the ability to inherit CIDR prefixes #32578
Conversation
a8d8300
to
53185d7
Compare
/test |
!lbls.Has(labels.LabelIngress[labels.IDNameIngress]) { | ||
!lbls.Has(labels.LabelIngress[labels.IDNameIngress]) && | ||
!lbls.HasSource(labels.LabelSourceFQDN) && | ||
!lbls.HasSource(labels.LabelSourceCIDR) { | ||
cidrLabels := labels.GetCIDRLabels(prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: This case should not be hit anymore, even before this PR. But we still have unit tests relying on it. To avoid adding even more commits to this PR, I decided to not emit a "should not be reached" warning log message here and leave this for a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool!
53185d7
to
9574178
Compare
Dropped the last |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on
A note on the implementation: To find affected child prefixes, we
currently perform a linear scan over the metadata map. While this is a
O(n)
operation, finding CIDR parents in a trie could be done inO(log n)
, but at the moment we expectn
to be in hundreds to the low
thousands at most. For most real world cases, we therefore expect the
current implementation to perform just as well if not better than a
trie. Also note that the linear scan is skipped if the prefix is a
single IP and thus cannot contain any other prefixes besides itself.
What is part of n
where we expect it to be in the hundreds or low thousands? What if we add pod IPs into this as #31409 would do?
So that sentence was written with the status quo, where only CIDR prefixes (per node), node IPs (global) and DNS lookups (per node) are added to metadata. Of course, once we add pod IPs (global) to also use the async API, the order of magnitude changes and the statement might no longer be true. It's hard to say at what point an allocation-heavy but asymptotically more optimal data-structure like a trie outperforms a naive linear scan. Let me try to add some benchmarks to see how expensive the current approach is. |
This helper allows to check if a label set contains a label from a particular source. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This label source will superseed CIDR labels for identities generated by ToFQDN rules. Like CIDR labels, FQDN labels are node local identities. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit removes the direct access to `PrefixInfo` as it is not needed for the current caller in the node manager (which only requires the source). A subsequent commit will introduce additional labels to a prefix not found in the `PrefixInfo`. Therefore, to avoid callers observing incomplete information, we hide the direct access. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
There are no longer is access outside of the IPCache package. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
A IP prefix may have multiple representations. For example `192.168.0.0/24` can also be represented as `192.168.50.60/24` (a IPv4 prefix using a non-masked address) or `::ffff:192.168.0.0/24` (a IPv4-mapped IPv6 address which will use the IPv4 stack on the host). Those prefixes might look different, but they all represent exactly the same destination. To avoid issues where callers might provide metadata for a prefix in non-canonical form, this PR introduces a helper which translates every IP used for upserts, deletions and lookups into the canonical form before the metadata map is accessed. This commit does not fix any known bug and merely introduces this in the spirit of defensive programming. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The comment was written is no longer valid, as CIDR policies now also funnel their prefixes through the IPCache metadata subsystem. However, the logic needs to remain, as other callers (e.g. the node manager) now can inject multiple prefixes that share the same identity. The code comment is updated to reflect that. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit introduces a new feature in the IPCache metadata subsystem: The ability for prefixes with local identities to inherit CIDR labels from encompassing prefixes. This is needed to implement cilium/design-cfps#17 where we want to introduce a new way to manage identities for IPs allowlisted by DNS lookups. Without going into too much detail, the key point of that CFP is that IPs returned by DNS lookups will be associated with a `fqdn:` label. Those IPs returned by a DNS lookup however might also be selected by a CIDR prefix, and thus might also optionally need a `cidr:` label if (but only if) a selecting CIDR policy is present. To give an example: If we perform a DNS lookup on `one.one.one.one` on a pod with DNS redirects enabled, we may observe IP `1.1.1.1` and thus associated that IP with a `fqdn:` label. If at the same time, we also have a separate CIDR policy allowing traffic to `1.0.0.0/8`, then that IP also needs the `cidr:1.0.0.0/8` label, to make it is selected by both the FQDN and the CIDR policy. To implement this, the IPCache metadata subsystem is slightly augmented: 1. Whenever metadata for a prefix is upserted or removed, we no longer enqueue only the prefix it self, but also all potentially affected prefixes or IPs contained by it. This allows `InjectLabels` to then down-propagate the changes to any child prefixes. 2. In `resolveIdentity` (called by `InjectLabels`), we check if there are any parent prefixes with a CIDR label that we want to inherit. At the moment, only local identities without a `reserved:` label may inherit a CIDR label from a parent prefix. This currently therefore only applies to identities containing `fqdn:` labels. Going forward, this infrastructure might however also be useful to implement non-K8s label sources. A note on the implementation: To find affected child prefixes, we currently perform a linear scan over the metadata map. While this is a `O(n)` operation, finding CIDR parents in a trie could be done in `O(log n)`, but at the moment we expect `n` to be in hundreds to the low thousands at most. For most real world cases, we therefore expect the current implementation to perform just as well if not better than a trie. Also note that the linear scan is skipped if the prefix is a single IP and thus cannot contain any other prefixes besides itself. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
9574178
to
8a8c0b0
Compare
This adds a benchmark testing the overhead of using a naive linear scan over a Go map vs using a bitlpm trie. The benchmark harness generates a number of child (leaf) prefixes, plus a number of parent (non-leaf) prefixes. To simulate somewhat realistic behavior, we don't allocate IPs randomly across the whole IP space, but rather from a contiguous block of IPs with a given sparseness factor (i.e. a factor of 1.0 will allocate every single IP out of the block, a factor of 0.1 will allocate only 10% of the available IP space). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
8a8c0b0
to
e6b3759
Compare
I've pushed the code for a benchmark comparing the Here are the average times it takes one invocation of
It's pretty clear that the the trie is In terms of of allocations performed, trie also seems roughly
As is, I would say the In my opinion, the status quo is acceptable. Once #31409 lands, I think the case for the trie becomes much more interesting (though the increased memory pressure needs to be investigated). And keep in mind, this cost only occurs whenever a CIDR policy is changed (and /32 CIDRs do not trigger this). It does not affect upserts from the FQDN subsystem, does not affect upserts from the node manager and does not upserts from CEPs once #31409 lands. Given these results, I would prefer to not block this PR on the overall worse performance and defer the trie backend as a performance optimization to later. But please let me know what you think! |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the benchmark! I agree with not blocking the PR. I was trying to understand just how much the impact was in case we faced issues with users and how much work it would be to move towards the trie as the mitigation.
This PR introduces a new feature in the IPCache metadata subsystem:
The ability for prefixes with local identities to inherit CIDR labels
from encompassing prefixes. This is needed to implement
cilium/design-cfps#17 where we want to introduce a new way to manage
identities for IPs allowlisted by DNS lookups. Without going into too
much detail, the key point of that CFP is that IPs returned by DNS
lookups will be associated with a
fqdn:
label.Those IPs returned by a DNS lookup however might also be selected by a
CIDR prefix, and thus might also optionally need a
cidr:
label if (butonly if) a selecting CIDR policy is present.
To give an example: If we perform a DNS lookup on
one.one.one.one
on apod with DNS redirects enabled, we may observe IP
1.1.1.1
and thusassociated that IP with a
fqdn:
label. If at the same time, we alsohave a separate CIDR policy allowing traffic to
1.0.0.0/8
, then thatIP also needs the
cidr:1.0.0.0/8
label, to make it is selected by boththe FQDN and the CIDR policy.
To implement this, the IPCache metadata subsystem is slightly augmented:
enqueue only the prefix it self, but also all potentially affected
prefixes or IPs contained by it. This allows
InjectLabels
to thendown-propagate the changes to any child prefixes.
resolveIdentity
(called byInjectLabels
), we check if thereare any parent prefixes with a CIDR label that we want to inherit.
At the moment, only local identities without a
reserved:
label mayinherit a CIDR label from a parent prefix. This currently therefore only
applies to identities containing
fqdn:
labels. Going forward, thisinfrastructure might however also be useful to implement non-K8s label
sources.