Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FG:InPlacePodVerticalScaling] Add additional E2E Burstable Pod QoS test cases for pods without resource limits #124712

Open
jkyros opened this issue May 6, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@jkyros
Copy link

jkyros commented May 6, 2024

What would you like to be added?

It would be really great if InPlacePodVerticalScaling could in-place resize cpu on containers that have requests only (no limits).

Currently, InPlacePodVerticalScaling cannot successfully resize cpu against containers without limits. e.g. given this pod:

apiVersion: v1
kind: Pod
metadata:
  name: sleeper
  namespace: default
spec:
  containers:
  - args:
    - -c
    - sleep infinity
    command:
    - /bin/sh
    image: registry.k8s.io/ubuntu-slim:0.1
    imagePullPolicy: IfNotPresent
    name: sleeper
    resources:
      requests:
        cpu: 25m
        memory: 100Mi

If you change cpu: 25m to cpu: 50m, it will get stuck in resize: InProgress forever:

  containerStatuses:
  - allocatedResources:
      cpu: 50m
      memory: 100Mi
    ...
    resources:
      requests:
        cpu: 25m
        memory: 100Mi
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-05-04T00:11:16Z"
  ...
  qosClass: Burstable
  resize: InProgress

Why doesn't in-place work now without limits:

  • This looks like a result of this kubelet code here not calculating the update if there are no limits (as in, if you do something naive like this it works, but probably has other consequences and I don't have enough background here to propose that as a proper fix )
  • Memory looks like it "works" without limits, but it's special and...doesn't really

What additional tests do we need:

  • The current test suite includes some Burstable cases, but does not cover all corners, specifically:
    • in-place increasing/reducing CPU requests on containers without limits
    • in-place removing limits from a container that previously had them
  • Minimally could look something like this: 7295341

Additional background:

  • This was discussed in the sig-node meeting on Apr 9, 2024, where I was encouraged to inquire via Slack
  • Per that slack discussion with @vinaykul I was encouraged to open an issue to add additional test coverage so the "containers without limits" cases could be covered.
  • This isn't just "please do this for me", I'm willing to do work here to get these corners covered if need be, but I don't want to be "in the way" 😄

/sig node

Why is this needed?

@jkyros jkyros added the kind/feature Categorizes issue or PR as related to a new feature. label May 6, 2024
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 6, 2024
@vinaykul
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

No branches or pull requests

3 participants