Traefik v3.0 breaks existing cert-manager integrations #10702

Open
2 tasks done
sebadob opened this issue May 7, 2024 · 0 comments
Labels: area/acme, area/rules, kind/bug/possible (a possible bug that needs analysis before it is confirmed or fixed)


sebadob commented May 7, 2024

Welcome!

  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

I had been using Traefik v2 together with cert-manager inside Kubernetes for a long time.

For my usual routes, I am using the IngressRoute CRD, which works perfectly fine again after I followed the migration guide.
However, all my existing cert-manager integrations and automatic certificate renewals in the whole cluster silently stopped working. The ACME solvers created by cert-manager are standard Kubernetes Ingress resources, and they have not changed at all. I only noticed this today, after a certificate had failed to renew for a very long time and got stuck.

The Ingress they create is nothing special and, as mentioned, it has not changed. I have not made any updates to the cluster other than Traefik v3.0 since this problem came up.

What did you see instead?

It seems that the Ingress is simply ignored. To test this, I removed all existing IngressRoutes in the same namespace and left only the auto-created Ingress for a test certificate. I then debugged the whole flow manually to see where it gets stuck, and it was definitely Traefik v3.0.
I deploy Traefik with the Helm chart, and providers.kubernetesIngress.enabled is set to true, which is also the chart's default value.
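
For clarity, the setting named above corresponds to the following values.yaml fragment. It is only a sketch of the explicit form: the key path is taken from the sentence above, and it does not appear in the custom values listed later because it is reported to be the chart default.

```yaml
# Sketch: explicit form of the chart setting mentioned above.
# true is reported to be the chart default, so this line should be redundant.
providers:
  kubernetesIngress:
    enabled: true
```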

The auto-created spec looks like this:

spec:
  rules:
  - host: host.example.com
    http:
      paths:
      - backend:
          service:
            name: cm-acme-http-solver-qrg6m
            port:
              number: 8089
        path: /.well-known/acme-challenge/als2om6EN1LVdx548XZESnj9DkuHh3idOaexEcaFkmI
        pathType: ImplementationSpecific
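
To reproduce this without waiting for cert-manager, a standalone Ingress wrapped around the same spec should, in principle, show the same behavior. The manifest below is only a sketch: the metadata name, the backend Service name, and the challenge token in the path are placeholders and not values from the affected cluster, and the backend Service has to exist and answer on port 8089.

```yaml
# Sketch of a standalone reproduction Ingress (all names are hypothetical).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: acme-solver-repro                  # hypothetical name
spec:
  rules:
    - host: host.example.com
      http:
        paths:
          - path: /.well-known/acme-challenge/test-token   # placeholder token
            pathType: ImplementationSpecific                # same pathType as the auto-created Ingress
            backend:
              service:
                name: acme-solver-repro-svc                 # hypothetical Service
                port:
                  number: 8089
```

A plain HTTP request to that path on host.example.com would then be expected to reach the backend instead of returning a 404 or the redirect.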

I usually have HTTPS redirects for each host name, and as soon as I added the IngressRoute again, I got the HTTP 301 back from it, even though I should have received the ACME challenge. But as mentioned, even without this redirect route I always received a 404 page not found.

I am now using a very nasty workaround everywhere so I can at least get my certificates renewed. It simply ignores the auto-created Ingress resources; instead, I manually added the following IngressRoute definitions to make it work again:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: https-redirect
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`host.example.com`) && !PathPrefix(`/.well-known/acme-challenge/`)
      kind: Rule
      middlewares:
        - name: https-only
          namespace: traefik
      services:
        - name: some-service
          port: 3000
    - match: Host(`host.example.com`) && PathPrefix(`/.well-known/acme-challenge/`)
      kind: Rule
      services:
        - name: acme-svc-workaround
          port: 8089
---
apiVersion: v1
kind: Service
metadata:
  name: acme-svc-workaround
spec:
  selector:
    acme.cert-manager.io/http01-solver: "true"
  ports:
    - name: http
      port: 8089
      targetPort: 8089
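
The https-only middleware referenced in the first route is not included in this report. As an assumption for anyone trying to reproduce the setup, a redirectScheme middleware along the following lines would produce the HTTP 301 described earlier; the real definition may differ.

```yaml
# Assumed definition; the actual https-only middleware is not shown in this report.
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: https-only
  namespace: traefik
spec:
  redirectScheme:
    scheme: https      # redirect plain HTTP requests to HTTPS
    permanent: true    # permanent redirect (HTTP 301 for GET requests)
```

Because this middleware sits in the traefik namespace while the IngressRoute does not, the reference relies on allowCrossNamespace being enabled for the kubernetesCRD provider, as it is in the values.yaml below.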

This solution is very brittle and tedious, though.
I was very lucky that the certificate that failed overnight was in a staging environment rather than in production.

What version of Traefik are you using?

Helm Chart version:
traefik-28.0.0

App-Version:
3.0.0

What is your environment & configuration?

Custom values.yaml:

globalArguments:
  - --global.checknewversion
  - --accesslog.fields.names.StartUTC=drop

deployment:
  kind: DaemonSet
  minReadySeconds: 3

podDisruptionBudget:
  enabled: true
  maxUnavailable: 2

ingressRoute:
  dashboard:
    enabled: true

providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: true
    allowExternalNameServices: true
    allowEmptyServices: true

env:
  - name: TZ
    value: Europe/Berlin

logs:
  general:
    level: INFO
  access:
    enabled: true
    bufferingSize: 100
    filters:
      statuscodes: "300-599"
      minduration: 1ms
    fields:
      headers:
        defaultmode: drop
        names:
          User-Agent: keep

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

service:
  enabled: true
  type: LoadBalancer
  spec:
    externalTrafficPolicy: Local

ports:
  websecure:
    asDefault: true
    http3:
      enabled: true

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app
            operator: In
            values:
              - traefik
      topologyKey: failure-domain.beta.kubernetes.io/zone

If applicable, please paste the log output in DEBUG level

No response
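
No DEBUG output was captured for this report. Going by the logs section of the values.yaml above, raising the log level for a follow-up would presumably only need the change below. This is a sketch derived from those values, not a tested configuration.

```yaml
# Sketch: same key path as in the values.yaml above, with the level raised.
logs:
  general:
    level: DEBUG   # INFO in the values above
```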

@kevinpollet kevinpollet self-assigned this May 13, 2024
@rtribotte rtribotte assigned rtribotte and unassigned kevinpollet May 13, 2024
@rtribotte rtribotte added area/acme kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. area/rules and removed status/0-needs-triage labels May 13, 2024
@kevinpollet kevinpollet self-assigned this May 13, 2024