Question on how the CCM handles the use of the K3s --with-node-id parameter. #791
Labels: lifecycle/rotten, needs-triage
K3s has a `--with-node-id` parameter. This parameter appends a unique 8-character string to the end of the worker name, e.g. `test-test-worker-129-0c98138b`, where `0c98138b` is what the `--with-node-id` code implementation adds to the Kubernetes node name.
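For concreteness, here is roughly what that looks like when an agent joins; the server URL, token, and the sample `kubectl get nodes` output below are placeholders/illustrative:

```sh
# Join a K3s agent and let K3s append a random 8-character id to the node name.
k3s agent \
  --server https://my-k3s-server:6443 \
  --token "<cluster-join-token>" \
  --with-node-id

# On the server side, the node then shows up as <hostname>-<id>, e.g.:
kubectl get nodes
# NAME                            STATUS   ROLES    AGE   VERSION
# test-test-worker-129-0c98138b   Ready    <none>   1m    v1.25.x+k3s1
```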
Our use-case is that we have auto-scaling in play, and sometimes when downscaling occurs remnants of the old node are left behind, more specifically the K3s `*.node-password.k3s` `Kind: Secret`. For more see How Agent Node Registration Works.
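As an illustration of the remnant in question, this is roughly what the stale secret looks like and how it could be cleaned up by hand (the secret name reuses the example worker above; the listing output is illustrative):

```sh
# K3s keeps one node-password secret per agent in the kube-system namespace.
kubectl -n kube-system get secrets | grep node-password.k3s
# test-test-worker-129.node-password.k3s   Opaque   1   42d

# If the old VM is really gone, deleting the stale secret would let a new node
# register under the same name again (use with care).
kubectl -n kube-system delete secret test-test-worker-129.node-password.k3s
```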
And then when upscale occurs the same worker name might be used again. Without the `--with-node-id` parameter this would cause the upscaled worker to NOT be allowed to join the cluster, because the secret data in the `*.node-password.k3s` `Kind: Secret` does not match.

### How this clashes with the CCM
However, when the Kubernetes node name and the name of the virtual machine on the underlying HCI, in this case Google Compute Engine, do not match, one will see/experience errors in the logs of the `cloud-controller-manager` DaemonSet Pods.
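To make the mismatch concrete, comparing the node name inside the cluster with the instance name on the GCE side looks roughly like this (names reuse the example above; output is illustrative):

```sh
# Node name as the CCM sees it inside the cluster.
kubectl get nodes -o name
# node/test-test-worker-129-0c98138b

# Instance name as it exists in Google Compute Engine, without the -0c98138b
# suffix, so a lookup keyed on the Kubernetes node name cannot find it.
gcloud compute instances list --format='value(name)'
# test-test-worker-129
```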
One fix would of course be to "just" name the VM on the HCI to match the name of the Kubernetes node. However, this is a somewhat complex logical flow to implement on the bootstrapping side of things: because (well, of course) the VM comes up before K3s, and K3s is the entity generating the id, the id would have to be fetched and the VM would have to be renamed after K3s is up, which to me reads like a prolonged journey in the weeds.

### The question
Is there a way to "tell" the CCM to use a label, the hostname, or some other method, e.g. the metadata server on the VM, instead of "just" using the name of the Kubernetes Node when it looks up/queries/does its thing on the HCI, to ensure load-balancers, GCE instances and whatnot are matched correctly? If that's possible, then we wouldn't have this issue of the Kubernetes node being deleted and therefore never being able to successfully join the cluster.
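For what it's worth, and purely as a debugging sketch rather than a known answer: cloud controllers conventionally also record the instance identity on the node as `spec.providerID` (for GCE in the form `gce://<project>/<zone>/<instance-name>`), and the VM's own name can be read from the GCE metadata server, so both sides of the mapping can at least be inspected (node name reuses the example above):

```sh
# What the node object itself carries: its name and, if set, the providerID
# the cloud provider uses to identify the backing instance.
kubectl get node test-test-worker-129-0c98138b \
  -o jsonpath='{.metadata.name}{"\n"}{.spec.providerID}{"\n"}'

# What the VM says about itself via the GCE metadata server.
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/name
```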
Thank you very much for any tips, pointers, or of course preferably solutions that you spend your time providing me with 👍.
Have a great day.