Question on how the CCM handles the use of the K3s --with-node-id parameter. #791
Labels: lifecycle/rotten, needs-triage
K3s has a `--with-node-id` parameter. This parameter appends a unique 8-character string to the end of the worker name, e.g. `test-test-worker-129-0c98138b`, where `0c98138b` is what the `--with-node-id` code implementation adds to the Kubernetes node name.
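For concreteness, here is roughly what that looks like when an agent joins; the server URL, token, and the sample `kubectl get nodes` output below are placeholders/illustrative:

```sh
# Join a K3s agent and let K3s append a random 8-character id to the node name.
k3s agent \
  --server https://my-k3s-server:6443 \
  --token "<cluster-join-token>" \
  --with-node-id

# On the server side, the node then shows up as <hostname>-<id>, e.g.:
kubectl get nodes
# NAME                            STATUS   ROLES    AGE   VERSION
# test-test-worker-129-0c98138b   Ready    <none>   1m    v1.25.x+k3s1
```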
Our use-case is that we have auto-scaling in play, and sometimes when downscaling occurs remnants of the old node are left behind, more specifically the K3s `*.node-password.k3s` `Kind: Secret`. For more see How Agent Node Registration Works.
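As an illustration of the remnant in question, this is roughly what the stale secret looks like and how it could be cleaned up by hand (the secret name reuses the example worker above; the listing output is illustrative):

```sh
# K3s keeps one node-password secret per agent in the kube-system namespace.
kubectl -n kube-system get secrets | grep node-password.k3s
# test-test-worker-129.node-password.k3s   Opaque   1   42d

# If the old VM is really gone, deleting the stale secret would let a new node
# register under the same name again (use with care).
kubectl -n kube-system delete secret test-test-worker-129.node-password.k3s
```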
And then when upscale occurs the same worker name might be used again. Without the `--with-node-id` parameter this would cause the upscaled worker to NOT be allowed to join the cluster, because the secret data in the `*.node-password.k3s` `Kind: Secret` does not match.

### How this clashes with the CCM
However, when the Kubernetes node name and the name of the virtual machine on the underlying HCI, in this case Google Compute Engine, do not match, one will see/experience errors in the logs of the `cloud-controller-manager` DaemonSet Pods.
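To make the mismatch concrete, comparing the node name inside the cluster with the instance name on the GCE side looks roughly like this (names reuse the example above; output is illustrative):

```sh
# Node name as the CCM sees it inside the cluster.
kubectl get nodes -o name
# node/test-test-worker-129-0c98138b

# Instance name as it exists in Google Compute Engine, without the -0c98138b
# suffix, so a lookup keyed on the Kubernetes node name cannot find it.
gcloud compute instances list --format='value(name)'
# test-test-worker-129
```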
One fix would of course be to "just" name the VM on the HCI to match the name of the Kubernetes node. However, this is a somewhat complex logical flow to implement on the bootstrapping side of things: because (well, of course) the VM comes up before K3s, and K3s is the entity generating the id, the id would have to be fetched and the VM would have to be renamed after K3s is up, which to me reads like a prolonged journey in the weeds.

### The question
Is there a way to "tell" the CCM to use a label, the hostname, or some other method, e.g. the metadata server on the VM, instead of "just" using the name of the Kubernetes Node when it looks up/queries/does its thing on the HCI, to ensure load-balancers, GCE instances and whatnot are matched correctly? If that's possible, then we wouldn't have this issue of the Kubernetes node being deleted and therefore never being able to successfully join the cluster.
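For what it's worth, and purely as a debugging sketch rather than a known answer: cloud controllers conventionally also record the instance identity on the node as `spec.providerID` (for GCE in the form `gce://<project>/<zone>/<instance-name>`), and the VM's own name can be read from the GCE metadata server, so both sides of the mapping can at least be inspected (node name reuses the example above):

```sh
# What the node object itself carries: its name and, if set, the providerID
# the cloud provider uses to identify the backing instance.
kubectl get node test-test-worker-129-0c98138b \
  -o jsonpath='{.metadata.name}{"\n"}{.spec.providerID}{"\n"}'

# What the VM says about itself via the GCE metadata server.
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/name
```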
Thank you very much for any tips, pointers, or of course preferably solutions that you spend your time providing me with 👍.
Have a great day.