Shutting down the worker nodes and master nodes
Procedure
-
Create temporary authentication information in case the OAuth feature is not working.
This account is used during the startup process once all the nodes are up. If the user is not able to connect with the default kubeadmin account, the tmp-admin account can be used to log in to the cluster.
- Create a temporary admin service account.
oc create sa tmp-admin -n default
- Bind the cluster-admin role to the tmp-admin user.
oc adm policy add-cluster-role-to-user cluster-admin -z tmp-admin -n default
- Get the temporary token for the tmp-admin user.
token=$(oc get secret -o jsonpath="{.data.token}" $(oc get sa tmp-admin -o yaml -n default | grep " tmp-admin-token-" | awk '{ print $3 }') -n default | base64 -d)
- Verify that the token has been generated.
echo $token
Sample output: eyJhbGciOi...
- Generate a kubeconfig file for the temporary admin (tmp-admin).
env KUBECONFIG=$(pwd)/tmpadmin-kubeconfig oc login --token=$token https://api.<your domain>:6443/
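Before continuing, you can optionally verify that the temporary kubeconfig works. This is a minimal check, assuming the tmpadmin-kubeconfig file generated above is in the current directory:
env KUBECONFIG=$(pwd)/tmpadmin-kubeconfig oc whoami
env KUBECONFIG=$(pwd)/tmpadmin-kubeconfig oc get nodes
The first command is expected to report the tmp-admin service account identity, and the second should list the cluster nodes without relying on the OAuth-based kubeadmin login.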
-
To make the worker nodes unschedulable and evict the pods, perform the following substeps:
- Select a worker node in the cluster and make it unschedulable using the following command:
oc adm cordon <worker-node-n>
- Drain the node.
oc adm drain <worker-node> --force --ignore-daemonsets --delete-local-data
NOTE: When the last worker node is drained, ignore the following error messages. These error messages are expected when the last node is drained, as the other worker nodes are already drained. Press CTRL+C to exit the command and return to the prompt.
oc adm drain worker-01.ocp4.qac.com --force --ignore-daemonsets --delete-local-data
node/worker-01.ocp4.qac.com already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/hpe-csi-node-ppkg4, openshift-cluster-node-tuning-operator/tuned-5ccg6, openshift-dns/dns-default-7cf2g, openshift-image-registry/node-ca-dtglx, openshift-machine-config-operator/machine-config-daemon-xhlbz, openshift-monitoring/node-exporter-gqfd8, openshift-multus/multus-ns8zb, openshift-sdn/ovs-s4jjm, openshift-sdn/sdn-bfddj; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: sanity/simple
evicting pod "simple"
evicting pod "alertmanager-main-1"
evicting pod "alertmanager-main-0"
evicting pod "csp-service-7bcb94744d-w7g8l"
evicting pod "alertmanager-main-2"
evicting pod "grafana-56df66bfd5-784lk"
evicting pod "kube-state-metrics-77b94cbbff-nh8cz"
evicting pod "openshift-state-metrics-76bdf54b5f-79q9m"
evicting pod "prometheus-adapter-548f8c485d-8tvgz"
evicting pod "hpe-csi-controller-8f4485ccb-zsqw9"
evicting pod "prometheus-adapter-548f8c485d-h6hpk"
evicting pod "benchmark-operator-fd9997496-x4zwp"
evicting pod "prometheus-k8s-0"
evicting pod "prometheus-k8s-1"
evicting pod "telemeter-client-6f8fcdf6d7-trgnp"
evicting pod "image-registry-6bb8dc7445-hm4pw"
evicting pod "router-default-7f748fbd5f-n96g2"
evicting pod "certified-operators-5db975968d-4bnlf"
evicting pod "community-operators-86dc5b948c-k7cxq"
evicting pod "redhat-operators-544fc4b9b5-cd8c2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/grafana-56df66bfd5-784lk evicted
pod/openshift-state-metrics-76bdf54b5f-79q9m evicted
pod/alertmanager-main-2 evicted
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/telemeter-client-6f8fcdf6d7-trgnp evicted
pod/hpe-csi-controller-8f4485ccb-zsqw9 evicted
pod/benchmark-operator-fd9997496-x4zwp evicted
pod/prometheus-adapter-548f8c485d-8tvgz evicted
pod/image-registry-6bb8dc7445-hm4pw evicted
pod/prometheus-k8s-0 evicted
pod/certified-operators-5db975968d-4bnlf evicted
pod/simple evicted
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/alertmanager-main-1 evicted
pod/kube-state-metrics-77b94cbbff-nh8cz evicted
pod/community-operators-86dc5b948c-k7cxq evicted
pod/redhat-operators-544fc4b9b5-cd8c2 evicted
pod/alertmanager-main-0 evicted
pod/prometheus-adapter-548f8c485d-h6hpk evicted
pod/prometheus-k8s-1 evicted
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
NOTE: Repeat steps 2a and 2b for all the worker nodes in the cluster; an optional loop that cordons and drains every worker node is sketched below. Ensure that all the worker nodes in the cluster are shut down before proceeding to shut down the master nodes.
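If you prefer to cordon and drain every worker node in one pass instead of node by node, a loop similar to the shutdown loop in the next step can be used. This is a sketch only, assuming the same worker label selector that is used in the shutdown step below; the drain of the last worker node may keep retrying on the router pod's disruption budget, in which case press CTRL+C as described in the note above.
# Cordon and drain all worker nodes (sketch; verify the label selector for your cluster)
workers=($(oc get nodes -l node-role.kubernetes.io/worker=,node-role.kubernetes.io/infra!= --no-headers -o custom-columns=NAME:.metadata.name))
for worker in ${workers[@]}
do
  echo "==== Cordon and drain $worker ===="
  oc adm cordon $worker
  oc adm drain $worker --force --ignore-daemonsets --delete-local-data
done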
-
Shut down all worker nodes.
# Build the list of worker nodes (excluding infra nodes), then shut down each one over SSH
workers=($(oc get nodes -l node-role.kubernetes.io/worker=,node-role.kubernetes.io/infra!= --no-headers -o custom-columns=CONTAINER:.metadata.name))
for worker in ${workers[@]}
do
  echo "==== Shutdown $worker ===="
  ssh core@$worker sudo shutdown -h now
done
NOTE: Ensure that all the worker nodes are shut down before proceeding to the master nodes. You can re-verify the worker node state from the iLO console.
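In addition to the iLO console, the worker node state can be checked from the host running oc while the master nodes are still up. This is an optional sketch using the same label selector as above:
# Workers that have been powered off are expected to move to NotReady after a short delay
oc get nodes -l node-role.kubernetes.io/worker=,node-role.kubernetes.io/infra!=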
-
Stop static pods on each master node and shut down all master nodes.
- Stop static pods on each master node.
# For each master node, move the static pod manifests aside, then wait until the etcd,
# kube-apiserver, kube-controller-manager, and scheduler containers have stopped
masters=($(oc get nodes -l node-role.kubernetes.io/master --no-headers -o custom-columns=CONTAINER:.metadata.name))
for master in ${masters[@]}
do
  echo "==== Stop static pods on $master ===="
  ssh core@$master 'sudo mkdir -p /etc/kubernetes/manifests.stop && sudo mv -v $(ls /etc/kubernetes/manifests/*) /etc/kubernetes/manifests.stop'
  while :; do
    ssh core@$master sudo crictl ps | grep -v -e operator -e cert | grep -qw -e etcd-member -e kube-apiserver-[0-9]* -e kube-controller-manager-[0-9]* -e scheduler || break
    sleep 5
  done
done
- Shut down all master nodes.
for master in ${masters[@]}
do
  echo "==== Shutdown $master ===="
  ssh core@$master sudo shutdown -h now
done
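Optionally, before moving on to the storage array, confirm that the master nodes no longer respond. This is a sketch that assumes the masters array from substep 4a is still set in the current shell and that the master node names resolve from this host; the power state can also be confirmed from the iLO console.
# Expect no ping reply from any master node once the shutdown completes
for master in ${masters[@]}
do
  ping -c 1 -W 2 $master > /dev/null 2>&1 && echo "$master is still up" || echo "$master is down"
done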
-
Shut down all the Nimble storage nodes.
- In the NimbleOS GUI, go to Administration > Shut down.
NOTE: Make sure all hosts are disconnected from the array to avoid an unnecessary data service outage.
- Click Shut Down Array.
- Enter the administrator password and click Shut down.
-
Shut down the registry VM (applicable only for disconnected mode of deployment).
- If the VM was created using virt-manager, run the following command:
virsh shutdown <registry_vm_name>
OR
- Log in to the registry VM and run the following command:
sudo shutdown
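If the VM was created with virt-manager, you can optionally confirm from the hypervisor host that it has powered off; a minimal check, using the same <registry_vm_name> placeholder as above:
virsh domstate <registry_vm_name>
Once the VM has stopped, the expected output is shut off.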
- Shut down all the infrastructure services, such as the DHCP server, DNS, and load balancer.
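How these services are stopped depends on how they are hosted. As one hypothetical example, if DHCP, DNS, and the load balancer run as systemd services on a Linux helper node, they could be stopped as follows; the unit names haproxy, named, and dhcpd are assumptions, so substitute the names used in your environment:
# Hypothetical unit names; adjust for your environment before running
sudo systemctl stop haproxy named dhcpd
Alternatively, shut down the helper node itself once the cluster and storage are fully powered off.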