Shutting down the worker nodes and master nodes
Procedure
-
Create temporary authentication information in case the OAuth feature is not working.
This account is used during the startup process once all the nodes are up. If the user is not able to connect with the default kubeadmin account, the tmp-admin account can be used to log in to the cluster.
- Create a temporary admin service account.
oc create sa tmp-admin -n default
- Bind the cluster-admin role to the tmp-admin user.
oc adm policy add-cluster-role-to-user cluster-admin -z tmp-admin -n default
- Get the temporary token for the tmp-admin user.
token=$(oc get secret -o jsonpath="{.data.token}" $(oc get sa tmp-admin -o yaml -n default | grep " tmp-admin-token-" | awk '{ print $3 }') -n default | base64 -d)
- Verify that the token has been generated.
echo $token
Sample output: eyJhbGciOi...
- Generate a kubeconfig file for the temporary admin (tmp-admin).
env KUBECONFIG=$(pwd)/tmpadmin-kubeconfig oc login --token=$token https://api.<your domain>:6443/
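Before continuing, you can optionally verify that the temporary kubeconfig works. This is a minimal check, assuming the tmpadmin-kubeconfig file generated above is in the current directory:
env KUBECONFIG=$(pwd)/tmpadmin-kubeconfig oc whoami
env KUBECONFIG=$(pwd)/tmpadmin-kubeconfig oc get nodes
The first command is expected to report the tmp-admin service account identity, and the second should list the cluster nodes without relying on the OAuth-based kubeadmin login.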
-
To make the worker nodes unschedulable and evict the pods, perform the following substeps:
- Select a worker node in the cluster and make it unschedulable using the following command:
oc adm cordon <worker-node-n>
- Drain the node.
oc adm drain <worker-node> --force --ignore-daemonsets --delete-local-data
NOTE: When the last worker node is drained, ignore the following error messages. These error messages are expected when the last node is drained, as the other worker nodes are already drained. Press CTRL+C to exit the command and return to the prompt.
oc adm drain worker-01.ocp4.qac.com --force --ignore-daemonsets --delete-local-data
node/worker-01.ocp4.qac.com already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/hpe-csi-node-ppkg4, openshift-cluster-node-tuning-operator/tuned-5ccg6, openshift-dns/dns-default-7cf2g, openshift-image-registry/node-ca-dtglx, openshift-machine-config-operator/machine-config-daemon-xhlbz, openshift-monitoring/node-exporter-gqfd8, openshift-multus/multus-ns8zb, openshift-sdn/ovs-s4jjm, openshift-sdn/sdn-bfddj; deleting Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: sanity/simple
evicting pod "simple"
evicting pod "alertmanager-main-1"
evicting pod "alertmanager-main-0"
evicting pod "csp-service-7bcb94744d-w7g8l"
evicting pod "alertmanager-main-2"
evicting pod "grafana-56df66bfd5-784lk"
evicting pod "kube-state-metrics-77b94cbbff-nh8cz"
evicting pod "openshift-state-metrics-76bdf54b5f-79q9m"
evicting pod "prometheus-adapter-548f8c485d-8tvgz"
evicting pod "hpe-csi-controller-8f4485ccb-zsqw9"
evicting pod "prometheus-adapter-548f8c485d-h6hpk"
evicting pod "benchmark-operator-fd9997496-x4zwp"
evicting pod "prometheus-k8s-0"
evicting pod "prometheus-k8s-1"
evicting pod "telemeter-client-6f8fcdf6d7-trgnp"
evicting pod "image-registry-6bb8dc7445-hm4pw"
evicting pod "router-default-7f748fbd5f-n96g2"
evicting pod "certified-operators-5db975968d-4bnlf"
evicting pod "community-operators-86dc5b948c-k7cxq"
evicting pod "redhat-operators-544fc4b9b5-cd8c2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/grafana-56df66bfd5-784lk evicted
pod/openshift-state-metrics-76bdf54b5f-79q9m evicted
pod/alertmanager-main-2 evicted
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/telemeter-client-6f8fcdf6d7-trgnp evicted
pod/hpe-csi-controller-8f4485ccb-zsqw9 evicted
pod/benchmark-operator-fd9997496-x4zwp evicted
pod/prometheus-adapter-548f8c485d-8tvgz evicted
pod/image-registry-6bb8dc7445-hm4pw evicted
pod/prometheus-k8s-0 evicted
pod/certified-operators-5db975968d-4bnlf evicted
pod/simple evicted
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
pod/alertmanager-main-1 evicted
pod/kube-state-metrics-77b94cbbff-nh8cz evicted
pod/community-operators-86dc5b948c-k7cxq evicted
pod/redhat-operators-544fc4b9b5-cd8c2 evicted
pod/alertmanager-main-0 evicted
pod/prometheus-adapter-548f8c485d-h6hpk evicted
pod/prometheus-k8s-1 evicted
evicting pod "router-default-7f748fbd5f-n96g2"
error when evicting pod "router-default-7f748fbd5f-n96g2" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
NOTE: Repeat steps 2a and 2b for all the worker nodes in the cluster; an optional loop that cordons and drains every worker node is sketched below. Ensure that all the worker nodes in the cluster are shut down before proceeding to shut down the master nodes.
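If you prefer to cordon and drain every worker node in one pass instead of node by node, a loop similar to the shutdown loop in the next step can be used. This is a sketch only, assuming the same worker label selector that is used in the shutdown step below; the drain of the last worker node may keep retrying on the router pod's disruption budget, in which case press CTRL+C as described in the note above.
# Cordon and drain all worker nodes (sketch; verify the label selector for your cluster)
workers=($(oc get nodes -l node-role.kubernetes.io/worker=,node-role.kubernetes.io/infra!= --no-headers -o custom-columns=NAME:.metadata.name))
for worker in ${workers[@]}
do
  echo "==== Cordon and drain $worker ===="
  oc adm cordon $worker
  oc adm drain $worker --force --ignore-daemonsets --delete-local-data
done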
-
Shut down all worker nodes.
# Build the list of worker nodes (excluding infra nodes), then shut down each one over SSH
workers=($(oc get nodes -l node-role.kubernetes.io/worker=,node-role.kubernetes.io/infra!= --no-headers -o custom-columns=CONTAINER:.metadata.name))
for worker in ${workers[@]}
do
  echo "==== Shutdown $worker ===="
  ssh core@$worker sudo shutdown -h now
done
NOTE: Ensure that all the worker nodes are shut down before proceeding to the master nodes. You can re-verify the worker node state from the iLO console.
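In addition to the iLO console, the worker node state can be checked from the host running oc while the master nodes are still up. This is an optional sketch using the same label selector as above:
# Workers that have been powered off are expected to move to NotReady after a short delay
oc get nodes -l node-role.kubernetes.io/worker=,node-role.kubernetes.io/infra!=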
-
Stop static pods on each master node and shut down all master nodes.
- Stop static pods on each master node.
# For each master node, move the static pod manifests aside, then wait until the etcd,
# kube-apiserver, kube-controller-manager, and scheduler containers have stopped
masters=($(oc get nodes -l node-role.kubernetes.io/master --no-headers -o custom-columns=CONTAINER:.metadata.name))
for master in ${masters[@]}
do
  echo "==== Stop static pods on $master ===="
  ssh core@$master 'sudo mkdir -p /etc/kubernetes/manifests.stop && sudo mv -v $(ls /etc/kubernetes/manifests/*) /etc/kubernetes/manifests.stop'
  while :; do
    ssh core@$master sudo crictl ps | grep -v -e operator -e cert | grep -qw -e etcd-member -e kube-apiserver-[0-9]* -e kube-controller-manager-[0-9]* -e scheduler || break
    sleep 5
  done
done
- Shut down all master nodes.
for master in ${masters[@]}
do
  echo "==== Shutdown $master ===="
  ssh core@$master sudo shutdown -h now
done
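Optionally, before moving on to the storage array, confirm that the master nodes no longer respond. This is a sketch that assumes the masters array from substep 4a is still set in the current shell and that the master node names resolve from this host; the power state can also be confirmed from the iLO console.
# Expect no ping reply from any master node once the shutdown completes
for master in ${masters[@]}
do
  ping -c 1 -W 2 $master > /dev/null 2>&1 && echo "$master is still up" || echo "$master is down"
done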
-
Shut down all the Nimble storage nodes.
- In the NimbleOS GUI, go to Administration > Shut down.
NOTE: Make sure all hosts are disconnected from the array to avoid an unnecessary data service outage.
- Click Shut Down Array.
- Enter the administrator password and click Shut down.
-
Shut down the registry VM (applicable only for disconnected mode of deployment).
- If the VM was created using virt-manager, run the following command:
virsh shutdown <registry_vm_name>
OR
- Log in to the registry VM and run the following command:
sudo shutdown
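If the VM was created with virt-manager, you can optionally confirm from the hypervisor host that it has powered off; a minimal check, using the same <registry_vm_name> placeholder as above:
virsh domstate <registry_vm_name>
Once the VM has stopped, the expected output is shut off.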
- Shut down all the infrastructure services, such as the DHCP server, DNS, and load balancer.
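How these services are stopped depends on how they are hosted. As one hypothetical example, if DHCP, DNS, and the load balancer run as systemd services on a Linux helper node, they could be stopped as follows; the unit names haproxy, named, and dhcpd are assumptions, so substitute the names used in your environment:
# Hypothetical unit names; adjust for your environment before running
sudo systemctl stop haproxy named dhcpd
Alternatively, shut down the helper node itself once the cluster and storage are fully powered off.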