Troubleshooting Kubernetes clusters
If a Kubernetes cluster fails, you can download its configuration file and the log files of its nodes for troubleshooting.
Additionally, you can get diagnostic information about Kubernetes services by running commands inside a Kubernetes virtual machine from the compute node this VM resides on. The Kubernetes services that you need to check are the following:
kubelet
is a node agent that runs on each Kubernetes node. It ensures that the containers described in pod specification files are healthy and running.
kube-proxy
is a network proxy that runs on each Kubernetes node. It configures network rules that resolve connections to pods from inside and outside a Kubernetes cluster.
kube-apiserver
is an API server that runs only on master nodes. It validates and configures data for API objects, such as pods, services, and replication controllers. The API server also assigns pods to nodes and synchronizes pod information with the service configuration.
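Once you are inside a Kubernetes VM (see the procedures below), you can query the state of all three services at once. The following is a minimal sketch, assuming the services are managed as the systemd units kubelet, kube-proxy, and kube-apiserver, as in the examples later in this section; keep in mind that kube-apiserver is present only on master nodes:

[root@k8s1-master-0 /]# for svc in kubelet kube-proxy kube-apiserver; do echo -n "$svc: "; systemctl is-active $svc; done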
To download a Kubernetes configuration file
1. Find out the ID of the required Kubernetes cluster:

   # vinfra service compute k8saas list
   +--------------------------------------+------+--------+
   | id                                   | name | status |
   +--------------------------------------+------+--------+
   | 834397b9-22d3-486d-afc7-5c0122d6735d | k8s1 | ERROR  |
   +--------------------------------------+------+--------+
2. Print the configuration of the Kubernetes cluster to a file. For example, to download the kubeconfig of the k8s1 cluster to the file k8s1.kubeconfig, run:

   # vinfra service compute k8saas config 834397b9-22d3-486d-afc7-5c0122d6735d > k8s1.kubeconfig
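You can point kubectl at the downloaded file to inspect the cluster further. This is a sketch of typical first checks, assuming kubectl is installed where you run it and that the cluster's API server is still reachable (it may not be if the cluster is in the ERROR state):

# kubectl --kubeconfig k8s1.kubeconfig get nodes
# kubectl --kubeconfig k8s1.kubeconfig get pods --all-namespaces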
To download logs of a Kubernetes node
1. Find out the name of the required Kubernetes virtual machine:

   # vinfra service compute server list
   +--------------------------------------+---------------+--------+------------------------+---------------------+
   | id                                   | name          | status | host                   | networks            |
   +--------------------------------------+---------------+--------+------------------------+---------------------+
   | 18fb7436-f1fa-4859-99dd-284cef9edc54 | k8s1-node-0   | ACTIVE | node002.vstoragedomain | - public=10.10.10.2 |
   | 66bc8454-efb4-4263-a0e2-523fd8f15bda | k8s1-master-0 | ACTIVE | node001.vstoragedomain | - public=10.10.10.1 |
   +--------------------------------------+---------------+--------+------------------------+---------------------+
2. Print the log of the Kubernetes VM to a file. For example, to download the log of the k8s1 master node to the file k8s1-master-0.log, run:

   # vinfra service compute server log k8s1-master-0 > k8s1-master-0.log
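The downloaded file is a plain-text console log, so standard text tools are enough for a first pass. For example, to see the most recent lines that mention errors or failures (the pattern here is an illustrative guess, not a fixed log format):

# grep -iE "error|fail" k8s1-master-0.log | tail -n 20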
To run commands inside a Kubernetes node
1. Find out the Kubernetes VM ID and the hostname of the node it runs on by listing all virtual machines in the compute cluster:

   # vinfra service compute server list
   +--------------------------------------+---------------+--------+------------------------+---------------------+
   | id                                   | name          | status | host                   | networks            |
   +--------------------------------------+---------------+--------+------------------------+---------------------+
   | 18fb7436-f1fa-4859-99dd-284cef9edc54 | k8s1-node-0   | ACTIVE | node002.vstoragedomain | - public=10.10.10.2 |
   | 66bc8454-efb4-4263-a0e2-523fd8f15bda | k8s1-master-0 | ACTIVE | node001.vstoragedomain | - public=10.10.10.1 |
   +--------------------------------------+---------------+--------+------------------------+---------------------+

   In this example, the Kubernetes master node has the ID 66bc8454-efb4-4263-a0e2-523fd8f15bda and resides on the node node001.

2. Log in to the node that hosts the needed Kubernetes VM, and then log in to the VM itself. For example:

   # ssh node001.vstoragedomain
   [root@node001 ~]# virsh x-exec 66bc8454-efb4-4263-a0e2-523fd8f15bda
Now, you can perform diagnostic checks inside the VM. For example, you may start by finding out which services have failed:
[root@k8s1-master-0 /]# systemctl list-units --failed
  UNIT               LOAD   ACTIVE SUB    DESCRIPTION
● kube-proxy.service loaded failed failed kube-proxy via Hyperkube
To show more details about the failed service, run:
[root@k8s1-master-0 /]# systemctl status kube-proxy
× kube-proxy.service - kube-proxy via Hyperkube
     Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Thu 2022-04-28 11:20:18 UTC; 2min 6s ago
    Process: 4603 ExecStartPre=/bin/mkdir -p /etc/kubernetes/ (code=exited, status=0/SUCCESS)
    Process: 4604 ExecStartPre=/usr/bin/podman rm kube-proxy (code=exited, status=1/FAILURE)
    Process: 4624 ExecStart=/bin/bash -c /usr/bin/podman run --name kube-proxy --log-opt path=/dev/null --privileged --net host --entrypoint /hyperkube --volume /e>
    Process: 115468 ExecStop=/usr/bin/podman stop kube-proxy (code=exited, status=0/SUCCESS)
   Main PID: 4624 (code=exited, status=2)
        CPU: 2min 6.180s
...
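From here, a common next step is to read the unit's journal for the exact error message and, after addressing the cause, restart the service. These are standard systemd commands, offered as a sketch rather than a guaranteed recovery procedure:

[root@k8s1-master-0 /]# journalctl -u kube-proxy --no-pager | tail -n 50
[root@k8s1-master-0 /]# systemctl restart kube-proxy
[root@k8s1-master-0 /]# systemctl status kube-proxy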