Troubleshooting Kubernetes clusters

If a Kubernetes cluster fails, you can download its configuration file and the log files of its nodes for troubleshooting.

Additionally, you can get diagnostic information about Kubernetes services by running commands inside a Kubernetes virtual machine from the compute node this VM resides on. The Kubernetes services that you need to check are the following:

  • kubelet is the node agent that runs on each Kubernetes node. It ensures that containers described in pod specification files are healthy and running.
  • kube-proxy is a network proxy that runs on each Kubernetes node. It maintains network rules that allow connections to pods from inside and outside the Kubernetes cluster.
  • kube-apiserver is the API server that runs only on master nodes. It validates and configures data for API objects, such as pods, services, replication controllers, and others. The API server also assigns pods to nodes and synchronizes pod information with the service configuration.
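
Once you are logged in to a Kubernetes VM (see "To run commands inside a Kubernetes node" below), you can also probe these services directly through their health endpoints. This is only a sketch: it assumes the upstream Kubernetes default ports and that curl is available in the VM image, both of which may differ in your deployment.

[root@k8s1-master-0 /]# curl -k https://localhost:6443/healthz    # kube-apiserver, master nodes only
[root@k8s1-master-0 /]# curl http://localhost:10248/healthz       # kubelet
[root@k8s1-master-0 /]# curl http://localhost:10256/healthz       # kube-proxy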

To download a Kubernetes configuration file

  1. Find out the ID of the required Kubernetes cluster:

    # vinfra service compute k8saas list
    +--------------------------------------+------+--------+
    | id                                   | name | status |
    +--------------------------------------+------+--------+
    | 834397b9-22d3-486d-afc7-5c0122d6735d | k8s1 | ERROR  |
    +--------------------------------------+------+--------+
  2. Print the configuration of the Kubernetes cluster to a file. For example, to download the kubeconfig of the k8s1 cluster to the file k8s1.kubeconfig, run:

    # vinfra service compute k8saas config 834397b9-22d3-486d-afc7-5c0122d6735d > k8s1.kubeconfig
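
If kubectl is installed on your workstation and the cluster API endpoint is still reachable, you can use the downloaded kubeconfig to query the cluster directly. For example, to check the state of its nodes:

    # kubectl --kubeconfig k8s1.kubeconfig get nodes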

To download logs of a Kubernetes node

  1. Find out the name of the required Kubernetes virtual machine:

    # vinfra service compute server list
    +--------------------------------------+---------------+--------+------------------------+---------------------+
    | id                                   | name          | status | host                   | networks            |
    +--------------------------------------+---------------+--------+------------------------+---------------------+
    | 18fb7436-f1fa-4859-99dd-284cef9edc54 | k8s1-node-0   | ACTIVE | node002.vstoragedomain | - public=10.10.10.2 |
    | 66bc8454-efb4-4263-a0e2-523fd8f15bda | k8s1-master-0 | ACTIVE | node001.vstoragedomain | - public=10.10.10.1 |
    +--------------------------------------+---------------+--------+------------------------+---------------------+
    
  2. Print the log of the Kubernetes VM to a file. For example, to download the log of the k8s1 master node to the file k8s1-master-0.log, run:

    # vinfra service compute server log k8s1-master-0 > k8s1-master-0.log
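
You can then scan the downloaded log for obvious problems. As one possible quick check, filter it for error and failure messages:

    # grep -i -E "error|fail" k8s1-master-0.log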

To run commands inside a Kubernetes node

  1. Find out the Kubernetes VM ID and the hostname of the node it runs on by listing all virtual machines in the compute cluster:

    # vinfra service compute server list
    +--------------------------------------+---------------+--------+------------------------+---------------------+
    | id                                   | name          | status | host                   | networks            |
    +--------------------------------------+---------------+--------+------------------------+---------------------+
    | 18fb7436-f1fa-4859-99dd-284cef9edc54 | k8s1-node-0   | ACTIVE | node002.vstoragedomain | - public=10.10.10.2 |
    | 66bc8454-efb4-4263-a0e2-523fd8f15bda | k8s1-master-0 | ACTIVE | node001.vstoragedomain | - public=10.10.10.1 |
    +--------------------------------------+---------------+--------+------------------------+---------------------+
    

    In this example, the Kubernetes master node has the ID 66bc8454-efb4-4263-a0e2-523fd8f15bda and resides on the node node001.

  2. Log in to the node that hosts the required Kubernetes VM, and then log in to the VM itself. For example:

    # ssh node001.vstoragedomain
    [root@node001 ~]# virsh x-exec 66bc8454-efb4-4263-a0e2-523fd8f15bda

Now you can perform diagnostic checks inside the VM. For example, you may start by finding out which services have failed:

[root@k8s1-master-0 /]# systemctl list-units --failed
  UNIT               LOAD   ACTIVE SUB    DESCRIPTION
● kube-proxy.service loaded failed failed kube-proxy via Hyperkube

To show more details about the failed service, run:

[root@k8s1-master-0 /]# systemctl status kube-proxy
× kube-proxy.service - kube-proxy via Hyperkube
     Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Thu 2022-04-28 11:20:18 UTC; 2min 6s ago
    Process: 4603 ExecStartPre=/bin/mkdir -p /etc/kubernetes/ (code=exited, status=0/SUCCESS)
    Process: 4604 ExecStartPre=/usr/bin/podman rm kube-proxy (code=exited, status=1/FAILURE)
    Process: 4624 ExecStart=/bin/bash -c /usr/bin/podman run --name kube-proxy --log-opt path=/dev/null --privileged --net host --entrypoint /hyperkube --volume /e>
    Process: 115468 ExecStop=/usr/bin/podman stop kube-proxy (code=exited, status=0/SUCCESS)
   Main PID: 4624 (code=exited, status=2)
        CPU: 2min 6.180s
...
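
The exit code alone rarely explains why the service failed. To see its recent log output, you can also query the systemd journal inside the VM:

[root@k8s1-master-0 /]# journalctl -u kube-proxy --no-pager -n 50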