Making Kubernetes deployments highly available
If a node that hosts a Kubernetes pod fails or becomes unreachable over the network, the pod gets stuck in a transitional state. In this case, the pod's persistent volumes are not automatically detached, which prevents the pod from being redeployed on another worker node. To make your Kubernetes applications highly available, you need to enforce pod termination in the event of node failure by adding rules to the pod deployment.
To terminate a stuck pod
Add the following lines to the pod template's spec section (spec.template.spec) of the deployment configuration file:
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 2
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 2
If the node's state changes to "NotReady" or "Unreachable", the pod will be automatically terminated in 2 seconds.
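For comparison, the sketch below shows the tolerations that Kubernetes normally adds to a pod when none are specified, assuming the cluster uses the standard default of 300 seconds for both taints. Your cluster's defaults may differ if its administrator has changed them.

```yaml
# Typical default tolerations (assumption: unmodified cluster defaults).
# With these values, a pod stays bound to a failed node for about 5 minutes
# before it is evicted.
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 300
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 300
```

Lowering tolerationSeconds to 2 shortens this wait, at the cost of evicting pods during brief network interruptions as well.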
The entire YAML file of a deployment may look as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 2
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 2
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - mountPath: /var/lib/www/html
          name: mydisk
      volumes:
      - name: mydisk
        persistentVolumeClaim:
          claimName: mypvc
The manifest above describes the deployment nginx with one pod that uses the persistent volume claim mypvc and will be automatically terminated in 2 seconds in the event of node failure.
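The deployment expects the persistent volume claim mypvc to exist in the same namespace. A minimal sketch of such a claim might look as follows; the access mode, the requested size of 1Gi, and the reliance on the cluster's default storage class are assumptions for illustration and are not taken from the original configuration.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc            # must match claimName in the deployment
spec:
  accessModes:
  - ReadWriteOnce        # assumed: the volume is mounted by a single node at a time
  resources:
    requests:
      storage: 1Gi       # assumed size, adjust to your needs
  # storageClassName is omitted, so the cluster's default storage class
  # (if one is configured) is used; this is an assumption.
```

With a ReadWriteOnce volume, the replacement pod can attach the volume on another node only after it has been released from the failed node, which is why forcing pod termination matters for this kind of deployment.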