Making Kubernetes deployments highly available
If a node that hosts a Kubernetes pod fails or becomes unreachable over the network, the pod gets stuck in a transitional state. In this case, the pod's persistent volumes are not automatically detached, which prevents the pod from being redeployed on another worker node. To make your Kubernetes applications highly available, you need to force pod termination in the event of node failure by adding rules to the pod deployment.
To terminate a stuck pod
Add the following lines to the pod template's spec section (spec.template.spec, not the top-level spec) of the deployment configuration file:
terminationGracePeriodSeconds: 0
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 2
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 2
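If the node's state changes to "NotReady" or "Unreachable", the pod is evicted 2 seconds later instead of after the default 300 seconds. Setting terminationGracePeriodSeconds to 0 removes the grace period, so the pod is terminated immediately.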
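If the deployment already exists, the same settings could be supplied as a patch instead of editing the full manifest. The following is a minimal sketch, not a definitive procedure: the file name patch.yaml is hypothetical, and it assumes you apply it with a command such as kubectl patch deployment <name> --patch-file patch.yaml. It also shows how the lines are nested under the pod template.

```yaml
# patch.yaml (hypothetical file name)
spec:
  template:
    spec:
      # Terminate the pod without a grace period
      terminationGracePeriodSeconds: 0
      tolerations:
      # Evict the pod 2 seconds after the node becomes unreachable
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 2
      # Evict the pod 2 seconds after the node becomes not ready
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 2
```

Note that a patch like this replaces the whole tolerations list rather than merging with it, so any tolerations the deployment already defines would need to be repeated in the patch.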
The entire YAML file of a deployment may look as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 2
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 2
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        volumeMounts:
        - mountPath: /var/lib/www/html
          name: mydisk
      volumes:
      - name: mydisk
        persistentVolumeClaim:
          claimName: mypvc
The manifest above describes the deployment nginx with one pod that uses the persistent volume claim mypvc. In the event of node failure, the pod is automatically terminated 2 seconds after the node is marked as not ready or unreachable.
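The deployment expects the persistent volume claim mypvc to exist already. For reference, such a claim might look like the sketch below. Only the name mypvc comes from the manifest above. The access mode and the requested size are illustrative assumptions, and storageClassName is left out on the assumption that the cluster has a default storage class.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc              # must match claimName in the deployment
spec:
  accessModes:
  - ReadWriteOnce          # assumption: the volume is mounted by one node at a time
  resources:
    requests:
      storage: 1Gi         # assumption: illustrative size, adjust as needed
```

With a ReadWriteOnce volume, the volume has to be released from the failed node before the replacement pod can mount it on another node, which is why the forced termination described above matters.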