Kubernetes - Overview

Kubernetes is an open-source container orchestration framework. It relies on a container runtime such as containerd or Docker to run the containers.

Namespace

Most resources exist in a namespace; a few, such as nodes and namespaces themselves, are cluster-wide. If no namespace is specified, the namespace named default is used. It is also possible to create new namespaces. If you have used Docker Swarm before, you can think of a namespace as roughly comparable to a stack.

apiVersion: v1
kind: Namespace
metadata:
    name: development
    labels:
        name: development
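
A namespaced resource is assigned to a namespace through its metadata.namespace field. The following is a minimal sketch, assuming a hypothetical ConfigMap called app-settings that should live in the development namespace defined above; the name and the data are illustrative only.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings        # hypothetical name, for illustration only
  namespace: development    # the namespace created by the manifest above
data:
  greeting: hello           # hypothetical key and value
```

If metadata.namespace is left out, the resource ends up in whatever namespace the client is currently using, which is default unless configured otherwise.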

Pod

A pod is a set of one or more containers that share the same network namespace and volumes.

In Kubernetes, containers are not used directly. Instead, pods are created, which wrap and manage the containers. The benefit is that the same pod can stay alive while its container(s) restart or change.

A pod is created from a template such as the one below. Usually, there is a single container per pod, but occasionally tightly coupled containers are put in the same pod.

template:
  spec:
    containers:
    - name: hello
      image: hello-world
    restartPolicy: OnFailure
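
To illustrate the case of tightly coupled containers in one pod, here is a minimal sketch of a standalone Pod with two containers that share an emptyDir volume. All names (shared-volume-demo, writer, reader, the file path) are assumptions made for this example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo       # hypothetical name
spec:
  containers:
  - name: writer                 # appends a timestamp to a shared file every five seconds
    image: busybox
    command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
    volumeMounts:
    - name: shared
      mountPath: /data
  - name: reader                 # follows the file written by the other container
    image: busybox
    command: ["sh", "-c", "touch /data/out.log; tail -f /data/out.log"]
    volumeMounts:
    - name: shared
      mountPath: /data
  volumes:
  - name: shared
    emptyDir: {}                 # ephemeral volume that exists as long as the pod does
```

Both containers see the same files under /data because they mount the same volume of the pod.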

Workload

Usually, pod templates are used by so-called workload controllers. They are responsible for managing the life cycle of a workload such as a Job or a Deployment. Below is a workload definition. The corresponding controller will make sure that the state of the system matches the definition.

apiVersion: batch/v1
kind: Job
metadata:
  name: hello-world
spec:
  template:
    spec:
      containers:
      - name: hello
        image: hello-world
      restartPolicy: OnFailure

Depending on the type of application and pod configuration, different workloads can be used. Below are the built-in workloads. It is also possible to use custom workloads with specialized behavior.

  • Deployment and ReplicaSet (replacing the legacy resource ReplicationController). Deployment is a good fit for managing a stateless application workload on your cluster, where any Pod in the Deployment is interchangeable and can be replaced if needed.
  • StatefulSet lets you run one or more related Pods that do track state somehow. For example, if your workload records data persistently, you can run a StatefulSet that matches each Pod with a PersistentVolume. Your code, running in the Pods for that StatefulSet, can replicate data to other Pods in the same StatefulSet to improve overall resilience.
  • DaemonSet defines Pods that provide node-local facilities. These might be fundamental to the operation of your cluster, such as a networking helper tool, or be part of an add-on. Every time you add a node to your cluster that matches the specification in a DaemonSet, the control plane schedules a Pod for that DaemonSet onto the new node.
  • Job and CronJob define tasks that run to completion and then stop. Jobs represent one-off tasks, whereas CronJobs recur according to a schedule.
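
As an illustration of the last item, the Job shown earlier can be turned into a recurring task by wrapping its template in a CronJob. This is a sketch only: the name and the schedule are assumptions, and the batch/v1 API version for CronJob requires a reasonably recent cluster (Kubernetes 1.21 or later).

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-world-recurring    # hypothetical name
spec:
  schedule: "*/5 * * * *"        # standard cron syntax: every five minutes (assumed schedule)
  jobTemplate:                   # template for the Job created on each run
    spec:
      template:                  # same pod template as in the Job above
        spec:
          containers:
          - name: hello
            image: hello-world
          restartPolicy: OnFailure
```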

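Service

A Service exposes a set of pods, selected by their labels, under a stable name and IP address. The example below forwards TCP traffic arriving on port 80 of the service to port 9376 of all pods labeled app: MyApp.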

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
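
The selector of the Service above matches pods by label. The following sketch shows what a matching Deployment could look like. The Deployment name, the replica count and the image are hypothetical; only the label app: MyApp and port 9376 are taken from the Service.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # hypothetical name
spec:
  replicas: 2                     # assumed replica count
  selector:
    matchLabels:
      app: MyApp
  template:
    metadata:
      labels:
        app: MyApp                # this label is what the Service selector matches
    spec:
      containers:
      - name: my-app
        image: localhost/my-app   # hypothetical image that listens on port 9376
        ports:
        - containerPort: 9376     # corresponds to the targetPort of the Service
```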

Load Balancing

The default load balancing for services and pods is performed by kube-proxy and operates at layer 4 (TCP/UDP).

The chosen proxy mode for the kube-proxy determines the load balancing algorithm.

  • userspace mode (legacy, removed in newer Kubernetes releases) chooses a backend via a round-robin algorithm.
  • iptables mode chooses a backend at random.
  • IPVS mode provides more options for balancing traffic to backend Pods; these are:
    • rr: round-robin
    • lc: least connection (smallest number of open connections)
    • dh: destination hashing
    • sh: source hashing
    • sed: shortest expected delay
    • nq: never queue
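
The proxy mode and the IPVS scheduler are set in the kube-proxy configuration. Below is a minimal sketch of such a configuration that selects IPVS with the least-connection scheduler. How this configuration reaches kube-proxy depends on how the cluster was set up, so treat the snippet as an illustration rather than a drop-in file.

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"         # proxy mode; determines which balancing options are available
ipvs:
  scheduler: "lc"    # one of the IPVS schedulers listed above (least connection)
```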

Service Discovery

Kubernetes provides DNS. Depending on where a query originates, it yields different results. For example, resources in the same namespace can find each other without their fully qualified domain name (FQDN).

Containers in the same pod can reach each other via the loopback interface (localhost).

A pod's DNS entry has the following form.

pod-ip-address.my-namespace.pod.cluster-domain.example

Pods created by a Deployment or DaemonSet exposed by a Service have the following DNS resolution.

pod-ip-address.service-name.my-namespace.svc.cluster-domain.example

Workloads such as deployments do not have a DNS name themselves. That is why they are usually coupled with services if they need to be reachable by other services or external requests.

Services are resolved as follows, assuming the default cluster domain cluster.local.

<service-name>.<namespace-name>.svc.cluster.local

Pods use their own namespace by default. This means, for example, that a query for just <service-name> resolves to the service in the same namespace as the pod making the DNS query.

This is possible because of the entry in each container's /etc/resolv.conf that has the following form.

search <namespace>.svc.cluster.local svc.cluster.local cluster.local

Note that not all DNS-related tools apply the search list by default. For example, to get the service IP without the FQDN from within a container using dig, the +search flag has to be used.

dig +search <service-name>
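
To try the resolution from inside the cluster, a short-lived pod can run a lookup. The sketch below is one way to do this under a few assumptions: the pod name dns-check is hypothetical, the service is the my-service example from above living in the default namespace, and the cluster uses the default domain cluster.local. The FQDN is used on purpose so that the result does not depend on how the lookup tool handles the search list.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-check                 # hypothetical name
  namespace: default
spec:
  restartPolicy: Never            # run the lookup once and stop
  containers:
  - name: dns-check
    image: busybox
    # Resolve the service by its fully qualified name
    command: ["nslookup", "my-service.default.svc.cluster.local"]
```

The result of the lookup ends up in the pod's log output.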

Health Checks

By default, Kubernetes uses the process ID of the container to determine whether a pod is alive and ready to accept requests. As long as every specified container has a corresponding process ID (PID), the pod is considered healthy.

Custom health probes can be specified, for example a livenessProbe via HTTP.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: healthcheck-me
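spec:
  selector:                       # required for apps/v1 Deployments; must match the pod labels
    matchLabels:
      app: healthcheck-me
  template:
    metadata:
      labels:
        app: healthcheck-me
    spec:
      containers:
      - name: healthcheck-me
        image: localhost/checkme
        livenessProbe:            # the container is restarted when this probe keeps failing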
          httpGet:
            path: /healthz
            port: 80
          initialDelaySeconds: 0
          periodSeconds: 10
          timeoutSeconds: 1
          failureThreshold: 3
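
Liveness covers only the "alive" half; whether a pod is ready to accept requests can be checked with a separate readinessProbe. The following is a minimal sketch of a pod that uses both. The pod name and the /ready path are assumptions; the image and the /healthz path are reused from the example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                # hypothetical name
spec:
  containers:
  - name: probe-demo
    image: localhost/checkme
    livenessProbe:                # repeated failure leads to a container restart
      httpGet:
        path: /healthz
        port: 80
      periodSeconds: 10
    readinessProbe:               # failure removes the pod from the endpoints of matching services
      httpGet:
        path: /ready              # hypothetical endpoint
        port: 80
      periodSeconds: 5
```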

Volumes

When a pod ceases to exist, Kubernetes destroys ephemeral volumes; however, Kubernetes does not destroy persistent volumes. For any kind of volume in a given pod, data is preserved across container restarts.

apiVersion: v1
kind: Pod
metadata:
  name: configmap-pod
spec:
  containers:
    - name: test
      image: busybox
      volumeMounts:
        - name: config-vol
          mountPath: /etc/config
  volumes:
    - name: config-vol
      configMap:
        name: log-config
        items:
          - key: log_level
            path: log_level

Types of volumes

Below are some of the most common types of volumes. More types are available, though. For example, the big cloud providers AWS, GCP and Azure have their own volume types, which provision storage on the respective cloud platform.

Volume Type               Description
configMap (ephemeral)     A ConfigMap provides a way to inject configuration data into pods.
emptyDir (ephemeral)      An emptyDir volume is first created when a Pod is assigned to a node, and exists as long as that Pod is running on that node.
hostPath                  A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
local                     A local volume represents a mounted local storage device such as a disk, partition or directory.
nfs                       An nfs volume allows an existing NFS (Network File System) share to be mounted into a Pod.
persistentVolumeClaim     A persistentVolumeClaim volume is used to mount a PersistentVolume into a Pod.
secret (ephemeral)        A secret volume is used to pass sensitive information, such as passwords, to Pods.
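
As an example of a non-ephemeral volume type, the sketch below requests storage through a PersistentVolumeClaim and mounts it into a pod. The names data-claim and pvc-pod, the size of 1Gi and the mount path are assumptions, and the claim can only be bound if the cluster has a default storage class or a matching PersistentVolume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce                 # mountable read-write by a single node
  resources:
    requests:
      storage: 1Gi                # assumed size
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-pod                   # hypothetical name
spec:
  containers:
  - name: test
    image: busybox
    command: ["sh", "-c", "sleep 3600"]   # keep the container running for an hour
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim       # refers to the claim defined above
```

Data written to /data is stored on the bound persistent volume rather than in the pod itself.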
