Knowledge of Kubernetes is a prerequisite. Disclaimer: some of these features are alpha, meaning that backward-incompatible changes may be released in upcoming versions. Always refer to the release notes when upgrading.
Taints and tolerations are Kubernetes features that ensure Pods are assigned to appropriate nodes. A taint is a node property composed of a key or a key/value pair (the value is optional) and an effect. Three effects are available: NoSchedule (new Pods that do not tolerate the taint are not scheduled on the node), PreferNoSchedule (the scheduler avoids the node but may still use it) and NoExecute (new Pods are not scheduled and running Pods that do not tolerate the taint are evicted).
In some cases, the cluster administrator needs to tolerate some taints. A toleration is a Pod property composed of a tolerated taint's key, an optional value, a toleration delay (infinite by default) and an operator. There are 2 types of operators: Equal, which matches a taint when both the key and the value are equal, and Exists, which matches as soon as the key is present, whatever its value. The sketch below shows both sides.
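As a minimal sketch, assuming a hypothetical node node-1 and an illustrative dedicated=gpu taint: the node is tainted first, then a Pod declaring a matching toleration remains schedulable on it.

```yaml
# Taint the node first (illustrative key/value):
#   kubectl taint nodes node-1 dedicated=gpu:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "dedicated"
    operator: "Equal"       # the key and the value must match the taint
    value: "gpu"
    effect: "NoSchedule"
    # tolerationSeconds (the toleration delay) only applies to the
    # NoExecute effect; omitted, the toleration never expires
```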
Taints and tolerations introduce new behaviours in Kubernetes, such as tainting nodes according to their NodeConditions. This makes it possible to tolerate some states and to schedule Pods even though the node is tainted with NetworkUnavailable, for instance to diagnose the node or to prepare its deployment.
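For instance, a diagnostic Pod could tolerate the taint applied when the node's NetworkUnavailable condition is true; note that the exact taint key used below is an assumption and may vary across versions of the feature.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: network-diagnostic
spec:
  containers:
  - name: debug
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "node.kubernetes.io/network-unavailable"   # version-dependent key
    operator: "Exists"                              # tolerate the taint whatever its value
    effect: "NoSchedule"
```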
The scheduler is getting richer and richer, starting with the priority and preemption APIs that are moving towards beta support. Critical Pods will integrate directly with the priority API instead of relying on the rescheduler and taints. Preemption will become more complete in the next version, handling Pod preemption to place DaemonSet Pods when resources would otherwise forbid it. To get there, the assignment will no longer be handled by the DaemonSet controller but directly by the kube-scheduler. This makes the kube-scheduler a critical component, as early as the next version, when starting a cluster with tools that deploy the masters as DaemonSets. Preemption management when several schedulers are in use will arrive in version 1.11.

Performance-wise, version 1.10 will embed the predicates-ordering design. As a short reminder, the scheduling algorithm runs a suite of predicates, a set of functions, to determine whether a Pod can be hosted on a node. The eligible nodes are then ranked by a set of prioritisation functions to elect the best-fitted node. The scheduler will define the predicates' execution order so that the most restrictive and least complex predicates are executed first. This optimises execution time: as soon as a predicate fails, the remaining predicates are not executed. This last part will be detailed further in a following article, to present the work done on it by Googler Bobby Salamat.
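To give an idea of the priority API, here is a hedged sketch (names and values are illustrative, and the scheduling.k8s.io group version will change as the feature graduates from alpha): a PriorityClass is created, then referenced by a Pod through priorityClassName.

```yaml
apiVersion: scheduling.k8s.io/v1alpha1   # group version changes as the API graduates
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000                  # higher value means higher priority
globalDefault: false
description: "Pods allowed to preempt lower-priority Pods when resources are scarce."
---
apiVersion: v1
kind: Pod
metadata:
  name: important-app
spec:
  priorityClassName: high-priority   # resolved by the scheduler to the priority value above
  containers:
  - name: app
    image: nginx
```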
Kubernetes is becoming more modular and easier to extend through external contributions, as the CNI (Container Network Interface) and CRI (Container Runtime Interface) already illustrate. It is now becoming important to split Kubernetes' core when it comes to:
Ever since Kubernetes 1.8, the Cloud Controller Manager (CCM) makes it possible for any cloud provider to integrate with Kubernetes. This is a great improvement, as providers no longer have to contribute directly to Kubernetes' code. Thus, the release pace is set by the provider and not by the Kubernetes community, which improves velocity and the variety of features proposed by cloud providers. Some providers are already developing their own implementation: Rancher, Oracle, Alibaba, DigitalOcean or Cloudify.
This works through a plugin mechanism: any cloud provider implementing cloudprovider.Interface and registering with Kubernetes can be linked into the CCM binary. In the next releases, every cloud provider (including providers already supported in kubernetes/kubernetes) will implement this interface outside of the Kubernetes repository, making the project more modular. For instance, here is the roadmap discussed by the Azure community.
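As a rough, hypothetical sketch of what running an out-of-tree provider can look like (the provider name, image and labels are assumptions): kubelets are started with --cloud-provider=external, and the provider's CCM runs in-cluster, for instance as a DaemonSet on the masters.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: mycloud-cloud-controller-manager   # hypothetical provider
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: mycloud-ccm
  template:
    metadata:
      labels:
        app: mycloud-ccm
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""   # run on the masters only
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: cloud-controller-manager
        image: mycloud/cloud-controller-manager:v0.1.0   # hypothetical image
        command:
        - /cloud-controller-manager
        - --cloud-provider=mycloud     # name registered against cloudprovider.Interface
        - --leader-elect=true
```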
Launched in alpha in version 1.9, CSI (Container Storage Interface) is a spec that allows any provider implementing it to ship a driver through which Kubernetes can manage storage. CSI will go beta in version 1.10. The pain point addressed by CSI is twofold. Like the CCM, it externalises storage drivers and lets providers define their own release pace. Secondly, CSI solves the painful installation process of FlexVolume plugins. If you want to write your own driver, see here how to deploy it.
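Once a CSI driver is deployed, it is consumed through the usual storage objects. Here is a hedged sketch assuming a hypothetical driver named csi.example.com: a StorageClass points at the driver, and a PersistentVolumeClaim uses that class.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-csi
provisioner: csi.example.com      # must match the deployed CSI driver's name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: example-csi   # volumes are provisioned by the CSI driver
  resources:
    requests:
      storage: 10Gi
```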
Events have always been an issue in Kubernetes, both semantics-wise and performance-wise.
Currently, the whole semantics is embedded in a message, which makes extracting information and querying events considerably harder. Performance-wise, there is room for improvement. For now, Kubernetes has a deduplication process that removes identical entries, which decreases the memory footprint of etcd, Kubernetes' distributed key-value store. However, the number of requests to etcd remains an issue. Version 1.10 will introduce a new event-processing logic that should relieve this pressure. The principle is simple: event deduplication and event updates will happen periodically, which will greatly decrease the number of api-server write requests to etcd. This part will be further detailed in a following article, to present the related work by Googler Marek Grabowski and Heptio's Timothy St. Clair.
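To illustrate the current behaviour, here is a sketch of what a deduplicated Event object looks like today (the names, timestamps and message are made up): identical occurrences are folded into a single entry whose count field is patched, and it is precisely these repeated write requests that the new logic will batch.

```yaml
apiVersion: v1
kind: Event
metadata:
  name: my-pod.150a1b2c3d4e5f         # made-up name
  namespace: default
involvedObject:
  kind: Pod
  name: my-pod
  namespace: default
reason: FailedScheduling
message: "0/3 nodes are available: 3 Insufficient cpu."
type: Warning
count: 12                             # identical occurrences folded into one object
firstTimestamp: "2018-03-01T10:00:00Z"
lastTimestamp: "2018-03-01T10:05:00Z"
source:
  component: default-scheduler
```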
Kubernetes will make it possible to scale your Pods based on custom metrics. This feature is now beta. An aggregation layer extends the api-server beyond the native APIs provided by Kubernetes. It does so by adding 2 APIs:
There are various use cases, and this makes it possible to address each of their specific needs. This part will be detailed in a next article.
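As a hedged sketch of scaling on a custom metric (the Deployment name and the http_requests_per_second metric are assumptions, and the metric must be served by a metrics adapter registered behind the aggregation layer):

```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                          # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods                         # per-Pod custom metric
    pods:
      metricName: http_requests_per_second
      targetAverageValue: "100"        # target average across all Pods
```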
The need to deploy multiple clusters is becoming more and more mainstream. Kube-fed makes it possible to manage them across on-premise sites, various cloud providers or different regions of the same hosting service, which brings considerable flexibility. Resources are relatively mature, and several features were added to federation:
The Kubernetes ecosystem is reaching a satisfying maturity level and keeps improving to address the many needs of its users. The possibility to extend Kubernetes offers users a way to address very specific use cases. The next articles will detail more thoroughly some of the features introduced here, such as the HPA principles, event management, etc.