2016-08-29

My Journey Along the Canal

Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.
– Edsger W. Dijkstra

At Cloudsoft we have long been fans of the Calico project from Metaswitch, so the news that it was being spun out into a separate company, called Tigera, was very welcome. Tigera promptly took ownership of the CoreOS Flannel project, another SDN solution, and announced Canal, a combination of Calico and Flannel packaged for Kubernetes, Docker and Mesos. All of these products are designed to help you implement networking for modern containerized applications, so I was keen to try out Canal and see how it integrated with Kubernetes.

The Map is not the Territory

At present, however, the Canal repository on GitHub contains the modern equivalent of the Web 1.0 'Under Construction' GIF - a single README.md file. On the Calico Users Slack channel, though, I was helpfully pointed at a set of instructions on the CoreOS website describing a DIY approach. At its root, Canal is simply Flannel, with VXLAN providing the networking between VMs and containers, plus Calico managing the network policy. Both projects are available as CNI plugins, and Kubernetes can use CNI to configure networking for its pods and their containers. The binaries and example configuration files are all available to download, so I set out to create a Kubernetes cluster.
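
To make that concrete, the CNI configuration for a Canal-style setup is essentially the Flannel CNI plugin delegating the interface plumbing, with Calico wired in purely for policy. The snippet below is a sketch of the sort of file I ended up with; the etcd endpoint, API server address and file name are placeholders for my own environment, and the exact keys accepted vary between Calico CNI plugin releases.

    # Sketch: a Canal-style CNI config. Flannel handles the plumbing and
    # delegates to the Calico plugin, which only enforces Kubernetes policy.
    # Addresses below are placeholders for my lab environment.
    cat > /etc/cni/net.d/10-canal.conf <<'EOF'
    {
        "name": "canal",
        "type": "flannel",
        "delegate": {
            "type": "calico",
            "etcd_endpoints": "http://10.0.0.10:2379",
            "log_level": "info",
            "policy": {
                "type": "k8s",
                "k8s_api_root": "http://10.0.0.10:8080/api/v1/"
            }
        }
    }
    EOF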

Had I been content to test the waters using the exact setup and configuration described in the online guide from CoreOS, this would have been a very short article! The happy path just requires you to provision some VMs and follow the instructions. However, I am interested in building something more generic and flexible to use as the basis for an AMP blueprint that can run in a production environment, so a little more investigation was required.

The first issue I ran into was versioning. The world of containers moves fast, and yesterday's hot new release is today's legacy cruft. I decided that the sensible thing would be to use the most recent stable release of each component; however, the available instructions often used combinations of versions that were now outdated, with the latest releases offering new and incompatible configuration options and flags. Additionally, many of the downloadable guides assume that you will be using exactly the same set of features, add-ons and operating system version as their authors. For example, there are many explanations of how to use Flannel as a stand-alone SDN, and many that show how to use Calico on its own, but only the CoreOS instructions explained how to use the two in concert. Of course, the CoreOS team want you to use the rest of their product offerings, so their guide helpfully describes how to deploy using rkt rather than Docker, and assumes your VMs will be running CoreOS.

I was constrained to the fairly outdated CentOS 7 for the machines I would be deploying to, and decided on: Kubernetes 1.3.5; Flannel 0.6.0 (whose release artifacts are in a different format, and at a differently structured URL, from every other Flannel release, which caused many problems until I noticed); Calico 0.21.0; CNI 0.3.0; Calico CNI 1.3.1; and Docker 1.12.1. Several of these components, such as the CNI release, are not mentioned in many guides, which often assume that dependencies will be pulled in via a package manager. The Yum repositories are several point releases behind on Kubernetes, however, so that was not an option here. Once the versions had been decided on, the software was simple enough to download and use. Kubernetes deserves a special note, since it is offered as a TGZ archive on the GitHub release page. This file is 1.4 GiB in size, and the scripts inside it will in turn re-download the archive, extract it again, and then extract another archive inside that, just to obtain the required binaries! A little searching, and a review of other Kubernetes install scripts, revealed that Google provides an alternative download location where the binary for each service can be found for most combinations of version, operating system and architecture.
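
As a rough sketch of what that looks like in practice, the individual binaries can be fetched directly from Google's release bucket rather than via the 1.4 GiB tarball. The version, install path and list of daemons below are my choices for this cluster.

    # Sketch: pull individual Kubernetes 1.3.5 binaries for linux/amd64
    # straight from the release bucket instead of the full tarball.
    K8S_VERSION="v1.3.5"
    BASE_URL="https://storage.googleapis.com/kubernetes-release/release/${K8S_VERSION}/bin/linux/amd64"
    for BINARY in kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet kubectl; do
        curl -L -o "/usr/local/bin/${BINARY}" "${BASE_URL}/${BINARY}"
        chmod +x "/usr/local/bin/${BINARY}"
    done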

Joining the DOCKER_OPTS

Once the correct versions of the services were safely copied to disk on the machines I would be using for my cluster, they had to be configured properly. Here again, there are many different ways to accomplish the task. The scripts in many of the online guides give a good start, and usually configure a service file for systemd that will start the process. First, though, the service files must be deciphered and reviewed to determine what arguments and environment variables are being used to configure the daemon processes. This can be a surprisingly complex job, with some implementations using several layers of indirection, creating environment variables that are used to set other environment variables in an environment file that is then read by the actual service file. The service definitions often do things that I find unacceptable in a production environment, such as downloading the service binary from a repository during startup. There is also a choice to be made about the mechanism used to run the services; often Docker is used to host them, which means relying on yet another layer of indirection and set of custom variables for configuration.
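
The sketch below illustrates the kind of indirection involved, assuming the defaults that applied at the time of writing: flanneld writes its subnet lease to /run/flannel/subnet.env, and a systemd drop-in feeds those values into the Docker daemon's flags. The etcd endpoint is a placeholder for my environment, and exact flag names may differ between releases.

    # Sketch: a flanneld unit, plus the indirection that feeds its subnet
    # lease (FLANNEL_SUBNET, FLANNEL_MTU in /run/flannel/subnet.env) into
    # the Docker daemon.
    cat > /etc/systemd/system/flanneld.service <<'EOF'
    [Unit]
    Description=Flannel overlay network daemon
    After=network-online.target

    [Service]
    ExecStart=/usr/local/bin/flanneld \
        --etcd-endpoints=http://10.0.0.10:2379 \
        --ip-masq
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
    EOF

    # Drop-in that makes Docker use the Flannel-assigned bridge and MTU.
    mkdir -p /etc/systemd/system/docker.service.d
    cat > /etc/systemd/system/docker.service.d/flannel.conf <<'EOF'
    [Service]
    EnvironmentFile=/run/flannel/subnet.env
    ExecStart=
    ExecStart=/usr/bin/dockerd --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}
    EOF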

I decided to create a series of systemd service files, using my preferred set of arguments as determined from the available examples and the Kubernetes documentation. These were augmented by some supporting code in a script that performed other tasks, such as saving the Flannel configuration to the etcd server and setting some basic iptables rules. This is by no means the end of the process - the configuration and settings are to be used to create a blueprint for a cluster that can run production workloads. For this part of the project, however, I was simply interested in getting Kubernetes up and running with the Canal services providing networking. This final stage proved the most elusive; getting a working Kubernetes cluster is mostly just a case of starting the necessary services and making sure they know where the master API server and the etcd cluster endpoints are, but the moving parts and options for networking make it slightly more complex.
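
For reference, the supporting script amounted to little more than the following sketch, assuming Flannel's default /coreos.com/network etcd prefix and the etcd v2 API; the pod CIDR and etcd endpoint are choices specific to my cluster.

    # Sketch: publish the Flannel network configuration to etcd and allow
    # VXLAN traffic between hosts (Flannel's VXLAN backend uses UDP 8472).
    etcdctl --endpoints http://10.0.0.10:2379 \
        set /coreos.com/network/config \
        '{ "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }'

    iptables -A INPUT -p udp --dport 8472 -j ACCEPT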

The concepts involved in Kubernetes networking can also be confusing at first glance. For example, services are given virtual, non-routable IP addresses that map onto the pod endpoints of an application; these are actually forwarded by a series of iptables rules, but with Kubernetes, Docker and Flannel all wanting to play here, the resulting rule set can be impenetrable. Since Canal uses Calico only for network policy, it is not immediately obvious that Calico's own networking must be disabled when running calico-node. The environment variable that achieves this is also hidden away, without comment, in a startup script.
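
For what it is worth, this is roughly what that looks like; the variable name and image tag below are taken from the releases I was using and may well differ in later calico-node images, and the etcd address is again a placeholder.

    # Sketch: run calico/node for policy only, with Calico's own networking
    # (BGP/IPIP) switched off so that Flannel owns the data plane.
    # CALICO_NETWORKING=false is the setting hidden in the startup script
    # mentioned above; later releases may spell it differently.
    docker run -d --name calico-node --net=host --privileged \
        -e ETCD_AUTHORITY=10.0.0.10:2379 \
        -e CALICO_NETWORKING=false \
        -v /var/run/calico:/var/run/calico \
        -v /lib/modules:/lib/modules \
        -v /var/log/calico:/var/log/calico \
        calico/node:v0.21.0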

The Journey is the Reward

The trek from new VMs to a running Kubernetes cluster with Canal was much longer than it should have been. Obviously there are options such as Minikube, which allow you to get up and running very quickly and start testing applications. But that is not a production cluster, nor is it meant to be. Of course, neither are the scripts in the online installation guides or the Kubernetes repository. A production deployment requires more than I have described here: TLS security and integration with CA servers, access to local registries for images, resilient management clusters with load-balancing and fail-over, monitoring and log management, and scalability. These are features that the AMP blueprint for Kubernetes will have to provide, using this work as a starting point.

Resources

There were many guides and articles that I reviewed while building my Canal-powered Kubernetes cluster. Some of the most useful are listed below:

  1. Kubernetes the Hard Way by Kelsey Hightower.
  2. CoreOS + Kubernetes Step By Step from the CoreOS site.
  3. Creating a Custom Cluster from Scratch in the Kubernetes documentation.
  4. Configuring flannel for container networking in the CoreOS Flannel documentation.

I fully appreciate the irony in the fact that this article does not walk through a complete set of scripts or instructions that fix the issues I hit when deploying Kubernetes and Canal. The point, rather, is to show that creating anything more than a simple toy cluster is non-trivial, and that adding Canal into the mix makes it harder still. But the benefit of having both Flannel and Calico available as networking plugins for Kubernetes means that it is worth investing time in making this easy. And, of course, I look forward to the release of the production version of Canal from Tigera...

Look out for an article on the Cloudsoft blog that will discuss how these configured services were transformed into AMP blueprints as part of the Cloudsoft Container Service. These blueprints can be used to make repeatable, reproducible production deployments of Kubernetes clusters with Canal, and AMP can then be used to deploy your own applications onto them.