This is technical and implementation background on the Clocker project, with more behind-the-scenes detail that couldn't be included in the Cloudsoft blog post Creating a Docker Cloud with Apache Brooklyn.
This post will show how Apache Brooklyn and Docker have been combined to create Clocker, a Docker cloud assembled from multiple Docker containers across multiple Docker hosts running on cloud virtual machines. Deploying and managing applications will include intelligent placement of containers, and automatic provisioning of new Docker hosts and containers as required.
Docker is an open platform for distributed applications for developers and sysadmins, and Apache Brooklyn is an application deployment and runtime management platform that provides blueprints: declarative specifications of your application components and their configurations.
Docker, Brooklyn, jclouds
The underlying mechanism used by Brooklyn for cloud VM provisioning and management is Apache jclouds, a cloud-agnostic library that handles communication with both public and private providers, provisioning virtual machines and exposing them as SSH-accessible connections. This allows Brooklyn to execute commands on these machines and transfer files over their SSH connections in a secure fashion. Details such as login credentials and keys are handled by jclouds and are fully configurable through Brooklyn.
The Docker architecture provides the end user with containers on a virtual machine that, assuming an SSH server is available, can be seen as simply more VMs. To make this fully transparent, a driver was developed for jclouds that allows provisioning of containers on a particular host with Docker installed, using the same API as any other cloud. This is fully described in AMP for Docker and is the foundation for the Docker cloud architecture.
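As a sketch of what this looks like from the jclouds side, the fragment below shows a Brooklyn-style location definition pointed at a single Docker host's remote API. The hostname and port are illustrative assumptions for the example, not documented defaults:

```yaml
# Illustrative only: a jclouds location that provisions containers on one
# Docker host through the jclouds-docker driver, using the same location
# mechanism as any other cloud provider. The endpoint is an assumption.
location:
  jclouds:docker:
    endpoint: "http://docker-host.example.com:4243"
```

Containers provisioned through such a location are then indistinguishable, from Brooklyn's point of view, from SSH-accessible virtual machines.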
Although it is certainly useful to have the ability to provision multiple containers on a single machine, this will eventually run into limits based on the physical machine being used. Even a large server with many GB (or even TB) of memory and many cores will eventually be unable to handle further containers. As an example, a 128GB 16 core server can be partitioned into only 32 virtual machines with a notional 0.5 core of CPU and 4GB memory, but with containers these limits are shared and each can potentially burst up to the maximum available on the underlying host machine.
Therefore, to increase the capacity we introduced the idea of a cluster of Docker hosts, each with a set of containers, but presented as a single Docker infrastructure encompassing all provisioned containers.
The three parts of this architecture are the Docker entities, the jclouds locations, and the Docker infrastructure logic. The Docker entities form a tree of DockerInfrastructure, DockerHost and DockerContainer: Brooklyn entities that represent the Docker components, and provide sensors and effectors to monitor and control them, such as the current CPU or memory usage or uptime of a host, or the ability to pause and restart a container. The middle portion is a series of jclouds locations, which are Brooklyn wrappers around the main cloud provisioner of virtual machines for hosts, the host virtual machines themselves, the jclouds-docker provisioner accessing the Docker API on a host, and finally the SSH-accessible Docker container as provided by the jclouds-docker driver.
This Docker infrastructure contains the intelligence and logic required to create new Docker hosts and containers as needed for a particular application. To keep operation as simple as possible, a Brooklyn blueprint should not need any special configuration to make use of the Docker facilities. To this end, we use existing Brooklyn APIs and interfaces to present the containers as normal SSH-accessible virtual machines, and control the Docker host installation using Brooklyn entities.
The Docker infrastructure is represented by a Brooklyn entity tree. The parent entity contains a DynamicCluster of DockerHost entities. These hosts use virtual machines provisioned by a jclouds location, such as an Amazon EC2 or SoftLayer provider. Each host then has the Docker software installed and a basic Docker image created, with an SSH server available. The host entities themselves contain another DynamicCluster, this time containing DockerContainer entities, which represent the provisioned containers for deployed blueprint entities.
The Docker infrastructure entity is used to create a DockerLocation, a Brooklyn location that can provision virtual machines, in this case Docker containers. This is done by wrapping the parent jclouds provisioner to obtain new virtual machines in the background for the Docker hosts, and then creating a new jclouds location, this time using the Docker API driver, for each host. When the jclouds-docker driver provisions a new Docker container, it is presented as a DockerContainerLocation that exposes the SSH connection to the underlying container. Brooklyn exposes the forwarded Docker port numbers from the container via a SubnetTier entity, allowing access as though the container were available on the same public IP as the Docker host.
This sequence of wrapped entities and locations is visible to an end user deploying a blueprint as a new named location in the Brooklyn console, configured during the deployment of the DockerInfrastructure entity. The default name is my-docker-cloud, and blueprints deployed to this location will have their entities started in Docker containers spread across the Docker cloud.
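For example, a minimal blueprint deployed to this location might look like the following; the TomcatServer entity is a standard Brooklyn blueprint type, standing in here for any application component:

```yaml
# Hypothetical single-entity application deployed to the Docker cloud;
# "my-docker-cloud" is the default location name described above.
name: "Simple Web App"
location: my-docker-cloud
services:
- serviceType: brooklyn.entity.webapp.tomcat.TomcatServer
```

Nothing in the blueprint mentions Docker; the location name alone determines that the entity starts inside a container.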
This Docker cloud allows us much more flexibility in deploying a containerised application than is possible with a single host. In particular, we now have a choice of which Docker host to use when adding a new container. This flexibility is provided by a NodePlacementStrategy, configured either on a per-blueprint basis or as a default for the whole Docker infrastructure.
The simplest strategy is to fill each Docker host to some maximum number of containers before provisioning a new host. This depth-first strategy makes the most efficient use of the available infrastructure. If performance is more of a concern, a breadth-first strategy can be used instead, balancing container creation across all available hosts, starting with the least populated. This takes no account of actual host load, however, choosing merely on the basis of container count, and may not be sufficient when deployed entities have a wide range of CPU requirements. To make the most efficient use of available CPU resources, a strategy that checks current CPU usage and creates the container on the host with the lowest percentage can be used.
The currently available strategies are summarised below, and more are in development:
BreadthFirstPlacementStrategy — Balance container creation across all available hosts, starting with the least populated.
DepthFirstPlacementStrategy — Fill up hosts to a configured maximum number of containers before provisioning a new host.
CpuUsagePlacementStrategy — Create containers on the host with the lowest CPU usage.
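As a sketch, a strategy is selected with the docker.container.strategy configuration key, alongside the per-host container limit; both keys also appear in the full blueprint given later in this post:

```yaml
# Illustrative fragment of a DockerInfrastructure configuration: use the
# depth-first strategy, filling each host to at most 8 containers before
# a new host is provisioned. The limit of 8 is an arbitrary example value.
brooklyn.config:
  docker.container.strategy: "brooklyn.location.docker.strategy.DepthFirstPlacementStrategy"
  docker.container.cluster.maxSize: 8
```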
Although we are striving to make the deployment process for a Brooklyn blueprint as transparent as possible, it should be obvious that some capabilities require explicit configuration to control the Docker infrastructure, such as the placement strategies outlined above. Another feature available is the ability to select a particular Docker image for a container, whether by image ID directly or by specifying a Dockerfile to use. The default Dockerfile simply starts an SSH daemon, and an image is created using this for each Docker host.
If a configuration key is added to an entity specifying either a Dockerfile or an image ID, this will be intercepted by the DockerHostLocation when the container is created. Dockerfiles are specified using a URL, which can point to the classpath of the Brooklyn server or to a remote resource over HTTP or another supported protocol. Although the infrastructure does not fully support Docker image registries at present, when an image is created from a Dockerfile it is made using a unique hash of the Dockerfile URL, allowing the image to be reused later when further entities are requested with that same Dockerfile. This gives a considerable speed-up for clustered deployments of many identical entities on the same host. The configuration is, of course, ignored when the blueprint is deployed to a non-Docker cloud location.
Additionally, Dockerfiles are used by Brooklyn as templates rather than immutable text files. This means that they can contain variables and placeholders that are interpolated using FreeMarker based on the entity being deployed. This capability allows a single Dockerfile to be used to control deployment of a wide variety of different applications, and gives Brooklyn scope to integrate more closely with the deployed Docker infrastructure.
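As an illustration of this templating, a Dockerfile can contain FreeMarker expressions that are interpolated from the entity being deployed. The variable name used below is hypothetical, not part of a documented contract:

```dockerfile
# Hypothetical templated Dockerfile: the ${...} expression is a FreeMarker
# placeholder that Brooklyn would fill in from the deployed entity's
# configuration before the image is built.
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y openssh-server
EXPOSE ${entity.httpPort}
```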
Deploying a Docker Cloud
The following YAML blueprint shows the configuration of a Docker cloud deployed to the Amazon EC2 Europe region.
```yaml
id: docker-cloud
name: "Docker Cloud Infrastructure"
origin: "https://github.com/brooklyncentral/clocker/"
locations:
- jclouds:aws-ec2:eu-west-1:
    hardwareId: m1.small
services:
- serviceType: brooklyn.entity.container.docker.DockerInfrastructure
  id: infrastructure
  brooklyn.config:
    docker.version: "1.0"
    entity.dynamicLocation.name: "my-docker-cloud"
    docker.host.securityGroup: "docker"
    docker.host.initial.size: 2
    docker.container.cluster.maxSize: 4
    docker.host.register: false
    docker.container.strategy: "brooklyn.location.docker.strategy.CpuUsagePlacementStrategy"
    docker.host.spec:
      $brooklyn:entitySpec:
        type: brooklyn.entity.container.docker.DockerHost
        brooklyn.config:
          start.timeout: 900
          docker.host.nameFormat: "docker-%1$s"
    docker.container.spec:
      $brooklyn:entitySpec:
        type: brooklyn.entity.container.docker.DockerContainer
        brooklyn.config:
          docker.container.nameFormat: "docker-%2$d"
```
Once deployed, this will provision two virtual machines in the EC2 Dublin region, and install Docker 1.0 on each. These hosts are monitored for CPU usage and memory usage, which are made available as sensor data for Brooklyn policies. For a more complete tutorial including application deployment to a Docker cloud, see the Cloudsoft Creating a Docker Cloud with Apache Brooklyn post.
This is the first release of the Brooklyn integration with Docker and development is ongoing. Some of the features to look out for in the next releases will be:
- More placement strategies and policies for container management.
- Implementation of affinity and anti-affinity APIs and a simple DSL to control container placement based on deployed entities and applications, and allowing blueprints to specify their affinity preferences on a per-entity basis.
- Improvements in the jclouds-docker driver will allow more control over Docker container provisioning, such as CPU shares and memory allocation. Docker image repository integration will allow images to be pulled from centrally configured locations, and shared between different hosts in a Docker cloud.
- Adding integration with software-defined networking (SDN) services such as Open vSwitch or OpenContrail will allow isolation and control of container network traffic, as well as easier communication between containers on different hosts, by introducing a shared Docker VLAN.
- Seamless integration of all existing Brooklyn blueprints with the Docker infrastructure.
The Docker integration with Brooklyn allows application blueprints to be deployed to a cloud of Docker containers, spread out intelligently across Docker hosts on managed VMs in the cloud.
An application blueprint can treat the Docker infrastructure as though it were a homogeneous set of virtual machines in a single location, while the Docker infrastructure makes intelligent decisions about container provisioning and placement to maximise resource usage and performance.
The code is available on GitHub under the Apache 2.0 license, so please fork and contribute by creating pull requests or opening issues.
Apache Brooklyn is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by Chip Childers. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.