Custom Sensors in Brooklyn

One of the features of Apache Brooklyn is the ability to retrieve data from a running entity using sensors. These sensors expose data from the entity as attributes containing information like queue depth for a message broker, latency for an HTTP server or CPU usage for a Docker host. The data can also be enriched or aggregated to produce sums or moving averages across multiple entities in a cluster, and used as input to policies to drive scaling and resilience mechanisms. The sources for these sensors are varied, and encompass JMX attributes for Java applications, fields from XML or JSON documents returned by RESTful APIs, parsed output from shell commands and many more.

Sometimes the entities provided by Brooklyn do not expose the particular piece of data you need for a policy as a sensor. In these circumstances it is possible to add sensors dynamically, either programmatically in a Java entity class that extends the default Brooklyn code, or in the YAML blueprint used to describe the application. There are three different categories of sensor that can be added: SSH command output, JSON data from an HTTP URL, and JMX attributes from a Java entity. Each of these is configured differently, although there is some commonality.

First, we will look at an example using the SshCommandSensor class to add sensors driven by the output of shell commands invoked over SSH to the virtual machine running the software process being managed. The following blueprint shows a TomcatServer entity and a brooklyn.initializers section adding a sensor to it.

name: Tomcat SSH Sensor
services:
- serviceType: brooklyn.entity.webapp.tomcat.TomcatServer
  name: Tomcat
  location: jclouds:aws-ec2:eu-west-1
  brooklyn.config:
    pre.install.command: "sudo apt-get install -y sysstat"
  brooklyn.initializers:
  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: tomcat.cpustats
      command: "mpstat | grep all"

The output of this sensor can be seen in the following screenshot. The SSH command mpstat | grep all is executed to generate information on CPU usage, which is then published as the tomcat.cpustats sensor.

Problems using SSH

However, when trying to perform some calculations on the data from mpstat, using the following YAML fragment to add another sensor, my colleague Richard discovered that the code did not perform as expected. Although the SSH commands appeared to execute correctly, the sensor data was always empty.

  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: tomcat.cpustats.broken
      command: >
        mpstat |
        awk '$2=="all" { print $3+$4+$5+$6+$7+$8+$9+$10 }'

After some investigation, he located the cause of this problem, which is obscure enough that I have decided to document it here for future reference. I will quote from Richard's email on the subject:

The problem is that the LANG environment variable is different between me running SSH in a terminal to test out potential commands, and when Brooklyn is opening SSH sessions to run its commands.

When I was experimenting with finding the right command to run, I would SSH to a Linux box and run variations on my command until I came up with a working version. This SSH session would inherit my workstation's LANG of en_GB.UTF-8. When I ran mpstat, here is a typical line of output:

16:13:15    all   5.40   0.05   2.18   0.52   0.09   0.03   0.00   0.00  91.74

Having got a working command, I plugged it into my blueprint and let Brooklyn run the command. Unfortunately, Brooklyn (probably) does not set the LANG environment variable, and this particular Linux machine chose a default of en_US.UTF-8. When it ran mpstat, here is the equivalent line of output:

04:13:11 PM all   5.40   0.05   2.18   0.52   0.09   0.03   0.00   0.00  91.74

Notice that the timestamp has changed from 24-hour form to 12-hour form with an AM/PM suffix. Also note that there is a space before the "PM" suffix - causing all of my awk field numbers to now be off-by-one. D'oh!

So if the output of a command you intend to parse with awk, perl or even just cut includes potentially locale-specific items, set the LANG environment variable explicitly to prevent any surprises in formatting that will throw off your parsing routines. In this case, the fix is to set LANG to en_US.UTF-8 in the command value of the blueprint.
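The off-by-one can be reproduced without mpstat or a particular locale at all; this sketch pipes the two sample lines from above (truncated to three CPU columns for brevity) through awk:

```shell
# Under a 24-hour locale the "all" tag is field 2, so the stats start at $3
echo '16:13:15    all   5.40   0.05   2.18' |
    awk '$2=="all" { print $3+$4+$5 }'    # prints 7.63

# Under en_US.UTF-8 the extra "PM" token shifts every field along by one
echo '04:13:11 PM all   5.40   0.05   2.18' |
    awk '$3=="all" { print $4+$5+$6 }'    # prints 7.63
```

Both pipelines produce the same total, but only because the field numbers have been adjusted to match each format - which is exactly the adjustment the corrected blueprint makes.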

  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: cpu.load
      command: >
        LANG=en_US.UTF-8 mpstat |
        awk '$3=="all" { print $4+$5+$6+$7+$8+$9+$10+$11 }'

Other examples of SSH sensors might return the contents of various files in /proc, or execute an administration command for an entity to return information. The sensor can be configured to poll at specific intervals, and the output can be coerced to different types as required. To change the poll frequency, set the period configuration key to the time required, either in milliseconds or using a suffix to indicate minutes or seconds, for example 10m for ten minutes or 5s for five seconds. The sensor type is set using the targetType configuration key, which can be either the name of a primitive type or a fully qualified Java class name. The following example returns the available disk space as the disk.available integer sensor, every five minutes.

  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: disk.available
      command: "df / | grep disk1 | cut -d' ' -f4"
      period: 5m
      targetType: Integer
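As a sketch of how that cut pipeline behaves (the df output line below is simulated, and the device name is purely illustrative):

```shell
# Simulated `df /` output line, single-space separated. cut -d' ' treats
# every space as a delimiter, so it only finds the right field when columns
# are separated by exactly one space - real df output pads columns with runs
# of spaces, in which case awk '{print $4}' is usually a more robust filter.
printf '/dev/disk1 487546976 302792668 184498308\n' |
    grep disk1 | cut -d' ' -f4    # prints 184498308
```

The value 184498308 is then coerced to an Integer by Brooklyn because of the targetType key.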

Other Sensor Types

The dynamic sensor addition in Brooklyn is not limited to running SSH commands; there are currently two other mechanisms available. If you have an entity that is running a Java program with JMX enabled, Brooklyn can retrieve attributes and convert them into sensor data. Using the JmxAttributeSensor in the same way as for SSH sensors, we can add a dynamic JMX sensor. For example, this YAML snippet adds the LoadedClassCount attribute from the java.lang:type=ClassLoading JMX object as the loaded.classes sensor, refreshing every thirty seconds.

  - type: brooklyn.entity.software.java.JmxAttributeSensor
    brooklyn.config:
      name: loaded.classes
      objectName: "java.lang:type=ClassLoading"
      attribute: "LoadedClassCount"
      targetType: Integer
      period: 30s

This JMX object is part of the JVM management information beans, and should be available on every Java application. To access application-specific attributes, Brooklyn must already be able to access JMX data on the entity, which will normally be the case if the entity implements the UsesJmx interface. This means that Brooklyn will be able to determine the JMX and RMI or JMXMP ports that are being used, as well as any authentication details that are required, and these are re-used when adding sensors like this. The same period and targetType keys are used to configure the sensor polling and return value coercion, just as for SSH sensors.

Finally, it is possible to access and parse JSON data from an HTTP-based REST API on an entity. This uses the JSONPath expression language to extract parts of a JSON document, which are then returned as the sensor data. An example of the YAML required is shown below. Again, name, period and targetType have the same meanings as for the other sensor types. The uri key configures the endpoint to access, with optional username and password credentials (in future, a map of HTTP headers and other features will be added). If a status of 200 is returned, the content is assumed to be a JSON document, and the jsonPath key is used to extract some part of the data as the sensor value. Here we are simply retrieving the value of the counter field, but it is possible to perform much more sophisticated queries, although these are beyond the scope of this post.

  - type: brooklyn.entity.software.http.HttpRequestSensor
    brooklyn.config:
      name: json.sensor
      period: 1m
      targetType: Integer
      jsonPath: "$.counter"
      uri: >

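To make the jsonPath behaviour concrete, here is a sketch of what the "$.counter" selector pulls out of a response body (the payload is invented, and python3 is assumed to be available):

```shell
# A JSONPath of "$.counter" selects the top-level "counter" member of the
# returned document - equivalent to this shell pipeline
echo '{"counter": 42, "status": "ok"}' |
    python3 -c 'import json,sys; print(json.load(sys.stdin)["counter"])'   # prints 42
```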
To give a (contrived) example of how dynamic sensors might be used in an application, imagine that you have a cluster of Couchbase nodes that you wish to scale. Unfortunately the current Brooklyn blueprint already exposes all the useful sensor data you might practically need when resizing the cluster, so we must turn to impractical and useless data instead. Imagine you need to resize based on the amount of disk space used, and ignore for the moment that this is not the right way to scale a cluster. The blueprint below is intended to illustrate the ways in which dynamic sensor data can be used as the input to a policy. The following JSON fragment is part of the data returned by a REST call to the /pool/default endpoint, which returns cluster details. We are interested in the usedByData entry, showing the hard disk space used by data.

    "storageTotals": {
        "hdd": {
            "free": 46188516230,
            "quotaTotal": 56327458816,
            "total": 56327458816,
            "used": 10138942586,
            "usedByData": 34907796
        }
    }

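To see what the $.storageTotals.hdd.usedByData selector will return, here is a sketch of the same traversal using a trimmed copy of that fragment (python3 assumed available):

```shell
# The jsonPath selector walks the nested document member by member
echo '{"storageTotals":{"hdd":{"usedByData":34907796}}}' |
    python3 -c 'import json,sys; print(json.load(sys.stdin)["storageTotals"]["hdd"]["usedByData"])'   # prints 34907796
```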
The blueprint below shows how an AutoScalerPolicy might be configured to use this information. We have created a new couchbase.storageTotals.usedByData sensor, which connects to the cluster REST API endpoint specified by the uri, username and password keys. This returns the pool information, and the jsonPath selector extracts the $.storageTotals.hdd.usedByData path, which is coerced to an integer based on the targetType configuration. The scaling policy uses a $brooklyn:sensor(...) directive to configure its metric key; this looks up our dynamic sensor, which is then compared to the lower and upper bounds to decide whether to resize the cluster. This pattern can obviously be used in your own blueprints to much more useful effect!

name: Couchbase Policy Example
location: jclouds:softlayer:lon02
services:
- type: brooklyn.entity.nosql.couchbase.CouchbaseCluster
  id: couchbase
  adminUsername: Administrator
  adminPassword: Password
  initialSize: 3
  createBuckets:
  - bucket: "default"
    bucket-port: 11211
  brooklyn.initializers:
  - type: brooklyn.entity.software.http.HttpRequestSensor
    brooklyn.config:
      name: couchbase.storageTotals.usedByData
      targetType: Integer
      period: 1m
      jsonPath: "$.storageTotals.hdd.usedByData"
      uri: >
      username: Administrator
      password: Password
  brooklyn.policies:
  - policyType: brooklyn.policy.autoscaling.AutoScalerPolicy
    brooklyn.config:
      metric: $brooklyn:sensor("couchbase.storageTotals.usedByData")
      metricLowerBound: 10000000
      metricUpperBound: 50000000
      minPoolSize: 1
      maxPoolSize: 5

Hopefully these examples have given you an idea of the capabilities available when designing Brooklyn blueprints. The aim of Brooklyn is to simplify the autonomic management of applications in the cloud, so the ability to build blueprints from pre-defined components and then extend them without needing to write code is core. It is never possible to anticipate every piece of information that users might need in their business logic while building policies for elasticity, scaling or resilience. When creating a blueprint for an application, you can dynamically add sensors to retrieve information from entities using SSH, JMX or HTTP, and wire those sensors into Brooklyn's policy framework.

More detailed documentation is available, and further information can be found at the main Apache Brooklyn site or on GitHub.


Docker Networking Ecosystem

One of the things that Clocker is designed to accomplish is to make multi-host Docker networking as easy as using single-host Docker. The way we do this currently is by using the Weave project. Weave is very interesting, because it is an entirely user-space software defined network, and is incredibly simple. Because Clocker is trying to make Docker Cloud applications easy to use and orchestrate, we found Weave a great match for us, with just the right feature-set to allow applications with more complex networking requirements to run in a Clocker provisioned environment without modification. Weave fulfils the essential requirements of a sort of Minimum Viable Network for Docker, and hence Clocker, while still leaving room for other projects to provide segmentation, external endpoints and gateways, and deeper integration.

For example, Riak uses Erlang and requires epmd (the Erlang port mapper daemon) for clustering; this uses up to a thousand TCP ports which are controlled by epmd and used simply for inter-node communication. It makes no sense to use the Docker port mapping facilities to expose these ports externally, and there is no reason for any application outside the Riak cluster to have to access them. All a Riak cluster needs to expose is a single port for web client access and another for admin console access. Weave allows Clocker to set up the Riak nodes in containers, each of which is attached to a private LAN, and expose only the Docker port-forwarded web client and admin ports. This was demonstrated at RICON recently; the Running Riak in a Docker Cloud using Apache Brooklyn talk showed a multi-node Riak cluster running in Clocker.

Weave is, in essence, a software Ethernet switch. As mentioned earlier, it is entirely user-space, which is very important for ease of use in the cloud. Clocker, and the underlying Brooklyn control plane, are cloud-agnostic, thanks to our use of the Apache jclouds library. This means we try to design application blueprints in such a way that they will run anywhere. More complex SDN solutions require drivers loaded into the kernel, and specific images for particular operating system versions, which means customising the configuration of each virtual machine on each cloud provider that a blueprint is required to run on. Weave allows us to ignore all this and simply start the Weave router in a container on each Docker host; Clocker then assigns IP addresses from the link-local address range to each container it provisions, attaching each one to the same LAN. This is a flat network architecture, and in many ways is not suitable for production use, particularly with multi-tenant deployments where separation of traffic between applications is a concern. A simple type of access control is possible with Clocker and Weave, in the form of iptables firewall rules: since Clocker controls the IP address space used by each application, it can erect firewall rules preventing traffic from crossing application boundaries. This simplicity and portability has allowed Clocker to quickly demonstrate the feasibility of multi-host Docker deployments in the cloud.

But, in the future, Clocker users will want to deploy more complex applications with networking requirements that are not achievable with the Weave network model. To make this transition as seamless as possible, we need a way of supporting more traditional SDN services, such as OVS. There has been a lot of discussion in the Docker community about ways of achieving this, and how to ensure that the chosen solution fits The Docker Way and maintains the flexibility and openness that has been a hallmark of Docker from the start. The proposals in issues #8951 and #8997 discuss adding a new networking API which will allow plugins for various different SDN solutions to be integrated with Docker. In particular, issue #8997 seems to offer the flexibility and choice that Clocker would need, allowing a simple out-of-box experience with Weave, leading to more complex production deployments with OVS or other SDN solutions, when the environment is under the complete control of the blueprint.

The Docker ecosystem is an amazing collection of projects; Clocker and Weave are just a small part of it. One of the features of this ecosystem is the ability to experiment and innovate quickly - many projects, mine included, are permanently in beta, and new features are constantly being added. I hope that the networking and orchestration APIs being added to Docker will maintain this policy and continue to allow small projects to fill their specialised niches in this space.

Further information on Clocker can be found at the main Clocker site or on GitHub.


New Brooklyn Blueprint Features

This post describes a new feature added to Apache Brooklyn in a recent pull request. The feature was based partly on user requirements and partly on supporting some work being done with Clocker placement strategies. Previously, to create an object for use by a Brooklyn entity there had to be a specific piece of syntax in the DSL, with associated code to instantiate that type of object, or the blueprint had to be written in Java. There was a mix of complex type coercions from String to various classes, such as the ProxySslConfig class for Nginx, which were not extensible or re-usable, and DSL methods like $brooklyn:entity("id") to reference specific types of object.

Blueprint Object Method

The code in pull request #182 adds another DSL method that can create and inject any object required into a Brooklyn blueprint. This means any Java DTO, whether defined as part of the Brooklyn code or externally in some library, can be created and passed on to a running entity. Additionally it extends the ConfigKey typed configuration mechanism to allow arbitrary objects to use the same configuration definition and documentation classes as entities, policies, enrichers and locations. This makes Brooklyn much more consistent, and blueprints much easier to create and much more powerful as well!

The new DSL method is named object and is invoked using the $brooklyn: prefix, as $brooklyn:object: followed by a map of arguments. It can be used anywhere this syntax is valid, including in its own argument map, so it is possible to create nested trees of objects. The method recognises three keys in the map of arguments it is passed: the object type, a map of fields, and a map of configuration keys. Any leftover or unmatched keys here, or in the field and configuration maps, will be ignored.
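As a minimal skeleton, the three recognised keys look like this (the class and key names here are hypothetical, chosen only to label each part):

```yaml
$brooklyn:object:
  type: com.example.MyDto            # class to instantiate
  object.fields:                     # set via JavaBean setters
    someField: "value"
  brooklyn.config:                   # set via ConfigKeys, if Configurable
    some.config.key: "value"
```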

The following example shows how the DSL method could be used to instantiate a brooklyn.example.TestObject object, which is to be configured using Brooklyn ConfigKeys. Note that one of the keys, connectionProperties, is itself an object definition; this time a com.example.ConnectionProperties object, which could be part of the configuration for a database client. Here, the hostname field of this object is being set using the attributeWhenReady method on a referenced entity, to resolve the host.name sensor value. This shows the complex assemblages of objects and configuration that can be created using the Brooklyn CAMP blueprint DSL.

      $brooklyn:object:
        type: brooklyn.example.TestObject
        brooklyn.config:
          connectionProperties:
            $brooklyn:object:
              objectType: com.example.ConnectionProperties
              object.fields:
                username: "guest"
                password: "gu3st"
                port: 31337
                hostname: $brooklyn:component("database").attributeWhenReady("host.name")
          wars:
          - "http://example.org/test1.war"
          - "http://example.org/test2.war"

Object Configuration

This section documents the configuration used to create an object with the Brooklyn CAMP DSL, describing each of the required values.

First, the class name used to instantiate the object must be set. This can be done using any of the keys type, objectType or object_type. The value should be a string with the full package and class name of the required object. Brooklyn will use its default classpath for resolution of the class name, looking in the lib/brooklyn, lib/patch and lib/dropins directories. Extra catalog classpath entries, versioning and OSGi will be supported at a later date. Any class on the classpath can be used, as long as it has a public zero-argument constructor.

The object.fields key is used to specify a map of fields to set on the new object. These are processed using the Apache Commons BeanUtils project, and should consist of keys representing the field name (which should be accessible via a public setter using the JavaBean conventions) and an appropriate value to set. As noted earlier, the value could be another object construction, any supported YAML primitive value or another DSL method such as an entity or sensor reference, or a deferred attribute access. The deferred accessors will be created as Tasks and resolved at runtime as the blueprint application is started.

Finally, brooklyn.config can also be specified with another map of data. This is used in the same way as an entitySpec definition, and will attempt to set fields annotated with @SetFromFlag using reflection. Any keys matching ConfigKeys defined on the object will be set, using the name of the ConfigKey or any @SetFromFlag annotation on it. This assumes that the object implements the Configurable interface, and the keys will be ignored if it does not, since each matched map entry causes setConfig(key, value) to be called. To make this easy to use, a new BasicConfigurableObject class is available, which can be extended with appropriate ConfigKeys added, accessible through its getConfig(key) method. Again, any DSL method or deferred attribute access can be set as the value for a key, and will be parsed and resolved appropriately. Another feature, not available with the object.fields map, is type coercion using the TypeCoercions.coerce(object, class) method, which is called when setting or accessing ConfigKey values.

If the ManagementContextInjectable interface is implemented by the class specified for the object, the DSL will also detect this, and call the injectManagementContext(context) method after construction. Any other object initialisation is left to the user, although possible use cases such as auto detection of init() methods or annotation driven construction are future possibilities.


Here is another example of a YAML blueprint, this time describing a Clocker application configuration. We are using the new object creation method to set up some placement strategies, showing how both the object.fields and brooklyn.config keys can be used together to give more flexibility when configuring an object.

services:
- serviceType: brooklyn.entity.container.docker.DockerInfrastructure
  id: infrastructure
  brooklyn.config:
    docker.container.strategies:
    - $brooklyn:object:
        type: brooklyn.location.docker.strategy.CpuUsagePlacementStrategy
        object.fields:
          maxContainers: 6
          maxCpu: 0.33
        brooklyn.config:
          infrastructure: $brooklyn:component("infrastructure")
          cpuUsage.sensor: $brooklyn:sensor("machine.cpu.usage")
    - $brooklyn:object:
        type: brooklyn.location.docker.strategy.BreadthFirstPlacementStrategy
        object.fields:
          maxContainers: 6
        brooklyn.config:
          infrastructure: $brooklyn:component("infrastructure")
    - $brooklyn:object:
        type: brooklyn.location.docker.strategy.AffinityStrategy
        brooklyn.config:
          infrastructure: $brooklyn:component("infrastructure")
          iso3166.code: "en-UK"

Other places where the $brooklyn:object DSL notation will be useful are when configuring Apache jclouds location customizer classes:

ConfigKey<Collection<JcloudsLocationCustomizer>> JCLOUDS_LOCATION_CUSTOMIZERS =
        ConfigKeys.newConfigKey(
                new TypeToken<Collection<JcloudsLocationCustomizer>>() { },
                "customizers", "Optional location customizers");

Previously, blueprints using YAML were restricted to using pre-defined classes with fixed configuration; now it is possible to define a JcloudsLocationCustomizer that is dynamically configurable per blueprint. The placement strategy example given earlier also shows how much more flexible this approach is.

Map Configuration Alternative

Entities that use MapConfigKey<Object> to store a map of configuration data, where some values are references to other DSL objects, can now use the more obvious ConfigKey<Map<String,Object>> instead, because resolution of DSL objects now occurs on any collection typed ConfigKey. However, a map of data will discard all type information, and prevent you from taking advantage of some of the more interesting and powerful Brooklyn type coercions. Also, all maps are rendered the same way in the console, which may not be very useful or friendly. Using the $brooklyn:object syntax and defining a POJO data holder object can give strong typing for individual data items, and will allow a custom RendererHint to be provided for the specific Java type you define, which can display more useful information in the console UI.
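As a sketch of the difference (the holder class and config key names here are hypothetical, not part of Brooklyn):

```yaml
  brooklyn.config:
    # plain map: works, but values carry no type information and the map
    # renders generically in the console
    example.settings:
      endpoint: $brooklyn:component("db").attributeWhenReady("host.name")
      timeout: "30s"
    # typed POJO holder: the same data, but with coercion to the holder's
    # field types and the option of a custom RendererHint for the class
    example.settingsTyped:
      $brooklyn:object:
        type: com.example.SettingsHolder
        brooklyn.config:
          endpoint: $brooklyn:component("db").attributeWhenReady("host.name")
          timeout: "30s"
```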

More detailed documentation is available, and further information can be found at the main Apache Brooklyn site or on GitHub.


Scottish Independence and Startups

I was recently discussing some issues that Scottish independence may raise for high-tech startups, and two key points came up that I felt required answering. Scottish technology companies currently rely heavily on various tax schemes from the UK HMRC, and their continuity or otherwise under an independent Scotland will have a huge impact on small startups.

The result of these decisions will affect everyone working in Scotland's technology sector, including my employer and the many other companies in Codebase, the largest technology incubator in Scotland. I think it is essential to have all of the facts available to help shape decision making ahead of the referendum on the eighteenth. Accordingly, I have tried to determine what the proposed status of these schemes would be in an independent Scotland, as well as how long they might be offered before being changed.

In the UK, these schemes have been running for some time: EIS since 1994, SEIS since 2012 and R&D tax credits since 2000. The exact details are assessed and adjusted each year in the UK Budget by the Chancellor, and although they could be withdrawn at any time, so far the changes have been positive.

Enterprise Investment Scheme

The Enterprise Investment Scheme or EIS is designed to help smaller, higher-risk trading companies to raise finance by offering a range of tax reliefs to investors who purchase new shares in those companies. Many venture capital firms and funds or investment companies rely on this as an integral part of their business, such as Par Equity and their Par Syndicate EIS Fund.

The Seed Enterprise Investment Scheme (SEIS) scheme complements EIS and is designed to help small, early-stage companies to raise equity finance by offering a range of tax reliefs to individual investors who purchase new shares in those companies. SEIS is intended to recognise the particular difficulties which very early stage companies face in attracting investment, by offering tax relief at a higher rate than that offered by the existing EIS. The rules have been designed to mirror those of EIS as it is anticipated that companies may want to go on to use EIS after an initial investment under SEIS.

The Scotland's Future document has a Q&A section with a series of questions relating to Entrepreneurs and the Self Employed. This mentions Seed EIS status after independence in question 68 which states that it will continue. The answers to questions 69 and 70 on venture capital and research commercialisation are also relevant.

Would the UK Seed Enterprise Investment Scheme continue?
Yes. The current Scottish Government proposes that this scheme will continue on independence. Future decisions on the Seed Enterprise Investment Scheme will be made by the Government of an independent Scotland.

Note, however, that this is not EIS, which is the major tax break for high net worth individuals such as angel investors. There is also no information about how long SEIS is guaranteed to be available in Scotland.

It should be noted that the generous tax relief given to individuals who invest in small early-stage businesses via EIS and SEIS applies to UK income and capital gains taxes; in an independent Scotland the equivalent schemes would apply to Scottish taxes only, which immediately reduces the investor base to Scottish high net worth individuals.

Research and Development Tax Credit

The Research and Development Tax Credit is a mechanism allowing companies to reduce their tax bill or claim payable cash credits based on R&D expenditure. Any company carrying out R&D is likely to qualify, and it can be an integral part of the finances of many startups, which are often struggling for cash; any technology company doing innovative work should be claiming it.

In this article from an accounting website, John Swinney MSP is quoted as saying that the Scottish government will extend and focus research and development tax relief in the event of a Yes vote.

Note that there is no information about how long the R&D tax relief is guaranteed to be available for in Scotland.


Brooklyn and Node.JS Applications

This post shows how Apache Brooklyn can be used to deploy and manage Node.JS applications. This feature was requested in JIRA issue BROOKLYN-12 and implemented in pull request #94.

We are going to create a simple Todo List application, using the code hosted on GitHub at nodejs-todo. This application uses a Redis store to keep track of the list items, and the Express JavaScript framework as the front-end. The original application was written by Amir Rajan, and is part of his Node.JS by Example samples.

So, to deploy this application in Brooklyn, we need to create a Node.JS service, running the code from the GitHub repository. This will need to be configured with the required dependencies, which should be installed by npm. Since the application uses Redis, we will also need a RedisStore node, and some way of telling the app to connect to it. Fortunately, the application has been written to use Heroku's Redis To Go add-on. We just set the REDISTOGO_URL environment variable to the URL of our store, using the host and port sensors from the entity. Due to changes in the latest Express release I have had to modify the application code in my fork because of deprecated dependencies. This also affects the list of dependencies to be installed by npm, and these are listed in the nodePackages configuration.

id: nodejs-todo-application
name: "Node.JS Todo Application"
origin: "https://github.com/amirrajan/nodejs-todo/"
locations:
- jclouds:softlayer:ams01
services:
- serviceType: brooklyn.entity.nosql.redis.RedisStore
  id: redis
  name: "Redis"
- serviceType: brooklyn.entity.webapp.nodejs.NodeJsWebAppService
  id: nodejs
  name: "Node.JS"
  brooklyn.config:
    appFileName: server.js
    appName: nodejs-todo
    nodePackages:
    - express
    - ejs
    - jasmine-node
    - underscore
    - method-override
    - cookie-parser
    - express-session
    - body-parser
    - cookie-session
    - redis
    - redis-url
    - connect
    env:
      REDISTOGO_URL: >
        $brooklyn:formatString("redis://%s:%d/",
        component("redis").attributeWhenReady("host.subnet.hostname"),
        component("redis").attributeWhenReady("redis.port"))
    launch.latch: $brooklyn:component("redis").attributeWhenReady("service.isUp")

This blueprint file is available in the Brooklyn repository, in the examples/simple-web-cluster project as nodejs-todo.yaml. As you can see, it defines two services - RedisStore and NodeJsWebAppService. The Node.JS service is configured with the gitRepoUrl pointing to the forked repository with the application code, and the appFileName is used to set the filename passed to the node command to start the application. The env configuration sets the location of the store. This is done using attributeWhenReady calls to retrieve the required sensor values from the Redis entity when they are available. Finally, we use launch.latch to make the Node.JS service wait until Redis has started before actually launching the application.

To deploy the application, just paste the YAML blueprint into the YAML tab of the Create Application window in the Brooklyn console UI. We use Softlayer as the target location in this example, but you can use any configured location, as described in the Getting Started guide. Once the machines have been provisioned and the entities have started up, you should see something similar to the screenshot above. To access the application itself, open the link given in the webapp.url sensor. You should see a Todo Redis page, like the screenshot below.

To deploy your own Node.JS applications, use the YAML blueprint as a guide, and modify the configuration as required. You can also view the source code for the NodeJsWebAppService entity, to see the rest of the available configuration keys. More detailed documentation is available, and further information can be found at the main Brooklyn site or on GitHub.

ADDENDUM These YAML blueprints will also run on Clocker, giving you a simple way of containerising your application. If you modify the locations list to point at the name of an already running Docker Cloud, for example my-docker-cloud, then the Node.JS and Redis services will each start in their own Docker containers. The REDISTOGO_URL will need to be changed to use the forwarded Docker port instead, so replace the configuration for the env entry with the following snippet.


To find out more about Clocker, see my earlier post Implementing a Docker Cloud with Apache Brooklyn here, or Creating a Docker Cloud with Apache Brooklyn at Cloudsoft. There is also an excellent Getting Started video from Andrea.


Announcing Clocker 0.5.0

We have just released another update to the Clocker project, bringing it to version 0.5.0. Clocker is a set of entities and locations for Apache Brooklyn that simplifies deployment and management of Docker in the Cloud.

Clocker 0.5.0 incorporates fixes and updates added in response to user testing and feedback, as well as several interesting new features, including:

  • Improved support for more clouds and operating systems.
  • Affinity rules for placement of containers based on currently deployed entities.
  • Support for Docker volume mapping and export.
  • Container CPU and memory configuration.

These are made possible using the latest Docker driver, written by Andrea Turli and now part of the jclouds-labs 1.8.0-SNAPSHOT codebase. The Clocker code is available on GitHub and a binary archive of the release can be downloaded here:


I will also be publishing a tutorial showing how to build a Brooklyn blueprint that takes advantage of these features. This will be a Solr indexing application with a web front-end running on Clocker provisioned containers, with affinity rules and Brooklyn policies to manage placement and scaling. Look out for this in the next few days. In the meantime, there is a slide deck available from the talk that I gave at the Edinburgh Docker Meetup last week, which can be downloaded below.

You can provide feedback with any comments or problems on the incubator-brooklyn-dev mailing list, or add issues to the GitHub repository. If you have an idea for an improvement or a new feature, just fork the code and issue a pull request!


Clocker - Implementing a Docker Cloud with Apache Brooklyn

This is technical and implementation background on the Clocker project, with more behind-the-scenes detail that couldn't be included in the Cloudsoft blog post Creating a Docker Cloud with Apache Brooklyn.


This post will show how Apache Brooklyn and Docker have been combined to create Clocker, a Docker cloud assembled from multiple Docker containers across multiple Docker hosts running on cloud virtual machines. Deploying and managing applications will include intelligent placement of containers, and automatic provisioning of new Docker hosts and containers as required.

Docker is an open platform for distributed applications for developers and sysadmins, and Apache Brooklyn is an application deployment and runtime management platform that provides blueprints, or declarative specifications for your application components and their configurations.

Docker, Brooklyn, jclouds

The underlying mechanism used by Brooklyn for cloud VM provisioning and management is Apache jclouds, a cloud-API-agnostic library that handles communication with both public and private providers to provision virtual machines and expose them as SSH-accessible connections. This allows Brooklyn to execute commands on these machines over their SSH connections, as well as transfer files, in a secure fashion. Issues such as login credentials and keys are handled by jclouds and are fully configurable from Brooklyn.

The Docker architecture provides the end user with containers on a virtual machine that, assuming an SSH server is available, can be seen as simply more VMs. To make this fully transparent, a driver was developed for jclouds that allows provisioning of containers on a particular host with Docker installed, using the same API as any other cloud. This is fully described in AMP for Docker and is the foundation for the Docker cloud architecture.


Although it is certainly useful to have the ability to provision multiple containers on a single machine, this will eventually run into limits based on the physical machine being used. Even a large server with many GB (or even TB) of memory and many cores will eventually be unable to handle further containers. As an example, a 128GB 16 core server can be partitioned into only 32 virtual machines with a notional 0.5 core of CPU and 4GB memory, but with containers these limits are shared and each can potentially burst up to the maximum available on the underlying host machine.
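The partitioning arithmetic above can be checked directly; this is a throwaway sketch, not Clocker code:

```java
public class CapacitySketch {
    // Fixed partitioning: each VM gets an equal, static slice of the host.
    static double memoryPerVm(int hostMemoryGb, int vms) {
        return (double) hostMemoryGb / vms;
    }

    static double coresPerVm(int hostCores, int vms) {
        return (double) hostCores / vms;
    }

    public static void main(String[] args) {
        // A 128 GB, 16-core server split into 32 VMs gives 4 GB and 0.5 cores each.
        System.out.println(memoryPerVm(128, 32) + " GB, " + coresPerVm(16, 32) + " cores per VM");
        // Containers, by contrast, share these limits and can each burst
        // up to the full 128 GB / 16 cores when the host is otherwise idle.
    }
}
```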

Therefore, to increase the capacity we introduced the idea of a cluster of Docker hosts, each with a set of containers, but presented as a single Docker infrastructure encompassing all provisioned containers.

This architecture has two main parts. The first is the tree of Docker entities: DockerInfrastructure, DockerHost and DockerContainer. These are Brooklyn entities that represent the Docker components, and provide sensors and effectors to monitor and control them, such as the current CPU or memory usage or uptime of a host, or the ability to pause and restart a container. The second is a series of jclouds locations, which are Brooklyn wrappers around the main cloud provisioner of virtual machines for hosts, the host virtual machines themselves, the jclouds-docker provisioner accessing the Docker API on a host, and finally the SSH-accessible Docker container as provided by the jclouds-docker driver.

This Docker infrastructure will contain the intelligence and logic required to create new Docker hosts and containers as required for a particular application. To make operation as simple as possible, a Brooklyn blueprint being deployed should not need any special configuration to make use of the Docker facilities. To this end, we will use existing Brooklyn APIs and interfaces to present the containers as normal SSH-accessible virtual machines, and control the Docker host installation using Brooklyn entities.


The Docker infrastructure is represented by a Brooklyn entity tree. The parent entity contains a DynamicCluster of DockerHost entities. These hosts use virtual machines provisioned by a jclouds location, such as an Amazon EC2 or SoftLayer provider. Each host then has the Docker software installed and a basic Docker image created, with an SSH server available. The host entities themselves contain another DynamicCluster this time containing DockerContainer entities. These are the representations of the provisioned containers for deployed blueprint entities.

The Docker infrastructure entity is used to create a DockerLocation, a Brooklyn location that can provision virtual machines, in this case Docker containers. This is done by wrapping the parent jclouds provisioner to obtain new virtual machines in the background for the Docker hosts, and then creating a new jclouds location, this time using the Docker API driver, for each host. When this jclouds-docker driver provisions a new Docker container, it is presented as a DockerContainerLocation that exposes the SSH connection to the underlying container. Brooklyn exposes the forwarded Docker port numbers from the container via a SubnetTier entity, allowing access as though the container were available on the same public IP as the Docker host.

This sequence of wrapped entities and locations is visible to an end user deploying a blueprint as a new named location in the Brooklyn console, configured during the deployment of the DockerInfrastructure entity. The default name is my-docker-cloud and blueprints deployed to this location will have their entities started in Docker containers spread across the Docker cloud.


This Docker cloud allows us much more flexibility in deploying a containerised application than is possible with a single host. In particular, we now have a choice of which Docker host to use when adding a new container. This flexibility is accomplished through the use of a NodePlacementStrategy configured either on a per blueprint basis, or as a default for the whole Docker infrastructure.

The simplest choice is to fill each Docker host to some maximum number of containers, and then provision a new host. This is a depth-first strategy, and makes the most efficient use of the available infrastructure. If performance is more of a concern, a breadth-first strategy can be used instead. This attempts to balance container creation across all available hosts, starting with the least populated. It takes no account of actual host usage, however, merely choosing on the basis of the number of containers, and may not be sufficient when deployed entities have a wide range of CPU requirements. To make the most efficient use of available CPU resources, a strategy that checks current CPU usage and creates the container on the host with the lowest percentage can be used.
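As an illustration of the breadth-first idea, here is a minimal sketch that picks the least-populated host. The real NodePlacementStrategy interface in Clocker differs; the names and shape here are illustrative assumptions only:

```java
import java.util.List;

public class BreadthFirstSketch {
    // Hypothetical stand-in for the host list: we only track each host's
    // current container count. A breadth-first strategy picks the host with
    // the fewest containers, balancing new containers across all hosts.
    static int pickHost(List<Integer> containerCounts) {
        int best = 0;
        for (int i = 1; i < containerCounts.size(); i++) {
            if (containerCounts.get(i) < containerCounts.get(best)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hosts currently running 3, 1 and 2 containers: host at index 1 is chosen.
        System.out.println(pickHost(List.of(3, 1, 2)));
    }
}
```

A depth-first strategy would instead keep returning the same host until it reaches its configured maximum, and a CPU-based one would compare usage percentages rather than container counts.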

The currently available strategies are summarised below, and more are in development:

Although we are striving to make the deployment process for a Brooklyn blueprint as transparent as possible, it should be obvious that some capabilities require explicit configuration to control the Docker infrastructure, such as the placement strategies outlined above. Another feature available is the ability to select a particular Docker image for a container, whether by image ID directly or by specifying a Dockerfile to use. The default Dockerfile simply starts an SSH daemon, and an image is created using this for each Docker host.

If a configuration key is added to an entity specifying either a Dockerfile or image ID, this will be intercepted by the DockerHostLocation at the time the container is created. Dockerfiles are specified using a URL that can be found either on the classpath of the Brooklyn server or remotely over HTTP or other supported protocol. Although the infrastructure does not fully support the Docker image registries at present, when an image is created from a Dockerfile, it is made using a unique hash of the Dockerfile URL, allowing the image to be reused later when further entities are requested using that same Dockerfile. This gives a considerable speed-up to clustered deployments of many identical entities on the same host. The configuration is, of course, ignored when the blueprint is deployed to a non-Docker cloud location.
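The image-reuse behaviour can be sketched as follows. The hash function and cache structure here are illustrative assumptions, not the actual Clocker implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class ImageCacheSketch {
    // Maps a hash of the Dockerfile URL to the ID of the image built from it,
    // so a later entity using the same Dockerfile reuses the existing image.
    final Map<String, String> imagesByHash = new HashMap<>();
    int builds = 0;

    String imageFor(String dockerfileUrl) {
        String hash = Integer.toHexString(dockerfileUrl.hashCode()); // illustrative hash only
        return imagesByHash.computeIfAbsent(hash, h -> {
            builds++; // the expensive `docker build` would happen here
            return "image-" + h;
        });
    }

    public static void main(String[] args) {
        ImageCacheSketch cache = new ImageCacheSketch();
        String first = cache.imageFor("classpath://ssh-server/Dockerfile");
        String second = cache.imageFor("classpath://ssh-server/Dockerfile");
        // Same Dockerfile URL: the image is built once and then reused.
        System.out.println(first.equals(second) + " after " + cache.builds + " build(s)");
    }
}
```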

Additionally, Dockerfiles are used by Brooklyn as templates rather than immutable text files. This means that they can contain variables and placeholders that are interpolated using FreeMarker based on the entity being deployed. This capability allows a single Dockerfile to be used to control deployment of a wide variety of different applications, and gives Brooklyn scope to integrate more closely with the deployed Docker infrastructure.
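Clocker uses FreeMarker for this interpolation; as a much-simplified illustration of the idea only, here is a sketch using plain string substitution with a hypothetical `${name}` placeholder syntax:

```java
import java.util.Map;

public class DockerfileTemplateSketch {
    // Replaces ${name} placeholders with values from the entity's model,
    // loosely mimicking what a template engine does with a Dockerfile template.
    static String interpolate(String template, Map<String, String> model) {
        String result = template;
        for (Map.Entry<String, String> entry : model.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // One template serves many entities: the base image and port vary per deployment.
        String dockerfile = "FROM ${baseImage}\nEXPOSE ${port}\n";
        System.out.print(interpolate(dockerfile, Map.of("baseImage", "ubuntu:14.04", "port", "8080")));
    }
}
```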

Deploying a Docker Cloud

The following YAML blueprint shows the configuration of a Docker cloud deployed to the Amazon EC2 Europe region.

id: docker-cloud
name: "Docker Cloud Infrastructure"
origin: "https://github.com/brooklyncentral/clocker/"
locations:
- jclouds:aws-ec2:eu-west-1:
    hardwareId: m1.small
services:
- serviceType: brooklyn.entity.container.docker.DockerInfrastructure
  id: infrastructure
  brooklyn.config:
    docker.version: "1.0"
    entity.dynamicLocation.name: "my-docker-cloud"
    docker.host.securityGroup: "docker"
    docker.host.initial.size: 2
    docker.container.cluster.maxSize: 4
    docker.host.register: false
    docker.host.spec:
      $brooklyn:entitySpec:
        type: brooklyn.entity.container.docker.DockerHost
        brooklyn.config:
          start.timeout: 900
          docker.host.nameFormat: "docker-%1$s"
          docker.container.spec:
            $brooklyn:entitySpec:
              type: brooklyn.entity.container.docker.DockerContainer
              brooklyn.config:
                docker.container.nameFormat: "docker-%2$d"

Once deployed, this will provision two virtual machines in the EC2 Dublin region, and install Docker 1.0 on each. These hosts are monitored for CPU usage and memory usage, which are made available as sensor data for Brooklyn policies. For a more complete tutorial including application deployment to a Docker cloud, see the Cloudsoft Creating a Docker Cloud with Apache Brooklyn post.


This is the first release of the Brooklyn integration with Docker and development is ongoing. Some of the features to look out for in the next releases will be:

  • More placement strategies and policies for container management.
  • Implementation of affinity and anti-affinity APIs and a simple DSL to control container placement based on deployed entities and applications, and allowing blueprints to specify their affinity preferences on a per-entity basis.
  • Improvements in the jclouds-docker driver will allow more control over Docker container provisioning, such as CPU shares and memory allocation. Docker image repository integration will allow images to be pulled from centrally configured locations, and shared between different hosts in a Docker cloud.
  • Adding integration with software-defined networking (SDN) services such as Open vSwitch or OpenContrail will allow isolation and control of container network traffic, as well as easier communication between containers on different hosts, by introducing a shared Docker VLAN.
  • Seamless integration of all existing Brooklyn blueprints with the Docker infrastructure.


The Docker integration with Brooklyn allows application blueprints to be deployed to a cloud of Docker containers, spread out intelligently across Docker hosts on managed VMs in the cloud.

An application blueprint is able to treat the Docker infrastructure as though it were a homogeneous set of virtual machines in a single location, and the Docker infrastructure can make intelligent decisions about container provisioning and placement to maximise resource usage and performance.



The code is available on GitHub under the Apache 2.0 license, so please fork and contribute by creating pull requests or opening issues.

Apache Brooklyn is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by Chip Childers. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.


Brooklyn ConfigKey and EntitySpec Interaction

Brooklyn ConfigKey Issue

I have been seeing some very strange behaviour in a Brooklyn application I am writing. Basically it looked like ConfigKeys were being overwritten, or the wrong values were being used. I have two clusters, which have their memberSpec key configured like this, in the init() method of their parent entity:

EntitySpec spec = getConfig(SPEC)
        .configure(PARENT, this);
Cluster cluster = addChild(EntitySpec.create(DynamicCluster.class)
        .configure(Cluster.INITIAL_SIZE, initialSize)
        .configure(DynamicCluster.MEMBER_SPEC, spec));

The intent is to have PARENT set to ParentOne for ClusterOne and ParentTwo for ClusterTwo. But, when I resize the clusters and look at the details of their child entities in the Config tab of the Brooklyn console, the information is as follows:

ClusterOne member EntityOne has PARENT set to ParentOne
ClusterOne member EntityTwo has PARENT set to ParentTwo
ClusterTwo member EntityThree has PARENT set to ParentTwo

Note line two, the PARENT ConfigKey should have the same value as the other entity in ClusterOne since they are both derived from the same specification. Instead, it has the same value as the entity in ClusterTwo which was created with a completely different specification!

Correct EntitySpec Usage

It seems that this is happening because I am modifying a shared EntitySpec object in the configure(PARENT, this) call, so it is overwriting the previous value of the PARENT ConfigKey with the mutated EntitySpec in the Cluster. This is why the second member of ClusterOne is wrong, as the resize(2) operation on it happens after ClusterTwo is created and the referenced spec object is modified.

So, the solution is to change the first line of the block above to this:

EntitySpec spec = EntitySpec.create(getConfig(SPEC))
        .configure(PARENT, this);

Note the added call to EntitySpec.create() wrapping the SPEC configuration value. This makes a copy of the provided EntitySpec before configuring the copy with the PARENT key. The resulting spec object can now safely be used in the addChild method call.
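The aliasing problem is easy to reproduce outside Brooklyn. The sketch below uses a toy Spec class with the same mutable, fluent style as EntitySpec; the names are illustrative, not Brooklyn's API:

```java
import java.util.HashMap;
import java.util.Map;

public class SpecAliasingSketch {
    // A toy mutable spec: configure() mutates and returns this, like EntitySpec.
    static class Spec {
        final Map<String, String> config = new HashMap<>();

        Spec configure(String key, String value) {
            config.put(key, value);
            return this;
        }

        // Analogous to EntitySpec.create(spec): a defensive copy.
        static Spec copyOf(Spec other) {
            Spec copy = new Spec();
            copy.config.putAll(other.config);
            return copy;
        }
    }

    public static void main(String[] args) {
        // Bug: both "clusters" configure the one shared spec object.
        Spec shared = new Spec();
        Spec clusterOne = shared.configure("parent", "ParentOne");
        Spec clusterTwo = shared.configure("parent", "ParentTwo");
        System.out.println(clusterOne.config.get("parent")); // "ParentTwo" - ClusterOne's value was overwritten

        // Fix: copy the spec before configuring it, so each cluster owns its own.
        Spec base = new Spec();
        Spec oneFixed = Spec.copyOf(base).configure("parent", "ParentOne");
        Spec twoFixed = Spec.copyOf(base).configure("parent", "ParentTwo");
        System.out.println(oneFixed.config.get("parent")); // "ParentOne"
        System.out.println(twoFixed.config.get("parent")); // "ParentTwo"
    }
}
```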

Anyway, this took longer than I expected to debug, so I wanted to share it in case other people come across a similar issue.