2014-11-15

Custom Sensors in Brooklyn

One of the features of Apache Brooklyn is the ability to retrieve data from a running entity using sensors. These sensors expose data from the entity as attributes containing information like queue depth for a message broker, latency for an HTTP server or CPU usage for a Docker host. The data can also be enriched or aggregated to produce sums or moving averages across multiple entities in a cluster, and used as input to policies to drive scaling and resilience mechanisms. The sources for these sensors are varied, and encompass JMX attributes for Java applications, fields from XML or JSON documents returned by RESTful APIs, parsed output from shell commands and many more.

Sometimes the entities provided by Brooklyn do not expose as a sensor the particular piece of data you need for a policy. In these circumstances it is possible to dynamically add sensors, either programmatically in a Java entity class that extends the default Brooklyn code, or in the YAML blueprint used to describe the application. There are three categories of sensor that can be added: SSH command output, JSON data from an HTTP URL, and JMX attributes from a Java entity. Each of these is configured differently, although there is some commonality.

First, we will look at an example using the SshCommandSensor class to add sensors driven by the output of shell commands invoked over SSH to the virtual machine running the software process being managed. The following blueprint shows a TomcatServer entity and a brooklyn.initializers section adding a sensor to it.

name: Tomcat SSH Sensor
services:
- serviceType: brooklyn.entity.webapp.tomcat.TomcatServer
  name: Tomcat
  location: jclouds:aws-ec2:eu-west-1
  brooklyn.initializers:
  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: tomcat.cpustats
      command: "mpstat | grep all"
  brooklyn.config:
    pre.install.command: "sudo apt-get install -y sysstat"

The output of this sensor can be seen in the following screenshot. The SSH command mpstat | grep all is being executed to generate information on CPU usage, which is then published as the tomcat.cpustats sensor.

Problems using SSH

However, when my colleague Richard tried to perform some calculations on the data from mpstat, using the following YAML fragment to add another sensor, the code did not behave as expected. Although the SSH commands appeared to execute correctly, the sensor data was always empty.

  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: tomcat.cpustats.broken
      command: >
        mpstat |
        awk '$2=="all" { print $3+$4+$5+$6+$7+$8+$9+$10 }'

After some investigation, he located the cause of this problem, which is obscure enough that I have decided to document it here for future reference. I will quote from Richard's email on the subject:

The problem is that the LANG environment variable is different between me running SSH in a terminal to test out potential commands, and when Brooklyn is opening SSH sessions to run its commands.

When I was experimenting with finding the right command to run, I would SSH to a Linux box and run variations on my command until I came up with a working version. This SSH session would inherit my workstation's LANG of en_GB.UTF-8. When I ran mpstat, here is a typical line of output:

16:13:15    all   5.40   0.05   2.18   0.52   0.09   0.03   0.00   0.00  91.74

Having got a working command, I plugged it into my blueprint and let Brooklyn run the command. Unfortunately, Brooklyn (probably) does not set the LANG environment variable, and this particular Linux machine chose a default of en_US.UTF-8. When it ran mpstat, here is the equivalent line of output:

04:13:11 PM all   5.40   0.05   2.18   0.52   0.09   0.03   0.00   0.00  91.74

Notice that the time has changed from 24-hour form to 12-hour with an AM/PM suffix. Also note that there is a space before the "PM" suffix - causing all of my awk field numbers to now be off-by-one. D'oh!
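
The off-by-one can be reproduced without mpstat at all. This sketch feeds the two sample lines quoted above through awk to show the field shift:

```shell
# The two mpstat lines quoted above, as literal strings:
line_24h='16:13:15    all   5.40   0.05   2.18   0.52   0.09   0.03   0.00   0.00  91.74'
line_12h='04:13:11 PM all   5.40   0.05   2.18   0.52   0.09   0.03   0.00   0.00  91.74'

# Under the 24-hour locale, "all" is field 2:
echo "$line_24h" | awk '{print $2}'   # prints "all"
# The " PM" suffix shifts every subsequent field along by one:
echo "$line_12h" | awk '{print $2}'   # prints "PM"
echo "$line_12h" | awk '{print $3}'   # prints "all"
```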

So if the output of a command you intend to use with awk, perl or even just cut includes potentially locale-specific items, explicitly set the LANG variable to prevent any surprises in formatting that will throw off your parsing routines. In this case, the fix is to modify the command value in the blueprint to set LANG to en_US.UTF-8 explicitly.

  brooklyn.initializers:
  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: cpu.load
      command: >
        LANG=en_US.UTF-8 mpstat |
        awk '$3=="all" {print $4+$5+$6+$7+$8+$9+$10+$11}'

Other examples of SSH sensors might be returning the contents of various files in /proc or executing an administration command for an entity to return information. The sensor can be configured to poll at specific intervals, and the output can be coerced to different types, as required. To change the poll frequency, set the period configuration key to the time required, either in milliseconds or using a suffix to indicate minutes or seconds, for example 10m for ten minutes or 5s for five seconds. The sensor type is set using the targetType configuration key, and can be either the name of a primitive type or a fully qualified Java class name. The following example returns the available disk space as the disk.available integer sensor, every five minutes.

  brooklyn.initializers:
  - type: brooklyn.entity.software.ssh.SshCommandSensor
    brooklyn.config:
      name: disk.available
      command: "df / | grep disk1 | cut -d\  -f4"
      period: 5m
      targetType: Integer
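
One caution with the cut approach: cut treats every single space as a field separator, so runs of spaces introduce empty fields, and the field number depends on the exact column alignment of df on the target system. This sketch uses an invented df output line (the device name and figures are made up for illustration):

```shell
# A hypothetical df output line with doubled spaces between some columns:
df_line='/dev/disk1  487358464 316441192 170661272  65% /'

# cut counts the empty field between consecutive spaces, so -f4 lands on
# the "used" column here rather than "available":
echo "$df_line" | cut -d' ' -f4      # prints "316441192"
# awk splits on runs of whitespace, so $4 is the "available" column:
echo "$df_line" | awk '{print $4}'   # prints "170661272"
```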

Other Sensor Types

Dynamic sensor addition in Brooklyn is not limited to running SSH commands, and there are currently two other mechanisms available. If you have an entity that is running a Java program with JMX enabled, Brooklyn is able to retrieve attributes and convert them into sensor data. Using the JmxAttributeSensor in the same way as for SSH sensors, we can add a dynamic JMX sensor. For example, this YAML snippet adds the LoadedClassCount attribute from the java.lang:type=ClassLoading JMX object as the loaded.classes sensor, refreshing every thirty seconds.

  - type: brooklyn.entity.software.java.JmxAttributeSensor
    brooklyn.config:
      name: loaded.classes
      objectName: "java.lang:type=ClassLoading"
      attribute: "LoadedClassCount"
      targetType: Integer
      period: 30s

This JMX object is part of the JVM management information beans, and should be available on every Java application. To access application-specific attributes, Brooklyn must already be able to access JMX data on the entity, which will normally be the case if the entity implements the UsesJmx interface. This means that Brooklyn will be able to determine the JMX and RMI or JMXMP ports that are being used, as well as any authentication details that are required, and these are re-used when adding sensors like this. The same period and targetType keys are used to configure the sensor polling and return value coercion, just as for SSH sensors.

Finally, it is possible to access and parse JSON data from an HTTP-based REST API on an entity. This uses the JSONPath expression language to extract parts of a JSON document, which are then returned as the sensor data. An example of the YAML required is shown below. Again, name, period and targetType have the same meanings as for the other sensor types. The uri key configures the endpoint to access, with optional username and password credentials (in future, a map of HTTP headers and other features will be added). If a status of 200 is returned, the content will be assumed to be a JSON document, and the jsonPath key is used to extract some part of the data as the sensor value. Here we are simply retrieving the value of the counter field, but it is possible to perform much more sophisticated queries, although this is beyond the scope of this post.

  - type: brooklyn.entity.software.http.HttpRequestSensor
    brooklyn.config: 
      name: json.sensor
      period: 1m
      targetType: Integer
      jsonPath: "$.counter"
      uri: >
        $brooklyn:formatString("http://%s:%d/info.json",
        component("web").attributeWhenReady("host.name"),
        component("web").attributeWhenReady("http.port"))

To give a (contrived) example of how dynamic sensors might be used in an application, imagine that you have a cluster of Couchbase nodes that you wish to scale. Unfortunately, the current Brooklyn blueprint already exposes all the useful sensor data you might practically need when resizing the cluster, so we must turn to impractical and useless data instead. Imagine you need to resize based on the amount of disk space used, and ignore for the moment that this is not the right way to scale a cluster; the blueprint below is intended to illustrate the ways in which dynamic sensor data can be used as the input to a policy. The following JSON fragment is part of the data returned by a REST call to the /pools/default endpoint, which returns cluster details. We are interested in the usedByData entry, showing hard disk space used by data.

{
    "storageTotals": {
        "hdd": {
            "free": 46188516230, 
            "quotaTotal": 56327458816, 
            "total": 56327458816, 
            "used": 10138942586, 
            "usedByData": 34907796
        }
    }
}

The blueprint below shows how an AutoScalerPolicy might be configured to use this information. We have created a new couchbase.storageTotals.usedByData sensor, which connects to the cluster REST API endpoint specified by the uri, username and password keys. This returns the pool information, and the jsonPath selector extracts the $.storageTotals.hdd.usedByData path, which is coerced to an integer based on the targetType configuration. The scaling policy uses a $brooklyn:sensor(...) directive to configure its metric key; this looks up our dynamic sensor, which is then compared to the lower and upper bounds to decide whether to resize the cluster. This pattern can obviously be used in your own blueprints to much more useful effect!

name: Couchbase Policy Example
location: jclouds:softlayer:lon02
services:
- type: brooklyn.entity.nosql.couchbase.CouchbaseCluster
  id: couchbase
  adminUsername: Administrator
  adminPassword: Password
  initialSize: 3
  createBuckets:
  - bucket: "default"
    bucket-port: 11211
  - type: brooklyn.entity.software.http.HttpRequestSensor
    brooklyn.config: 
      name: couchbase.storageTotals.usedByData
      targetType: Integer
      period: 1m
      jsonPath: "$.storageTotals.hdd.usedBydata"
      uri: >
        $brooklyn:formatString("%s/pool/default",
        $brooklyn:entity("couchbase").attributeWhenReady("couchbase.cluster.connection.url"))
      username: Administrator
      password: Password
  brooklyn.policies:
  - policyType: brooklyn.policy.autoscaling.AutoScalerPolicy
    brooklyn.config:
      metric: >
        $brooklyn:sensor("brooklyn.entity.nosql.couchbase.CouchbaseCluster",
        "couchbase.storageTotals.usedByData")
      metricLowerBound: 10000000
      metricUpperBound: 50000000
      minPoolSize: 1
      maxPoolSize: 5
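
To make the policy's behaviour concrete, the comparison it performs can be sketched in plain shell, using the usedByData value from the JSON fragment above and the bounds from the blueprint (the variable names are mine, not AutoScalerPolicy's):

```shell
used=34907796    # couchbase.storageTotals.usedByData, from the JSON above
lower=10000000   # metricLowerBound
upper=50000000   # metricUpperBound

# The policy resizes only when the metric strays outside the bounds:
if [ "$used" -gt "$upper" ]; then
  decision="scale out"    # add nodes, up to maxPoolSize
elif [ "$used" -lt "$lower" ]; then
  decision="scale in"     # remove nodes, down to minPoolSize
else
  decision="no change"
fi
echo "$decision"   # prints "no change": 34907796 lies within the bounds
```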

Hopefully these examples have given you an idea of the capabilities available when designing Brooklyn blueprints. The aim of Brooklyn is to simplify the autonomic management of applications in the cloud, so the ability to build blueprints from pre-defined components and then extend them without needing to write code is core. It is never possible to anticipate every piece of information that users might need in their business logic while building policies for elasticity, scaling or resilience. When creating a blueprint for an application, you can dynamically add sensors to retrieve information from entities using SSH, JMX or HTTP, and wire those sensors into Brooklyn's policy framework.

More detailed documentation is available, and further information can be found at the main Apache Brooklyn site or on GitHub.

2014-11-07

Docker Networking Ecosystem

One of the things that Clocker is designed to accomplish is to make multi-host Docker networking as easy as using single-host Docker. The way we do this currently is by using the Weave project. Weave is very interesting because it is an entirely user-space software defined network, and is incredibly simple. Because Clocker is trying to make Docker Cloud applications easy to use and orchestrate, we found Weave a great match for us, with just the right feature-set to allow applications with more complex networking requirements to run in a Clocker provisioned environment without modification. Weave fulfils the essential requirements of a sort of Minimum Viable Network for Docker, and hence Clocker, while still leaving room for other projects to provide segmentation, external endpoints and gateways, and deeper integration.

For example, Riak uses Erlang and requires epmd (the Erlang port mapper daemon) for clustering; epmd controls up to a thousand TCP ports, used simply for inter-node communication. It makes no sense to use the Docker port mapping facilities to expose these ports externally, and there is no reason for any application outside the Riak cluster to have to access them. All a Riak cluster needs to expose is a single port for web client access and another for admin console access. Weave allows Clocker to set up the Riak nodes in containers, each of which is attached to a private LAN, and exposes only the Docker port-forwarded web client and admin ports. This was demonstrated at RICON recently: the Running Riak in a Docker Cloud using Apache Brooklyn talk showed a multi-node Riak cluster running in Clocker.

Weave is, in essence, a software Ethernet switch. As mentioned earlier, it is entirely user-space, which is very important for ease of use in the cloud. Clocker, and the underlying Brooklyn control plane, is cloud-agnostic, due to our use of the Apache jclouds library. This means we try to design application blueprints in such a way that they will run anywhere. More complex SDN solutions require drivers loaded into the kernel, and specific images for particular operating system versions, which means customising the configuration of each virtual machine on each cloud provider that a blueprint must run on. Weave allows us to ignore all this and simply start the Weave router in a container on each Docker host; Clocker then assigns IP addresses from the link-local address space to each container it provisions, attaching each one to the same LAN.

This is a flat network architecture, and in many ways is not suitable for production use, particularly with multi-tenant deployments where separation of traffic between applications is a concern. A simple type of access control is possible with Clocker and Weave, in the form of iptables firewall rules: since Clocker controls the IP address space used by each application, it can erect firewall rules preventing traffic from crossing application boundaries. So, this simplicity and portability has allowed Clocker to quickly demonstrate the feasibility of multi-host Docker deployments in the cloud.

But, in the future, Clocker users will want to deploy more complex applications with networking requirements that are not achievable with the Weave network model. To make this transition as seamless as possible, we need a way of supporting more traditional SDN services, such as OVS. There has been a lot of discussion in the Docker community about ways of achieving this, and how to ensure that the chosen solution fits The Docker Way and maintains the flexibility and openness that has been a hallmark of Docker from the start. The proposals in issues #8951 and #8997 discuss adding a new networking API which will allow plugins for various different SDN solutions to be integrated with Docker. In particular, issue #8997 seems to offer the flexibility and choice that Clocker would need, allowing a simple out-of-box experience with Weave, leading to more complex production deployments with OVS or other SDN solutions, when the environment is under the complete control of the blueprint.

The Docker ecosystem is an amazing collection of projects, Clocker and Weave are just a small part of this. One of the features of this ecosystem is the ability to experiment and innovate quickly - many projects, mine included, are permanently in beta, and new features are constantly being added. I hope that the networking and orchestration APIs being added to Docker will maintain this policy and continue to allow small projects to fill their specialised niches in this space.

Further information on Clocker can be found at the main Clocker site or on GitHub.