Once the package is installed, you need to configure td-agent, telling it to match cer‐
tain events and redirect them to a specific location. For example, to match all Docker
events (which by default are tagged with docker.<CONTAINER_ID>) and redirect them
to stdout, edit the td-agent configuration file /etc/td-agent/td-agent.conf and add the
following line:
<match docker.**>
type stdout
</match>
Then restart the service:
$ sudo service td-agent restart
You are now ready to start using Fluentd to manage your Docker logs. Let’s start an
Nginx container and use this logging driver:
$ docker run -d -p 80:80 --name nginx --log-driver=fluentd nginx
Now if you access Nginx in your browser and then check the td-agent log file, you will
see the Docker logs:
$ tail -n 3 /var/log/td-agent/td-agent.log
...
2015-08-17 13:41:10 docker.dc3a645abfaa: {"log":"192.168.33.1 ...,\
"container_id":"dc3a645abfaa...",\
"container_name":"/nginx",\
"source":"stdout"}
You see that the logs are prefixed with docker.<CONTAINER_ID>. If you wanted to pre‐
fix the logs with something else, you could specify a different Go template (currently
{{.ID}}, {{.FullID}}, {{.Name}}). For example, to prefix the logs with the name of
the container, use the log-opt option like so:
$ docker kill nginx
$ docker rm nginx
$ docker run -d -p 80:80 --name nginx \
--log-driver=fluentd \
--log-opt fluentd-tag=docker.{{.Name}} nginx
The logs will become similar to the following:
$ tail -n 3 /var/log/td-agent/td-agent.log
...
2015-08-17 13:43:45 docker./nginx: {"container_id":"e4152ad9bdba...",\
"container_name":"/stupefied_franklin",\
"source":"stdout",\
"log":"192.168.33.1 ...}
In this example, you redirected the Docker logs only to the Fluentd log itself, which is
not extremely useful or practical. In a production deployment, you would redirect the
logs to a remote data store such as Elasticsearch, InfluxDB, or MongoDB.
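For instance, to forward the Docker events to Elasticsearch instead of stdout, the match block in /etc/td-agent/td-agent.conf might look like the following sketch. It assumes you have installed the fluent-plugin-elasticsearch plug-in (for example with td-agent-gem install fluent-plugin-elasticsearch) and that Elasticsearch is reachable at 192.168.33.12, a placeholder address to adjust to your environment:
<match docker.**>
  type elasticsearch
  host 192.168.33.12
  port 9200
  logstash_format true
</match>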
Discussion
In the solution section, you ran td-agent as a local service on the Docker host. You
could also run it in a local container. Let’s write a configuration file in your working
directory called test.conf that contains the following:
<source>
type forward
</source>
<match docker.**>
type stdout
</match>
Then let’s start a fluentd container. You specify a volume mount to put your configu‐
ration file in the running container and specify an environment variable that points
to this file:
$ docker run -it -d -p 24224:24224 -v /path/to/conf:/fluentd/etc \
-e FLUENTD_CONF=test.conf fluent/fluentd:latest
By default, the fluentd logging driver tries to reach a fluentd server on localhost at
port 24224. Therefore, if you run another container with the --log-driver=fluentd
option, it will automatically reach fluentd running in the container.
Now start an Nginx container as you did earlier and watch the logs on the Fluentd
container with docker logs.
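For example, assuming you gave the Fluentd container a name by adding --name fluentd to the run command above (the name is an assumption), and after removing any previously created nginx container, a quick check might look like this:
$ docker run -d -p 80:80 --name nginx --log-driver=fluentd nginx
$ curl -s http://localhost > /dev/null
$ docker logs fluentd
You should see JSON records similar to the ones shown in the solution section.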
See Also
• Configuring logging drivers
• Fluentd logging driver for Docker documentation
9.6 Using Logspout to Collect Container Logs
Problem
Container logs can be obtained with docker logs, as seen in Recipe 9.4, but you
would like to collect these logs from containers running on multiple Docker hosts and
aggregate them.
Solution
Use Logspout. Logspout can collect logs from all containers running on a host and
route them to another host. It runs as a container and is purely stateless. You can use
it to route logs to a syslog server or send them to Logstash for processing. Logspout was
created prior to the release of Docker 1.6, which introduced the logging driver (see
Recipe 9.5) functionality. You can still use Logspout, but the logging driver also gives
you a straightforward way to redirect your logs.
Let’s install Logspout on one Docker host to collect logs from an Nginx container.
You run nginx on port 80 of the host. Start logspout, mount the Docker Unix
socket /var/run/docker.sock in /tmp/docker.sock, and specify a syslog endpoint (here
you use another Docker host with the IP address of 192.168.34.11):
$ docker pull nginx
$ docker pull gliderlabs/logspout
$ docker run -d --name webserver -p 80:80 nginx
$ docker run -d --name logspout -v /var/run/docker.sock:/tmp/docker.sock \
gliderlabs/logspout syslog://192.168.34.11:5000
To collect the logs, you’ll use a Logstash container running at 192.168.34.11. To sim‐
plify things, it will listen for syslog input on UDP port 5000 and output everything to
stdout on the same host. Start by pulling the logstash image. (This example uses the
image ehazlett/logstash, but there are many Logstash images that you might want to
consider.) After pulling the image, you’ll build your own and specify a custom Log‐
stash configuration file (this is based on the /etc/logstash.conf.sample from the eha‐
zlett/logstash image):
$ docker pull ehazlett/logstash
$ cat logstash.conf
input {
  tcp {
    port => 5000
    type => syslog
  }
  udp {
    port => 5000
    type => syslog
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} \
                               %{SYSLOGHOST:syslog_hostname} \
                               %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: \
                               %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  stdout { codec => rubydebug }
}
$ cat Dockerfile
FROM ehazlett/logstash
COPY logstash.conf /etc/logstash.conf
ENTRYPOINT ["/opt/logstash/bin/logstash"]
$ docker build -t logstash .
You are now ready to run the Logstash container, and bind port 5000 of the container
to port 5000 of the host listening for UDP traffic:
$ docker run -d --name logstash -p 5000:5000/udp logstash -f /etc/logstash.conf
Once you open your browser to access Nginx running on the first Docker host you
used, logs will appear in the Logstash container:
$ docker logs logstash
...
{
"message" => "<14>2015-03-10T13:00:39Z 889bbf0753a8 nginx[1]: 192.168.34.1 - \
- [10/Mar/2015:13:00:39 +0000] \"GET / HTTP/1.1\" 200 612 \"-\"
\"Mozilla/5.0 \
(Macintosh; Intel Mac OS X 10_8_5) \
AppleWebKit/600.3.18 (KHTML, like Gecko) \
Version/6.2.3 Safari/537.85.12\" \"-\"\n",
"@version" => "1",
"@timestamp" => "2015-03-10T13:00:36.241Z",
"type" => "syslog",
"host" => "192.168.34.10",
"tags" => [
...
Discussion
To simplify testing Logspout with Logstash, you can clone the repository accompany‐
ing this book and go to the ch09/logspout directory. A Vagrantfile will start two
Docker hosts and pull the required Docker images on each host:
$ git clone https://github.com/how2dock/docbook.git
$ cd docbook/ch09/logspout
$ vagrant up
$ vagrant status
Current machine states:
w running (virtualbox)
elk running (virtualbox)
...
On the web server node, you can run Nginx and the Logspout container. On the elk
node, you can run the Logstash container:
$ vagrant ssh w
$ docker run --name nginx -d -p 80:80 nginx
$ docker run -d --name logspout -v /var/run/docker.sock:/tmp/docker.sock \
gliderlabs/logspout syslog://192.168.34.11:5000
$ vagrant ssh elk
$ cd /vagrant
$ docker build -t logstash .
$ docker run -d --name logstash -p 5000:5000/udp logstash -f /etc/logstash.conf
You should see your Nginx logs in the Logstash container. Experiment with more
hosts and different containers, and play with the Logstash plug-ins to store your logs
in different formats.
See Also
• Logstash website
• Configuration of Logstash
• Plug-ins for Logstash inputs, outputs, codecs, and filters
9.7 Managing Logspout Routes to Store Container Logs
Problem
You are using Logspout to stream your logs to a remote server, but you would like to
modify this endpoint. Specifically, you want to debug your containers by looking
directly at Logspout, change the endpoint it uses, or add more endpoints.
Solution
In Recipe 9.6, you might have noticed that the Logspout container has port 8000
exposed. You can use this port to manage routes via a straightforward HTTP API.
You can bind port 8000 to the host to access this API remotely, but as an exercise you
are going to use a linked container to do it locally. Pull an image that contains curl
and start a container interactively. Verify that you can ping the Logspout container
(here I assume that you have the same setup as in Recipe 9.6). Then use curl to
access the Logspout API at http://logspout:8000.
$ docker pull tutum/curl
$ docker run -ti --link logspout:logspout tutum/curl /bin/bash
root@c94a4eacb7cc:/# ping logspout
PING logspout (172.17.0.10) 56(84) bytes of data.
64 bytes from logspout (172.17.0.10): icmp_seq=1 ttl=64 time=0.075 ms
...
root@c94a4eacb7cc:/# curl http://logspout:8000/logs
logspout|[martini] Started GET /logs for 172.17.0.12:38353
nginx|192.168.34.1 [10/Mar/2015:13:57:38 +0000] "GET / HTTP/1.1" 200 ...
nginx|192.168.34.1 [10/Mar/2015:13:57:43 +0000] "GET / HTTP/1.1" 200 ...
Discussion
To manage the log streams, the API exposes a /routes route. The standard HTTP
verbs GET, DELETE, and POST can be used to list, delete, and update the streaming end‐
points, respectively:
root@1fbb2f9636a8:/# curl http://logspout:8000/routes
[
  {
    "id": "e508de0c9689",
    "target": {
      "type": "syslog",
      "addr": "192.168.34.11:5000"
    }
  }
]
root@1fbb2f9636a8:/# curl http://logspout:8000/routes/e508de0c9689
{
  "id": "e508de0c9689",
  "target": {
    "type": "syslog",
    "addr": "192.168.34.11:5000"
  }
}
root@1fbb2f9636a8:/# curl -X DELETE http://logspout:8000/routes/e508de0c9689
root@1fbb2f9636a8:/# curl http://logspout:8000/routes
[]
root@1fbb2f9636a8:/# curl -X POST \
    -d '{"target": {"type": "syslog", "addr": "192.168.34.11:5000"}}' \
    http://logspout:8000/routes
{
  "id": "f60d30502654",
  "target": {
    "type": "syslog",
    "addr": "192.168.34.11:5000"
  }
}
root@1fbb2f9636a8:/# curl http://logspout:8000/routes
[
  {
    "id": "f60d30502654",
    "target": {
      "type": "syslog",
      "addr": "192.168.34.11:5000"
    }
  }
]
You can create a route to Papertrail that provides automatic backup
to Amazon S3.
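As a sketch, adding such a route through the same API would look like the following; the Papertrail hostname and port are placeholders that you would replace with the values of your own Papertrail log destination:
root@1fbb2f9636a8:/# curl -X POST \
    -d '{"target": {"type": "syslog", "addr": "logsN.papertrailapp.com:XXXXX"}}' \
    http://logspout:8000/routes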
9.8 Using Elasticsearch and Kibana to Store and Visualize
Container Logs
Problem
Recipe 9.6 uses Logstash to receive logs and send them to stdout. However, Logstash
has many plug-ins that allow you to do much more. You would like to go further and
use Elasticsearch to store your container logs.
Solution
Start an Elasticsearch and a Kibana container. Kibana is a dashboard that allows you
to easily visualize and query your Elasticsearch indexes. Start a Logstash container by
using the default configuration from the ehazlett/logstash image:
$ docker run --name es -d -p 9200:9200 -p 9300:9300 ehazlett/elasticsearch
$ docker run --name kibana -d -p 80:80 ehazlett/kibana
$ docker run -d --name logstash -p 5000:5000/udp \
--link es:elasticsearch ehazlett/logstash \
-f /etc/logstash.conf.sample
Notice that the Logstash container is linked to the Elasticsearch
container. If you do not link it, Logstash will not be able to find the
Elasticsearch server.
With the containers running, you can open your browser on port 80 of the Docker
host where you are running the Kibana container. You will see the Kibana default
dashboard. Select Sample Dashboard to extract some information from your index
and build a basic dashboard. You should see the logs obtained from hitting the Nginx
server, as shown in Figure 9-1.
Figure 9-1. Snapshot of a Kibana dashboard obtained with this recipe
Discussion
In the solution, Elasticsearch is running on a single container. The index created
when storing your logs streamed by Logspout will not persist if you kill and remove
the Elasticsearch container. Consider mounting a volume and backing it up to persist
your Elasticsearch data. In addition, if you need more storage and an efficient index,
you should create an Elasticsearch cluster across multiple Docker hosts.
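As a minimal sketch of persisting the index on the host, you could mount a volume for the Elasticsearch data directory. The host path and the in-container path are assumptions and depend on the image you use; the official elasticsearch image, for example, keeps its data under /usr/share/elasticsearch/data:
$ docker run --name es -d -p 9200:9200 -p 9300:9300 \
    -v /opt/es-data:/usr/share/elasticsearch/data \
    elasticsearch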
9.9 Using Collectd to Visualize Container Metrics
Problem
In addition to visualizing application logs (see Recipe 9.8), you would like to monitor
container metrics such as CPU.
Solution
Use Collectd. Run it in a container on all hosts where you have running containers
that you want to monitor. By mounting the /var/run/docker.sock socket in a collectd
container, you can use a Collectd plug-in that uses the Docker stats API (see Recipe
9.2) and sends metrics to a Graphite dashboard running on a different host.
This is an advanced recipe that uses several concepts covered ear‐
lier. Make sure to do Recipe 7.1 and Recipe 9.8 before doing this
recipe.
To test this, you’ll use the following setup, with two Docker hosts. One runs four con‐
tainers: an Nginx container used to generate dummy logs to stdout, a Logspout con‐
tainer that will route all stdout logs to a Logstash instance, one that generates a syn‐
thetic load (i.e., borja/unixbench), and one Collectd container. These four containers
can be started using Docker Compose.
The other host runs four containers as well: a Logstash container to collect the logs
coming from Logspout, an Elasticsearch container to store the logs, a Kibana con‐
tainer to visualize those logs, and a Graphite container. The Graphite container also
runs carbon to store the metrics.
Figure 9-2 illustrates this two-host, eight-container setup.
Figure 9-2. Two-host, Collectd, Logstash, Kibana, Graphite setup
On the first host (the worker), you can start all the containers with Docker Compose
(see Recipe 7.1) using a YAML file like this one:
nginx:
  image: nginx
  ports:
    - 80:80
logspout:
  image: gliderlabs/logspout
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
  command: syslog://192.168.33.11:5000
collectd:
  build: .
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
load:
  image: borja/unixbench
The Logspout container uses a command that specifies your Logstash endpoint.
Change the IP if you are running in a different networking environment. The Col‐
lectd container is built by Docker Compose and based on the following Dockerfile:
FROM debian:jessie
RUN apt-get update && apt-get -y install \
collectd \
python \
python-pip
RUN apt-get clean
RUN pip install docker-py
RUN groupadd -r docker && useradd -r -g docker docker
ADD docker-stats.py /opt/collectd/bin/docker-stats.py
ADD docker-report.py /opt/collectd/bin/docker-report.py
ADD collectd.conf /etc/collectd/collectd.conf
RUN chown -R docker /opt/collectd/bin
CMD ["/usr/sbin/collectd","-f"]
In the discussion section of this recipe, you will go over the scripts used in this Dock‐
erfile.
On the second host (the monitor), you can start all containers with Docker Compose
(see Recipe 7.1) using a YAML file like this one:
es:
  image: ehazlett/elasticsearch
  ports:
    - 9300:9300
    - 9200:9200
kibana:
  image: ehazlett/kibana
  ports:
    - 8080:80
graphite:
  image: hopsoft/graphite-statsd
  ports:
    - 80:80
    - 2003:2003
    - 8125:8125/udp
logstash:
  image: ehazlett/logstash
  ports:
    - 5000:5000
    - 5000:5000/udp
  volumes:
    - /root/docbook/ch09/collectd/logstash.conf:/etc/logstash.conf
  links:
    - es:elasticsearch
  command: -f /etc/logstash.conf
Several nonofficial images are used in this setup: gliderlabs/logsp‐
out, borja/unixbench, ehazlett/elasticsearch, ehazlett/kibana, eha‐
zlett/logstash, and hopsoft/graphite-statsd. Check the Dockerfile of
these images on Docker Hub or build your own images if you do
not trust them.
Once all the containers are up on the two hosts, and assuming that you set up the
networking and any firewall that may exist properly (open ports on security groups if
you are using cloud instances), you will be able to access the Nginx container on port
80 of the worker host, the Kibana dashboard on port 8080 of the monitor host, and
the Graphite dashboard on port 80 of the monitor host.
The Graphite dashboard will show you basic CPU metrics coming from all the con‐
tainers running on the worker host. See Figure 9-3 for what you should see.
Figure 9-3. The Graphite dashboard showing CPU metrics for all containers
Discussion
You can get all the scripts used in this recipe by using the online material that comes
with this book. Clone the repository if you have not done so already and head over to
the docbook/ch09/collectd directory:
$ git clone https://github.com/how2dock/docbook.git
$ cd docbook/ch09/collectd
$ tree
.
├── Dockerfile
├── README.md
├── Vagrantfile
├── collectd.conf
├── docker-report.py
├── docker-stats.py
├── logstash.conf
├── monitor.yml
└── worker.yml
The Vagrantfile allows you to start two Docker hosts on your local machine to experi‐
ment with this setup. However, you can clone this repository in two cloud instances
that have Docker and Docker Compose installed and then start all the containers. If
you use Vagrant, do the following:
$ vagrant up
$ vagrant ssh monitor
$ vagrant ssh worker
While using Vagrant for this recipe, I encountered several intermit‐
tent errors as well as delays when downloading the images. Using
cloud instances with better network connectivity might be more
enjoyable.
The two YAML files are used to easily start all containers on the two hosts. Do not
run them on the same host:
$ docker-compose -f monitor.yml up -d
$ docker-compose -f worker.yml up -d
The logstash.conf file was discussed in Recipe 9.6. Go back to this recipe if you do not
understand this configuration file.
The Dockerfile is used to build a Collectd image and was shown in the solution sec‐
tion earlier. It is based on a Debian Jessie image and installs docker-py (see Recipe
4.10) and a few other scripts.
Collectd uses plug-ins to collect metrics and send them to a data store (e.g., Carbon
with Graphite). In this setup, you use the simplest form of Collectd plug-in, which is
called an exec plug-in. This is defined in the collectd.conf file in the following section:
<Plugin exec>
  Exec "docker" "/opt/collectd/bin/docker-stats.py"
  NotificationExec "docker" "/opt/collectd/bin/docker-report.py"
</Plugin>
The Collectd process running in the foreground in the Collectd container will rou‐
tinely execute the two Python scripts defined in the configuration file. This is also
why you copy them in the Dockerfile. The docker-report.py script outputs values to
syslog. This has the benefit that you will also collect them via your Logspout con‐
tainer and see them in your Kibana dashboard. The docker-stats.py script uses the
Docker stats API (see Recipe 9.2) and the docker-py Python package. This script lists
all the running containers and obtains the statistics for each of them. For the stats called
cpu_stats, it writes a PUTVAL string to stdout. This string is understood by Collectd
and sent to the Graphite data store (a.k.a. Carbon) for storage and visualization. The
PUTVAL string follows the Collectd exec plug-in syntax:
#!/usr/bin/env python
import random
import json
import docker
import sys

# Connect to the local Docker daemon through the mounted Unix socket
cli = docker.Client(base_url='unix://var/run/docker.sock')

types = ["gauge-cpu0"]

for h in cli.containers():
    if not h["Status"].startswith("Up"):
        continue
    # The stats API returns a stream; grab one sample per running container
    stats = json.loads(cli.stats(h["Id"]).next())
    for k, v in stats.items():
        if k == "cpu_stats":
            # Emit a PUTVAL line that the Collectd exec plug-in understands
            print("PUTVAL %s/%s/%s N:%s" % (h['Names'][0].lstrip('/'),
                                            'docker-cpu', types[0],
                                            v['cpu_usage']['total_usage']))
The example plug-in in this recipe is minimal, and the statistics
need to be processed further. You might want to consider using this
Python-based plug-in instead.
See Also
• Collectd website
• Collectd Exec plug-in
• Graphite website
• Logstash website
• Collectd Docker plug-in
9.10 Using cAdvisor to Monitor Resource Usage
in Containers
Problem
Although Logspout (see Recipe 9.6) allows you to stream application logs to remote
endpoints, you need a resource utilization monitoring system.
Solution
Use cAdvisor, created by Google to monitor resource usage and performance of its
lmctfy containers. cAdvisor runs as a container on your Docker hosts. By mounting
local volumes, it can monitor the performance of all other running containers on that
same host. It provides a local web UI, exposes an API, and can stream data to
InfluxDB. Streaming data from running containers to a remote InfluxDB cluster
allows you to aggregate performance metrics for all your containers running in a
cluster.
To get started, let’s use a single host. Download the cAdvisor image as well as borja/
unixbench, an image that enables you to simulate a workload inside a container:
$ docker pull google/cadvisor:latest
$ docker pull borja/unixbench
$ docker run -v /var/run:/var/run:rw \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
-p 8080:8080 \
-d \
--name cadvisor \
google/cadvisor:latest
$ docker run -d borja/unixbench
With the two containers running, you can open your browser at http://
<IP_DOCKER_HOST>:8080 and you will enjoy the cAdvisor UI (see Figure 9-4).
You will be able to browse the running containers and access metrics for each of
them.
Figure 9-4. The cAdvisor UI
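In addition to the UI, cAdvisor exposes a remote REST API. As a sketch, and assuming the v1.3 API version (check the cAdvisor API documentation listed below for the version your build supports), you could list the statistics of all running containers with curl:
$ curl http://<IP_DOCKER_HOST>:8080/api/v1.3/containers/ | python -m json.tool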
See Also
• cAdvisor API documentation
9.11 Monitoring Container Metrics with InfluxDB,
Grafana, and cAdvisor
Problem
You would like to use an alternative to Elastic/Logstash/Kibana for your logging and
monitoring stack.
Solution
Consider using cAdvisor (see Recipe 9.10) in conjunction with InfluxDB for storing
the time-series data, and Grafana for visualizing the information. cAdvisor collects
good metrics from the containers running on your Docker host, and has an InfluxDB
storage driver that enables you to store all the metrics as a time series in InfluxDB (a
distributed database for time-series data). Visualizing the data from InfluxDB can be
done with Grafana, an equivalent to Kibana.
The following is the basic setup for a single node. You would run cAdvisor, config‐
ured to send data to an InfluxDB host, and you would run InfluxDB and Grafana. All
of these come as containers:
$ docker run -d -p 8083:8083 -p 8086:8086 \
-e PRE_CREATE_DB="db" \
--name influxdb \
tutum/influxdb:0.8.8
$ docker run -d -p 80:80 \
--link=influxdb:influxdb \
-e HTTP_USER=admin \
-e HTTP_PASS=root \
-e INFLUXDB_HOST=influxdb \
-e INFLUXDB_NAME=db \
--name=grafana \
tutum/grafana
$ docker run -v /var/run:/var/run:rw \
-v /sys:/sys:ro \
-v /var/lib/docker/:/var/lib/docker:ro \
-p 8080:8080 \
--link=influxdb:influxdb \
-d --name=cadvisor \
google/cadvisor:latest \
-storage_driver=influxdb \
-storage_driver_host=influxdb:8086 \
-storage_driver_db=db
In a multiple-host setup, you would run only cAdvisor on all your nodes. InfluxDB
would run in a distributed manner on several hosts, and Grafana might sit behind an
Nginx proxy for load-balancing. Considering the fast pace of development
of these systems and the changes going on in the images, you might have to adjust the
docker run commands shown previously to get a working system.
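As a sketch of the worker-node side of such a multihost setup, each node would run only cAdvisor and point its storage driver at the remote InfluxDB instead of a linked container; the 192.168.33.11 address is a placeholder for your InfluxDB host:
$ docker run -v /var/run:/var/run:rw \
    -v /sys:/sys:ro \
    -v /var/lib/docker/:/var/lib/docker:ro \
    -p 8080:8080 \
    -d --name=cadvisor \
    google/cadvisor:latest \
    -storage_driver=influxdb \
    -storage_driver_host=192.168.33.11:8086 \
    -storage_driver_db=db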
9.12 Gaining Visibility into Your Containers’ Layout with
Weave Scope
Problem
Building a distributed application based on a microservices architecture leads to hun‐
dreds of (and potentially more) containers running in your data center. Visibility into
that application and all the containers that it’s made of is crucial and a key part of
your overall infrastructure.
Solution
Weave Scope from Weaveworks provides a simple yet powerful way of probing your
infrastructure and dynamically creating a map of all your containers. It gives you
multiple views—per container, per image, per host, and per application—allowing
you to group containers and drill down on their characteristics.
It is open source and available on GitHub.
To facilitate testing, I prepared a Vagrant box, similar to many other recipes in this
book. Clone the repository with Git and launch the Vagrant box:
$ git clone https://github.com/how2dock/docbook.git
$ cd docbook/ch09/weavescope
$ vagrant up
The Vagrant box installs the latest Docker version (i.e., 1.6.2 as of this writing) and
installs Docker Compose (see Recipe 7.1). In the /vagrant folder, you will find a com‐
pose file that gives you a synthetic three-tiered application made of two load-
balancers, two application containers, and three database containers. This is a toy
application meant to illustrate Weave Scope. Once the VM has booted, ssh into it, go
to the /vagrant folder, and launch Compose and the Weave Scope script (i.e., scope)
like so:
$ vagrant ssh
$ cd /vagrant
$ docker-compose up -d
$ ./scope launch
You will end up with eight containers running: seven for the tiered toy application
and one for Weave Scope. The toy application is accessible at http://
192.168.33.10:8001 or http://192.168.33.10:8002. Of course, the most interesting part
is the Weave Scope dashboard. Open your browser at http://192.168.33.10:4040 and
you will see something similar to Figure 9-5.
Figure 9-5. The Weave Scope dashboard
Navigate through the UI, explore the various grouping capabilities, and explore the
information of each container.
Discussion
Weave Scope is still in early development and considered pre-alpha as of this writing.
You should expect more features to be added to this open source product. Keeping an
eye on this visibility solution for Docker containers is definitely worthwhile.
Building from source is straightforward with a Makefile that builds a Docker image.
See Also
• Detect, Map, and Monitor Docker Containers with Weave Scope
CHAPTER 10
Application Use Cases
10.0 Introduction
To finish this book, I will argue that Docker makes building distributed applications
painless. You now have all the tools in your arsenal to build a microservices applica‐
tion that will scale within and outside of your datacenter. At the very least, deploying
existing distributed systems/frameworks is made easier because you need to only
launch a few containers. Docker Hub is full of MongoDB, Elastic, and Cassandra
images, and more. Assuming that you like what is inside those images, you can grab
them and run one or multiple containers, and you are done.
This last chapter presents a few use-cases that are meant as teasers and put you on
your way to building your own application. First in Recipe 10.1, Pini Reznik shows
you how to build a continuous integration pipeline with Docker and Jenkins. He then
shows you how to extend it and build a continuous deployment pipeline using Mesos
in Recipe 10.2.
In Recipe 10.3, we present an advanced recipe that shows you how to build a dynamic
load-balancing setup. It leverages registrator with a consul key-value store and
confd. confd is a system to manage configuration templates. It watches keys in your
key-value store and upon modification of the values of those keys automatically re-
writes a configuration file based on a template. Using this setup you can, for example,
automatically reconfigure a load-balancer when new backends are added. This is key
to building an elastic load-balancer.
With Recipe 10.4, we build an S3-compatible object store, based on Cassandra running
in Kubernetes and a piece of software called Pithos, which exposes an S3 API and
manages buckets in Cassandra. It scales automatically through the use of Kubernetes
replication controllers.
In Recipe 10.5 and Recipe 10.6 we build a MySQL Galera cluster using Docker Network.
Docker Network is still experimental at the time of writing, but this recipe will
give you great insight into what will be possible with it. With automatic container
linking, nodes of a MySQL Galera cluster can discover each other on a multihost network
and build a cluster as if the containers were on a single host. This is extremely
powerful and will simplify distributed application design.
We finish with a Big Data example by deploying Spark, a large-scale data processing
system. You can run Spark on a Kubernetes cluster, but you can also run it on a
Docker Network–based infrastructure extremely easily. This last recipe shows you
how.
Enjoy this last chapter and hopefully it will spark your interest.
10.1 CI/CD: Setting Up a Development Environment
Contributed by Pini Reznik
Problem
You need a consistent and reproducible development environment for your Node.js
application. You don’t want to rebuild the Docker image every time you make small
changes to the Node.js sources.
Solution
Create a Docker image that includes all the required dependencies. Mount external
volumes during the development and use the ADD instruction in the Dockerfile for
distributing the image with the application to other developers.
First you need a Node.js Hello World application that includes two files:
app.js:
// Load the http module to create an http server.
var http = require('http');
// Configure our HTTP server to respond with Hello World to all requests.
var server = http.createServer(function (request, response) {
response.writeHead(200, {"Content-Type": "text/plain"});
response.end("Hello World");
});
// Listen on port 8000, IP defaults to "0.0.0.0"
server.listen(8000);
// Put a friendly message on the terminal
console.log("Server running at http://127.0.0.1:8000/");
package.json:
{
  "name": "hello-world",
  "description": "hello world",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "3.x"
  },
  "scripts": {"start": "node app.js"}
}
To create the Docker image, you can use the following Dockerfile:
FROM google/nodejs
WORKDIR /app
ADD package.json /app/
RUN npm install
ADD . /app
EXPOSE 8000
CMD []
ENTRYPOINT ["/nodejs/bin/npm", "start"]
This Dockerfile installs all the application dependencies and adds the application to
the image, ready to be started by using the ENTRYPOINT instruction.
The order of the instructions in the Dockerfile is important.
Adding package.json and installing dependencies before adding
the rest of the application helps shorten the build time
whenever the application changes but the dependencies remain
the same. This is because the ADD instruction invalidates the
Docker cache when any of the copied files has changed, which
leads to the repeated execution of all the commands that follow.
When you have your three files, you can build the Docker image and run a container:
$ docker build -t my_nodejs_image .
$ docker run -p 8000:8000 my_nodejs_image
This starts a container with the application built into the image by the ADD instruc‐
tion. To be able to test your application changes, you can mount a volume with the
source into the container by using the following command:
$ docker run -p 8000:8000 -v "$PWD":/app my_nodejs_image
This mounts the current folder with the latest sources inside the container as the
application folder. This way, you can inject the latest sources during the development
without rebuilding the image.
To share the images between the developers and push the images to alternative testing
environments, you can use a Docker registry. The following commands build and
push the image to the specified Docker registry:
$ docker build -t <docker registry URL>:<docker registry port>\
/containersol/nodejs_app:<image tag> .
$ docker push <docker registry URL>:<docker registry port>\
/containersol/nodejs_app:<image tag>
To simplify the work with the development environment and ease the future integra‐
tion into a centralized testing environment, you can use the following three scripts:
build.sh, test.sh, and push.sh. These scripts will become a single command interface
for every common operation you are required to perform during the development.
build.sh:
#!/bin/bash
# The first parameter passed to this script will be used as an image version.
# If none is passed, latest will be used as a tag.
if [ -z "${1}" ]; then
    version="latest"
else
    version="${1}"
fi

cd nodejs_app
docker build -t localhost:5000/containersol/nodejs_app:${version} .
cd ..
test.sh:
#!/bin/bash
# The first parameter passed to this script will be used as an image version.
# If none is passed, latest will be used as a tag.
if [ -z "${1}" ]; then
    version="latest"
else
    version="${1}"
fi

docker run -d --name node_app_test -p 8000:8000 -v "$PWD":/app \
    localhost:5000/containersol/nodejs_app:${version}
echo "Testing image: localhost:5000/containersol/nodejs_app:${version}"

# Allow the webserver to start up
sleep 1

# Test will be successful if the webpage at the
# following URL includes the word "success"
curl -s GET http://localhost:8000 | grep success
status=$?

# Clean up the testing container
docker kill node_app_test
docker rm node_app_test

if [ $status -eq 0 ] ; then
    echo "Test succeeded"
else
    echo "Test failed"
fi

exit $status
push.sh:
#!/bin/bash
# The first parameter passed to this script will be used as an image version.
# If none is passed, latest will be used as a tag.
if [ -z "${1}" ]; then
    version="latest"
else
    version="${1}"
fi

docker push localhost:5000/containersol/nodejs_app:"${version}"
Now you can build, test, and push the resulting image to a Docker registry by using
the following commands:
$ ./build.sh <version>
$ ./test.sh <version>
$ ./push.sh <version>
Discussion
It is generally a good practice to have a consistent set of build, test, and deployment
commands that can be executed in any environment, including development
machines. This way, developers can test the application in exactly the same way as it is
going to be tested in the continuous integration environment and catch the problems
related to the environment itself at earlier stages.
This example uses simple shell scripts, but a more common way to achieve the same
results is to use build systems such as Maven or Gradle. Both systems have Docker
plug-ins and can be easily used to build and push the images, using the same build
interface already used for compiling and packaging the code.
Our current testing environment has only a single container, but in case you need a
multicontainer setup, you can use docker-compose to set up the environment as well
as replace a simple curl/grep combination with more-appropriate testing systems
such as Selenium. Selenium is also available in a Docker container and can be
deployed together with the rest of the application containers by using docker-
compose.
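A minimal sketch of such a multicontainer test environment, assuming the selenium/standalone-firefox image from Docker Hub and the application image built earlier, could look like this:
app:
  image: localhost:5000/containersol/nodejs_app:latest
  ports:
    - "8000:8000"
selenium:
  image: selenium/standalone-firefox
  ports:
    - "4444:4444"
  links:
    - app
Your test scripts would then drive a browser through the Selenium WebDriver endpoint on port 4444 and point it at http://app:8000.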
10.2 CI/CD: Building a Continuous Delivery Pipeline with
Jenkins and Apache Mesos
Contributed by Pini Reznik
Problem
You would like to set up a continuous delivery pipeline for an application packaged
using Docker containers.
Solution
Set up a Jenkins continuous integration server to deploy an application to a Mesos
cluster in case the tests are passing.
Figure 10-1 gives a graphical representation of the environment you are going to use
at the end of this recipe. The goal is to take an application from a development envi‐
ronment, package it into a Docker container, push it to the Docker registry in case the
tests are passing, and tell Marathon to schedule the application on Mesos.
Figure 10-1. Continuous delivery pipeline using Jenkins and Apache Mesos
This recipe uses the previous example in Recipe 10.1. You can also see a way to set up
a Mesos cluster for development purposes in Recipe 7.2.
First you need to set up a Jenkins server. The easiest way is to use the following
Docker Compose configuration:
jenkins:
  image: jenkins
  volumes:
    - jenkins-home:/var/jenkins_home
  ports:
    - "8080:8080"
The volumes defined in the preceding Compose file act as persistent storage to avoid
losing your build configurations and data every time you restart your Jenkins con‐
tainer. It is the responsibility of the owner to back up and maintain those folders out‐
side Docker containers.
Start Docker Compose with the following command:
$ docker-compose up
You get a functional Jenkins server running at the following address: http://localhost:8080.
This was an easy task, but unfortunately not useful because you need to build an
image that includes your application and also need to start containers using the newly
built image to test your application. This is not possible in a standard Docker con‐
tainer.
To solve this, you can add two more lines to docker-compose.yml:
jenkins:
  image: jenkins
  volumes:
    - jenkins-home:/var/jenkins_home
    - /var/run/docker.sock:/var/run/docker.sock
    - /usr/bin/docker:/usr/bin/docker
  ports:
    - "8080:8080"
Two new volumes will mount the socket used for the communication between
Docker client and server and add the Docker binary itself to act as a client. This way,
you can run Docker commands inside the Jenkins container, and they will be exe‐
cuted on the host in parallel to the Jenkins container itself.
Another hurdle on the way toward a fully functional Jenkins server capable of run‐
ning Docker commands is permissions. By default, /var/run/docker.sock is accessible
to root or anyone in the group called docker. The default Jenkins container is using a
user called jenkins to start the server. The Jenkins server does not belong to the
docker group, but even if it was in such a group inside the container, it still would not
get the access to the Docker socket, as groups’ and users’ IDs differ between the host
and the containers running on it (with exception of root, which always has ID 0). To
solve this, you can use root to start the Jenkins server.
For this, you need to add a new user instruction to docker-compose.yml:
jenkins:
  image: jenkins
  user: root
  volumes:
    - jenkins-home:/var/jenkins_home
    - /var/run/docker.sock:/var/run/docker.sock
    - /usr/bin/docker:/usr/bin/docker
  ports:
    - "8080:8080"
Now, when you have a functional Jenkins server, you can deploy the Node.js applica‐
tion described in Recipe 10.1.
In the Node.js recipe, you already have scripts to build, test, and push the image to a
Docker registry. You need to add a configuration file to schedule the application on
Mesos, using Marathon and another script to deploy the application.
You call the application configuration for Marathon app_marathon.json:
{
  "id": "app",
  "container": {
    "docker": {
      "image": "localhost:5000/containersol/nodejs_app:latest",
      "network": "BRIDGE",
      "portMappings": [
        {"containerPort": 8000, "servicePort": 8000}
      ]
    }
  },
  "cpus": 0.2,
  "mem": 512.0,
  "instances": 1
}
This configuration uses the application's Docker image that you are going to build
with Jenkins and deploys it on Mesos by using Marathon. The file also defines the
resources needed by your application and can also include a health check.
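For example, a sketch of such a health check, added as a top-level key next to instances in app_marathon.json, might look like this; the intervals and thresholds are arbitrary values to adjust to your needs:
"healthChecks": [
  {
    "protocol": "HTTP",
    "path": "/",
    "portIndex": 0,
    "gracePeriodSeconds": 30,
    "intervalSeconds": 10,
    "maxConsecutiveFailures": 3
  }
]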
The last piece of the configuration is the deployment script that you are going to run
from Jenkins.
deploy.sh:
#!/bin/bash
marathon=<Marathon URL>

if [ -z "${1}" ]; then
    version="latest"
else
    version="${1}"
fi

# Destroy the old application
curl -X DELETE -H "Content-Type: application/json" \
    http://${marathon}:8080/v2/apps/app

# At this point we can query Marathon until the application is down.
sleep 1

# These lines create a copy of app_marathon.json and update the image
# version. This is required for using the correct image tag, as the Marathon
# configuration file does not support variables.
cp -f app_marathon.json app_marathon.json.tmp
sed -i "s/latest/${version}/g" app_marathon.json.tmp

# Post the application to Marathon
curl -X POST -H "Content-Type: application/json" \
    http://${marathon}:8080/v2/apps \
    -d@app_marathon.json.tmp
Now you can start the Jenkins server by using docker-compose and define the execu‐
tion steps in the Jenkins job configuration. Figure 10-2 shows the UI where this con‐
figuration can be done.
Figure 10-2. The Jenkins UI
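As a sketch, the "Execute shell" build step of the Jenkins job could simply chain the scripts written earlier, using the Jenkins build number as the image tag (the use of $BUILD_NUMBER as the version is an assumption; any unique tag works):
./build.sh $BUILD_NUMBER
./test.sh $BUILD_NUMBER
./push.sh $BUILD_NUMBER
./deploy.sh $BUILD_NUMBER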
Discussion
There are multiple ways to solve the problem of starting containers from within a
container. Mounting a socket used by Docker for communication between the server
and client is one of them. Additional methods may include running a container
directly within a container using a privileged container. Another way is to configure
the Docker server to receive remote API calls and configure the Docker client within
the Jenkins container to communicate with it using a full URL. This requires config‐
uring networking to allow communication between the server and the client.
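As a sketch of that last approach, you could drop the socket mount and instead point the Docker client inside the Jenkins container at a daemon listening on TCP; the address is a placeholder, and the daemon must have been started with a matching -H tcp:// option:
jenkins:
  image: jenkins
  volumes:
    - jenkins-home:/var/jenkins_home
    - /usr/bin/docker:/usr/bin/docker
  environment:
    - DOCKER_HOST=tcp://192.168.33.10:2375
  ports:
    - "8080:8080"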
10.3 ELB: Creating a Dynamic Load-Balancer with Confd
and Registrator
Problem
You want to build a dynamic load-balancer that gets dynamically reconfigured when
containers come and go.
Solution
The solution is based on registrator (see Recipe 7.13), which acts as a service-
discovery mechanism, and confd, which gets information from the key-value store
used by registrator and writes configuration files based on templates.
To illustrate this, you will build a simple one-node setup. A simple hostname applica‐
tion will run in multiple containers. An Nginx load-balancer will front these contain‐
ers to distribute the load among the containers. These containers will get automati‐
cally registered in a Consul key-value store, thanks to registrator. Then confd will
pull information from Consul to write an Nginx configuration file. The load-balancer
(i.e., Nginx) will then get restarted using the new configuration. Figure 10-3 illus‐
trates this example.
Figure 10-3. Dynamic load-balancing schematic
To get started, you will reproduce the steps explained in Recipe 7.13. You will start a
Consul-based key-value store via a single container.
In production deployments, you will want to use a multinode
key-value store running separately from the nodes running your
application.
This is done easily with the Docker image progrium/consul, like so:
$ docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
    -h cookbook progrium/consul -server \
    -bootstrap -ui-dir /ui
Then you will start the registrator container and set the registry URI to consul://
192.168.33.10:8500/elb. The IP address of your Docker host will be different.
$ docker run -d -v /var/run/docker.sock:/tmp/docker.sock \
    -h 192.168.33.10 gliderlabs/registrator \
    -ip 192.168.33.10 consul://192.168.33.10:8500/elb
Next you will start your toy application. You can use your own and pull runseb/host
name, which is a simple application that returns the container ID. Start two of them at
first:
$ docker run -d -p 5001:5000 runseb/hostname
$ docker run -d -p 5002:5000 runseb/hostname
If you check the Consul UI, you will see that the two containers are properly regis‐
tered, thanks to registrator, as shown in Figure 10-4.
Figure 10-4. Consul dynamic load-balancing nodes
Create an Nginx configuration file that acts as a load-balancer for these two applica‐
tions. Assuming your Docker host is 192.168.33.10, the following example will
work:
events {
    worker_connections 1024;
}
http {
    upstream elb {
        server 192.168.33.10:5001;
        server 192.168.33.10:5002;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://elb;
        }
    }
}
Next start your Nginx container, binding port 80 of the container to port 80 of the
host, and mount your configuration file inside the container. Give your container a
name that will prove handy later:
$ docker run -d -p 80:80 -v /home/vagrant/nginx.conf:/etc/nginx/nginx.conf \
    --name elb nginx
At this stage, you have a basic load-balancing setup. The Nginx container exposes
port 80 on the host, and load balances two application containers. If you use curl to
send an HTTP request to your Nginx container, you will get the container ID of the
two application containers. It will look like this:
$ curl http://192.168.33.10
8eaab9c31e1a
$ curl http://192.168.33.10
a970ec6274ca
$ curl http://192.168.33.10
8eaab9c31e1a
$ curl http://192.168.33.10
a970ec6274ca
Up to now, there is nothing dynamic except the registration of the containers. To be
able to reconfigure Nginx when containers come and go, you need a system that will
watch keys in Consul and write a new Nginx configuration file when the values
change. That is where confd comes into play. Download a confd binary from the Git‐
Hub release page.
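As a sketch, downloading and installing a release binary might look like the following; the version number is an assumption, so check the release page for the latest one:
$ wget https://github.com/kelseyhightower/confd/releases/download/v0.10.0/confd-0.10.0-linux-amd64
$ mv confd-0.10.0-linux-amd64 confd
$ chmod +x confd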
The quick start guide is good. But we will go over the basic steps. First let’s create the
directories that will hold your configuration templates:
sudo mkdir -p /etc/confd/{conf.d,templates}
Next create a template resource config. This file tells confd which configuration
template you want managed, which keys to watch, and where to write the configuration
file after the values have been replaced. In /etc/confd/conf.d/config.toml write:
[template]
src = "config.conf.tmpl"
dest = "/home/vagrant/nginx.conf"
keys = [
"/elb/hostname",
]
Now let’s write our Nginx template file in /etc/confd/templates/config.conf.tmpl. These
templates are Golang text templates, so anything that you can do in a Golang tem‐
plate, you can do in this template:
events {
    worker_connections 1024;
}
http {
    upstream elb {
      {{range getvs "/elb/hostname/*"}}
        server {{.}};
      {{end}}
    }

    server {
        listen 80;
        location / {
            proxy_pass http://elb;
        }
    }
}
This template is a minimal Nginx load-balancing configuration file. You see that the
upstream defined as elb will have a set of servers that will be extracted from the /elb/
hostname/ keys stored in Consul.
Now that your templates are in place, let's try confd in one-time mode. This
means that you will call confd manually and specify the type of backend (in
our case, Consul), and it will write the file /home/vagrant/nginx.conf (this was defined
as the dest key in the config.toml file):
$ ./confd -onetime -backend consul -node 192.168.33.10:8500
Since you have already written your nginx.conf file when you started the Nginx con‐
tainer, the configuration file written by confd should be exactly the same. Now let’s
start a new application container and rerun the confd command:
$ docker run -d -p 5003:5000 runseb/hostname
$ ./confd -onetime -backend consul -node 192.168.33.10:8500
... ./confd[832]: WARNING Skipping confd config file.
... ./confd[832]: INFO /home/vagrant/nginx.conf has md5sum \
acf6552d92cb9eb79b1068cf40b8ec0f should be 001894b713827404d0c5e72e2a66844d
... ./confd[832]: INFO Target config /home/vagrant/nginx.conf out of sync
... ./confd[832]: INFO Target config /home/vagrant/nginx.conf has been updated
You see that confd detects that the configuration has changed and it writes a new
configuration file. When we start the new application container, registrator auto‐
matically registers it in consul, and confd is able to detect this and write the new con‐
figuration. Now since you did this as a one-time command, let’s restart the Nginx
container, and you will see that it will use the new configuration (which is accessible
via a volume mount in the Nginx container):
$ docker restart elb
$ curl http://192.168.33.10
a970ec6274ca
$ curl http://192.168.33.10
8eaab9c31e1a
$ curl http://192.168.33.10
71d8297c1538
The only thing left to do now is to run confd in a daemon mode, and instruct it to
stop and restart the Nginx container when changes to the configuration are done. To
do this, edit /etc/confd/conf.d/config.toml and add a reload_cmd to restart Nginx like
so (this assumes you named your Nginx container elb as indicated earlier):
[template]
src = "config.conf.tmpl"
dest = "/home/vagrant/nginx.conf"
keys = [
"/elb/hostname",
]
reload_cmd = "docker restart elb"
Finally, run confd in daemon mode, and for testing, set a short interval for when it
will poll Consul. Then have fun starting and stopping your application container. You
will see that every time you start or stop/kill an application container, confd will
dynamically update your configuration and restart the elb container:
$ ./confd -backend consul -interval 5 -node 192.168.33.10:8500
... ./confd[1463]: WARNING Skipping confd config file.
... ./confd[1463]: INFO /home/vagrant/nginx.conf has md5sum \
acf6552d92cb9eb79b1068cf40b8ec0f should be 001894b713827404d0c5e72e2a66844d
... ./confd[1463]: INFO Target config /home/vagrant/nginx.conf out of sync
... ./confd[1463]: INFO Target config /home/vagrant/nginx.conf has been updated
... ./confd[1463]: INFO /home/vagrant/nginx.conf has md5sum \
001894b713827404d0c5e72e2a66844d should be cecb5ddc469ba3ef17f9861cde9d529a
... ./confd[1463]: INFO Target config /home/vagrant/nginx.conf out of sync
... ./confd[1463]: INFO Target config /home/vagrant/nginx.conf has been updated
... ./confd[1463]: INFO /home/vagrant/nginx.conf has md5sum \
cecb5ddc469ba3ef17f9861cde9d529a should be 0b97f157f437083ffba43f93a426d28f
... ./confd[1463]: INFO Target config /home/vagrant/nginx.conf out of sync
... ./confd[1463]: INFO Target config /home/vagrant/nginx.conf has been updated
This is dynamic load balancing with Docker. To make it elastic, you would need to
monitor the load and automatically start a new application container, which would
trigger a reconfiguration of the elb container.
Discussion
This recipe is quite long and has many steps. To facilitate the testing, I prepared a
Vagrant box, as always; try this:
$ git clone https://github.com/how2dock/docbook.git
$ cd docbook/ch10/confd
$ vagrant up
$ vagrant ssh
You will have all the images downloaded and ready to go:
$ docker images
REPOSITORY               TAG      IMAGE ID       CREATED        VIRTUAL SIZE
progrium/consul          latest   e66fb6787628   10 days ago    69.43 MB
nginx                    latest   319d2015d149   3 weeks ago    132.8 MB
runseb/hostname          latest   7c9d1ddd2ceb   3 months ago   349.3 MB
gliderlabs/registrator   latest   b1c29d1a74a9   4 months ago   11.79 MB
And the confd configuration files will already be set in /etc/confd/conf.d/config.toml
and /etc/confd/templates/config.conf.tmpl.
Just start the containers:
$ docker run -d -p 8400:8400 -p 8500:8500 -p 8600:53/udp \
    -h cookbook progrium/consul -server \
    -bootstrap -ui-dir /ui
$ docker run -d -v /var/run/docker.sock:/tmp/docker.sock \
    -h 192.168.33.10 gliderlabs/registrator \
    -ip 192.168.33.10 consul://192.168.33.10:8500/elb
$ docker run -d -p 80:80 -v /home/vagrant/nginx.conf:/etc/nginx/nginx.conf \
    --name elb nginx
$ docker run -d -p 5001:5000 runseb/hostname
$ docker run -d -p 5002:5000 runseb/hostname
$ docker run -d -p 5001:5000 runseb/hostname
$ docker run -d -p 5002:5000 runseb/hostname
You can start those containers via Docker Compose (see Recipe
7.1).
And run confd:
$ ./confd -backend consul -interval 5 -node 192.168.33.10:8500
See Also
• Quick Start guide for confd
10.4 DATA: Building an S3-Compatible Object Store with
Cassandra on Kubernetes
Problem
You would like to build your own S3-like object store.
Solution
Amazon S3 is the leading cloud-based object storage service. Since it came online,
several storage backends have developed an S3-compatible API frontend to their dis‐
tributed storage system: RiakCS, GlusterFS, and Ceph. The Apache Cassandra dis‐
tributed database is also a good choice, and recently a project called Pithos has started
that builds an S3-compatible object store on top of Cassandra.
This is particularly interesting because Cassandra is widely used in the enterprise.
However, for Docker this might be challenging as you would need to build a Cassan‐
dra cluster using Docker containers. Thankfully, with a cluster manager/container
orchestration system like Kubernetes, it is relatively painless to run a Docker-based
Cassandra cluster. The Kubernetes documentation has an example of how to do it.
Therefore, to build our S3 object store, you are going to run a Cassandra cluster on
Kubernetes and run a Pithos frontend that will expose an S3-compatible API.
It is possible to do the same with Docker Swarm.
To start, you need to have access to a Kubernetes cluster. The easiest way is to use
Google Container Engine (see Recipe 8.10). If you do not want to use Google Con‐
tainer Engine or need to learn about Kubernetes, check Chapter 5 and you will learn
how to deploy your own cluster. Whatever technique you use, before proceeding, you
should be able to use the kubectl client and list the nodes in your cluster. For example:
$ ./kubectl get nodes
NAME                              LABELS                                 STATUS
k8s-cookbook-935a6530-node-hsdb   kubernetes.io/hostname=...-node-hsdb   Ready
k8s-cookbook-935a6530-node-mukh   kubernetes.io/hostname=...-node-mukh   Ready
k8s-cookbook-935a6530-node-t9p8   kubernetes.io/hostname=...-node-t9p8   Ready
k8s-cookbook-935a6530-node-ugp4   kubernetes.io/hostname=...-node-ugp4   Ready
You are now ready to start a Cassandra cluster. You can use the Kubernetes example
directly or clone my own repo:
$ git clone https://github.com/how2dock/docbook.git
$ cd docbook/ch05/examples
Since Kubernetes is a fast evolving software, the API is changing
quickly. The pod, replication controller, and service specification
files may need to be adapted to the latest API version.
Then launch the Cassandra replication controller, increase the number of replicas,
and launch the service:
$ kubectl create -f ./cassandra/cassandra-controller.yaml
$ kubectl scale --replicas=4 rc cassandra
$ kubectl create -f ./cassandra/cassandra-service.yaml
Once the image is downloaded, you will have your Kubernetes pods in a running
state. Note that the image currently used comes from the Google registry. That’s
because this image contains a discovery class specified in the Cassandra configura‐
tion. You could use the Cassandra image from Docker Hub but would have to put
that Java class in there to allow all Cassandra nodes to discover each other. Changing
the number of replicas allows you to scale your Cassandra cluster, and starting a ser‐
vice allows you to expose a DNS endpoint for it.
Check that the specified number of pods is running:
$ kubectl get pods --selector="name=cassandra"
Once Cassandra discovers all nodes and rebalances the database storage, you will get
something like this (it will depend on the number of replicas you set, and the IDs will
change):
$ ./kubectl exec cassandra-5f709 -c cassandra nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.16.2.4 84.32 KB 256 46.0% 8a0c8663-074f-4987... rack1
UN 10.16.1.3 67.81 KB 256 53.7% 784c8f4d-7722-4d16... rack1
UN 10.16.0.3 51.37 KB 256 49.7% 2f551b3e-9314-4f12... rack1
UN 10.16.3.3 65.67 KB 256 50.6% a746b8b3-984f-4b1e... rack1
You can access the logs of a container in a pod with the handy
kubectl logs command.
Now that you have a fully functioning Cassandra cluster, you can move on to launch‐
ing Pithos, which will provide the S3 API and use Cassandra as the object store.
Pithos is a daemon that “provides an S3-compatible frontend to a Cassandra cluster.”
So if you run Pithos in your Kubernetes cluster and point it to your running Cassan‐
dra cluster, you can expose an S3-compatible interface.
To that end, I created a Docker image for Pithos, runseb/pithos, on Docker Hub. It’s
an automated build, so you can check out the Dockerfile there. The image contains
the default configuration file. You will want to change it to edit your access keys and
bucket store definitions.
You will now launch Pithos as a Kubernetes replication controller and expose a ser‐
vice with an external load-balancer created on GCE. The Cassandra service that you
launched allows Pithos to find Cassandra by using DNS resolution.
However, you need to set up the proper database schema for the object store. This is
done through a bootstrapping process. To do it, you need to run a nonrestarting pod
that installs the Pithos schema in Cassandra. Use the YAML file from the example
directory that you cloned earlier:
$ kubectl create -f ./pithos/pithos-bootstrap.yaml
Wait for the bootstrap to happen (i.e., for the pod to reach the Succeeded state). Then launch
the replication controller. For now, you will launch only one replica. Using a replica‐
tion controller makes it easy to attach a service and expose it via a public IP address.
$ kubectl create -f ./pithos/pithos-rc.yaml
$ kubectl create -f ./pithos/spithos.yaml
$ ./kubectl get services --selector="name=pithos"
NAME     LABELS        SELECTOR      IP(S)            PORT(S)
pithos   name=pithos   name=pithos   10.19.251.29     8080/TCP
                                     104.197.27.250
Since Pithos will serve on port 8080 by default, make sure that you open the firewall
for the public IP of the load-balancer. Once the Pithos pod is in its running state, you
are done and have built an S3-compatible object store backed by Cassandra running
in Docker containers managed by Kubernetes. Congratulations!
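On GCE, opening that port comes down to a firewall rule similar to the following
(the rule name is arbitrary, and this assumes the gcloud CLI is configured for your
project):
$ gcloud compute firewall-rules create pithos-8080 --allow tcp:8080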
Discussion
The setup is interesting, but you need to be able to use it and confirm that it is indeed
S3 compatible. To do this, you can try the well-known S3 utilities like s3cmd or boto.
For example, start by installing s3cmd and creating a configuration file:
$ cat ~/.s3cfg
[default]
access_key = AKIAIOSFODNN7EXAMPLE
secret_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
check_ssl_certificate = False
enable_multipart = True
encoding = UTF-8
encrypt = False
host_base = s3.example.com
host_bucket = %(bucket)s.s3.example.com
proxy_host = 104.197.27.250
proxy_port = 8080
server_side_encryption = True
signature_v2 = True
use_https = False
verbosity = WARNING
Replace the proxy_host with the IP that you obtained from the Pithos service external
load-balancer.
This example uses an unencrypted proxy. Moreover, the access and
secret keys are the defaults stored in the Dockerfile; change them.
With this configuration in place, you are ready to use s3cmd and create buckets to
store content:
$ s3cmd mb s3://foobar
Bucket 's3://foobar/' created
$ s3cmd ls
2015-06-09 11:19 s3://foobar
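To go one step further and verify that uploads work, push a small file into the bucket
and list its content (the file used here is arbitrary):
$ s3cmd put /etc/hosts s3://foobar/hosts
$ s3cmd ls s3://foobar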
If you wanted to use Boto in Python, this would work as well:
#!/usr/bin/env python
from boto.s3.key import Key
from boto.s3.connection import S3Connection
from boto.s3.connection import OrdinaryCallingFormat
apikey = 'AKIAIOSFODNN7EXAMPLE'
secretkey = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
cf = OrdinaryCallingFormat()
conn = S3Connection(aws_access_key_id=apikey,
                    aws_secret_access_key=secretkey,
                    is_secure=False,
                    host='104.197.27.250',
                    port=8080,
                    calling_format=cf)
conn.create_bucket('foobar')
And that’s it. All of these steps may sound like a lot, but it has never been that easy to
run an S3 object store. Docker truly makes running distributed applications a breeze.
10.5 DATA: Building a MySQL Galera Cluster on
a Docker Network
Problem
You would like to deploy a MySQL Galera cluster on two Docker hosts, taking advan‐
tage of the new Docker Network feature (see Recipe 3.14). Galera is a multimaster
high-availability MySQL database solution.
Solution
Docker Network, which you saw in Recipe 3.14, can be used to build a network over‐
lay using the VXLAN protocol across multiple Docker hosts. The overlay is useful as
it gives containers IP addresses in the same routable subnet and also manages name
resolution by updating /etc/hosts on each container. Therefore, every container
started in the same overlay can reach the other ones by using their container names.
This significantly simplifies networking across hosts, and makes a lot of solutions that
have been built for single hosts also valid for a multiple-hosts setup.
At the time of this writing, Docker Network is in preview in the experi‐
mental Docker binaries (i.e., 1.8.0-dev). It should be available in
Docker 1.9. The use of a Consul server may not be needed in the
future.
To build your Galera cluster on two Docker hosts, you will start by setting up the
hosts with the experimental Docker binary. You will then follow the instructions
described in a blog post. The setup of this recipe is the same as the one depicted in
Recipe 3.14. You will start several containers on each node by using the erkules/
galera:basic image from Docker Hub.
As always, let’s use a Vagrant box from the repository accompanying this book:
$ git clone https://github.com/how2dock/dockbook.git
$ cd dockbook/ch10/mysqlgalera
$ vagrant up
$ vagrant status
Current machine states:
consul-server running (virtualbox)
mysql-1 running (virtualbox)
mysql-2 running (virtualbox)
The consul-server machine is currently used by Docker Network, but this may
change. Currently we use this Consul server as a key-value store; the Docker engine
on each host uses it to store information about each host. As a reminder, check the
Vagrantfile and see the DOCKER_OPTS specified at start-up; you will see that we also
define a default overlay network called multihost.
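In the experimental build used here, those options looked roughly like the following
(a sketch only; the exact flag names varied between experimental builds and changed
again in Docker 1.9):
DOCKER_OPTS="--kv-store=consul:192.168.33.10:8500 --default-network=overlay:multihost"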
Once the machines are up, ssh to the first one and start the first node of your Galera
cluster by using the image erkules/galera:basic. You can check the reference to see
what is in the Dockerfile used to build this image.
Let’s do it:
$ vagrant ssh mysql-1
$ docker run -d --name node1 -h node1 erkules/galera:basic \
--wsrep-cluster-name=local-test \
--wsrep-cluster-address=gcomm://
Get on host mysql-2 and start two additional Galera nodes. Note that you use the
node name node1 for the cluster address. This will work because Docker Network
automatically populates the /etc/hosts file of each container with the IP addresses of
node1, node2, and node3. Since the three containers are in the same overlay, they will
be able to reach one another without any port mapping, container linking, or other
more complex network setup:
$ vagrant ssh mysql-2
$ docker run --detach=true --name node2 -h node2 erkules/galera:basic \
--wsrep-cluster-name=local-test \
--wsrep-cluster-address=gcomm://node1
$ docker run --detach=true --name node3 -h node3 erkules/galera:basic \
--wsrep-cluster-name=local-test \
--wsrep-cluster-address=gcomm://node1
Back on mysql-1, you will see that after a short time, the two nodes started on
mysql-2 have automatically joined the cluster:
$ docker exec -ti node1 mysql -e 'show status like "wsrep_cluster_size"'
+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
And indeed the /etc/hosts file on the node1 container has the IP address of the other
two nodes:
$ docker exec -ti node1 cat /etc/hosts
...
172.21.0.6 node1.multihost
172.21.0.6 node1
172.21.0.8 node2
172.21.0.8 node2.multihost
172.21.0.9 node3
172.21.0.9 node3.multihost
This recipe is interesting because using Docker Network allows you to use the exact
same deployment methodology that you would have used on a single Docker host.
Discussion
Try adding more Galera nodes and killing some; you will see the cluster size change
accordingly.
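For example, start a fourth node on either host and then remove one, checking the
cluster size each time (these commands simply mirror the ones used above):
$ docker run --detach=true --name node4 -h node4 erkules/galera:basic \
  --wsrep-cluster-name=local-test \
  --wsrep-cluster-address=gcomm://node1
$ docker exec -ti node1 mysql -e 'show status like "wsrep_cluster_size"'
$ docker rm -f node3
$ docker exec -ti node1 mysql -e 'show status like "wsrep_cluster_size"'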
See Also
• Blog post on building a Galera cluster on a single Docker host
• Blog post on building a Galera cluster on multiple Docker hosts
10.6 DATA: Dynamically Configuring a Load-Balancer for a
MySQL Galera Cluster
Problem
Recipe 10.5 created a multinode Galera cluster on two Docker hosts, taking advantage
of the Docker Network capability to create a network overlay. Now you would like to
automatically configure a load-balancer to share the load among all the nodes of this
Galera cluster.
Solution
Use the setup described in Recipe 10.3. Use registrator to dynamically register the
MySQL nodes in a key-value store like Consul, and use confd to manage an nginx
template that will balance the load among the Galera cluster nodes. Figure 10-5
depicts a two-node setup in which Docker Network is used across the nodes, registrator
runs to publish the services running on each node, and Nginx runs on one of
the nodes to provide load-balancing between these nodes.
Figure 10-5. Dynamic load balancing of a Galera cluster
Run registrator on each host, pointing to the consul-server running on the separate
VM at 192.168.33.10 and start the first two nodes of the Galera cluster using the
image erkules/galera:basic.
On mysql-1 at 192.168.33.11 run the following:
$ docker run -d -v /var/run/docker.sock:/tmp/docker.sock \
  gliderlabs/registrator \
  -ip 192.168.33.11 consul://192.168.33.10:8500/galera
$ docker run -d --name node1 \
  -h node1 erkules/galera:basic \
  --wsrep-cluster-name=local-test --wsrep-cluster-address=gcomm://
On mysql-2 at 192.168.33.12 use this:
$ docker run -d -v /var/run/docker.sock:/tmp/docker.sock \
  gliderlabs/registrator \
  -ip 192.168.33.12 consul://192.168.33.10:8500/galera
$ docker run -d --name node2 \
  -h node2 erkules/galera:basic \
  --wsrep-cluster-name=local-test --wsrep-cluster-address=gcomm://node1
Create an Nginx configuration file that acts as a load-balancer for these two database
nodes. MySQL speaks its own wire protocol rather than HTTP, so the configuration
uses the Nginx stream module (available in Nginx 1.9 and later) to proxy TCP connec‐
tions. For Nginx to reach the nodes at the addresses below, each Galera container also
needs to publish port 3306 on its host (for example, by adding -p 3306:3306 to the
docker run commands above). Assuming you decide to run the load-balancer on the
Docker host with IP 192.168.33.11, the following example will work:
events {
    worker_connections 1024;
}

stream {
    upstream galera {
        server 192.168.33.11:3306;
        server 192.168.33.12:3306;
    }
    server {
        listen 80;
        proxy_pass galera;
    }
}
Next start your Nginx container, binding port 80 of the container to port 80 of the
host, and mount your configuration file inside the container. Give your container a
name, as this will prove handy later:
$ docker run -d -p 80:80 -v /home/vagrant/nginx.conf:/etc/nginx/nginx.conf \
  --name galera nginx
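Before wiring in confd, you can quickly check that Nginx accepted the configuration
file you mounted, using the container name you just assigned:
$ docker exec galera nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful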
Test that your load-balancing works. Then head back to Recipe 10.3 and use the same
steps presented there. Use confd to automatically reconfigure your nginx configura‐
tion template when you add MySQL containers.
10.7 DATA: Creating a Spark Cluster
Problem
You are looking for a data-processing engine that can work in parallel for fast compu‐
tation and access to large datasets. You have settled on Apache Spark and would like
to deploy it using containers.
Solution
Apache Spark is an extremely fast data-processing engine that works at large scale
(for a large number of worker nodes) and that can also handle a large amount of data.
With Java, Scala, Python, and R interfaces, Spark is a great tool to program complex
data-processing problems.
A Spark cluster can be deployed in Kubernetes, but with the development of Docker
Network, the same deployment scenario can be reproduced without Kubernetes,
almost as is. Indeed,
Docker Network (see Recipe 3.14) builds isolated networks across multiple Docker
hosts, manages simple name resolution, and exposes services.
Hence to deploy a Spark cluster, you are going to use a Docker network and then do
the following:
• Start a Spark master by using the image available on the Google registry and used
by the Kubernetes example.
• Start a set of Spark workers by using a slightly modified image from the Google
registry.
The worker image uses a start-up script that hardcodes the Spark master port to 7077
instead of using an environment variable set by Kubernetes. The image is available on
Docker Hub and you can see the start-up script on GitHub.
Let’s start a master, making sure that you define the hostname spark-master:
$ docker run -d -p 8080:8080 --name spark-master -h spark-master \
  gcr.io/google_containers/spark-master
Now let's create three Spark workers. You could create more, on any hosts that are
attached to the same Docker network:
To avoid crashing your nodes and/or containers, limit the memory
allocated to each Spark worker container. You do this with the -m
option of docker run.
$ docker run -d -p 8081:8081 -m 256m --name worker-1 runseb/spark-worker
$ docker run -d -p 8082:8081 -m 256m --name worker-2 runseb/spark-worker
$ docker run -d -p 8083:8081 -m 256m --name worker-3 runseb/spark-worker
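To confirm that a worker found the master, look at its log; you should see a registra‐
tion message similar to this one (timestamps will differ):
$ docker logs worker-1
...
15/08/17 13:50:21 INFO Worker: Successfully registered with master spark://spark-master:7077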
You might have noticed that you exposed port 8080 of the Spark master container on
the host. This gives you access to the Spark master web interface. As soon as the
Spark master container is running, you can access this UI. After the workers come
online, you will see them appear in the dashboard, as shown in Figure 10-6.
Figure 10-6. The Spark master UI
This is it. The ease of deployment comes from the fact that the Spark workers try to
reach the master node with the hostname spark-master. Because the Docker Net‐
work manages name resolution, each container automatically knows the IP of the
master and can reach it.
Discussion
If you check the network services that have been published, you see your four con‐
tainers on the multihost network (i.e., spark-master, worker-1, worker-2,
worker-3). But since you also published the ports for the UI, each container was also
attached to the bridge network. In the following example, you see only the worker
nodes on the bridge because this lists the services on the node that is not running the
master. If you check the Docker host that is running the master, you will see that the
spark-master is also on the bridge network:
$ docker service ls
SERVICE ID      NAME           NETWORK     CONTAINER
92e90b6556b5    worker-1       bridge      ba80b36e5abc
1831b9378d37    worker-2       bridge      c1c8bec01a2a
bc64584793df    worker-3       bridge      f7be3797affb
2bbe00afc559    worker-1       multihost   ba80b36e5abc
7be77369a0ac    worker-2       multihost   c1c8bec01a2a
3a576b7233b6    worker-3       multihost   f7be3797affb
e3c75728c402    spark-master   multihost   fa44cce982df
Since you exposed the Spark worker’s web interface port, you can access the UI.
Figure 10-7 shows a snapshot of a task that has already completed on this worker.
Figure 10-7. The Spark worker UI
The task shown in the dashboard is the result of running the Spark shell, which is a
quick way to start learning Spark and running tasks on your containerized Spark
cluster. You can run the Spark shell via another interactive container as shown here:
$ docker run -it gcr.io/google_containers/spark-base
root@ac912dd21619:/# . ./setup_client.sh spark-master 7077
root@ac912dd21619:/# pyspark
Python 2.7.9 (default, Mar 1 2015, 12:57:24)
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/
...
>>>
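From this prompt, a one-line job is enough to confirm that the master dispatches
work to the containerized workers (sc is the SparkContext that pyspark creates for
you):
>>> sc.parallelize(range(1000)).map(lambda x: x * x).sum()
332833500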
Because libnetwork is changing rapidly, the network connectivity
between the Spark master and the workers might be unreliable.
The service publication mechanism might also change. Treat this
example as a work in progress of what could be done, but not as a
production deployment scenario. If you experience problems,
remember to check the logs of each container with docker logs -f <container_id>.
See Also
• The Kubernetes Spark example that inspired this recipe
Index

Symbols
--net=host, 79
.dockerignore, 46

A
Amazon Linux AMI, 237
Amazon S3, 315
Amazon Web Services (see AWS)
Another Union File System (AUFS), 126
Ansible, 53, 210-212
Ansible Docker module, 210-212, 225
Apache Libcloud, 188
Apache Mesos (see Mesos)
Apache Spark cluster, 323-327
application programming interface (API)
    Docker remote, 119-121
    Kubernetes, 158-161
application use cases, 299-327
    continuous delivery pipeline with Jenkins and Apache Mesos, 304-308
    dynamic load balancer with confd and registrator, 308-314
    dynamically configuring a load-balancer for a MySQL Galera cluster, 321-323
    MySQL Galera cluster on Docker Network, 319-321
    S3-compatible object store with Cassandra on Kubernetes, 315
    setting up development environment, 300-304
    Spark cluster, 323-327
Atomic, 182, 184
AUFS (Another Union File System), 126
authentication, 165
automated builds, 62
AWS (Amazon Web Services), 190
    account creation, 233
    CLI, 235-238
    ECS (see EC2 container service)
    Elastic Beanstalk, 269-272
    principles, 234
    running Weave Net, 96
    starting a Docker host with Docker Machine, 243-245
    starting Atomic instance to use Docker, 184
    starting Docker host on AWS EC2, 235-238
    Ubuntu Core Snappy instance on AWS EC2, 188-191
AWS EC2
    Docker host on, 235-238
    Ubuntu Core Snappy instance on, 188-191
Azure, 190, 233, 241-243, 245-247

B
background, running service in, 22
Bash, xiii
Beanstalk, 269-272
binary (see Docker binary)
Bitbucket, 62-66
Boot2Docker
    and GCE CLI, 248
    docker-py integration with, 124
    for getting Docker host on OS X, 9-13
    on Windows 8.1 desktop, 13-15
Borg, 129
Boto, 261, 264, 318
bridge, custom, 88
build trigger, 65
builds, automated, 62-68

C
CA (certificate authority), 121
cAdvisor
    container metrics monitoring, 296
    resource usage monitoring, 294
Canonical, 186
cases, application (see application cases)
Cassandra, 315
CentOS 6.5, 3
CentOS 7, 4
CentOS project, 42
certificate authority (CA), 121
child image, 58
CI/CD (continuous integration/continuous deployment)
    development environment, 300-304
    pipeline with Jenkins and Apache Mesos, 304-308
CLI (see command line interface)
cloud
    accessing public clouds to run Docker, 232-235
    application using Docker support in AWS Beanstalk, 269-272
    cloud provider CLI in a Docker container, 247-249
    Docker containers on an ECS cluster, 265-268
    Docker host on AWS EC2, 235-238
    Docker host on AWS with Docker Machine, 243-245
    Docker host on Azure with Docker Machine, 245-247
    Docker host on Google GCE, 239-241
    Docker host on Microsoft Azure, 241-243
    Docker in, 231-272
    Docker in GCE Google-container instances, 252-254
    Docker Machine to start Docker host in, 15-18
    EC2 container service, 259-261
    ECS cluster, 261-265
    GCR to store Docker images, 250-252
    Kubernetes via GCE, 254-258
cloud-init
    configuring cloud instances, 238
    starting container on CoreOS, 173
cluster IP services, 146-150
cluster(s)
    configuring authentication to, 165
    configuring client to access remote, 167
    CoreOS, 175-178
    creating with Docker Compose, 151-154
    Docker Machine to create, 202
    ECS, 261-265
    fleet to start containers on, 178
    Kubernetes, with Pods, 139-140
    Lattice for running containers on, 217-219
    load-balancer for MySQL Galera cluster, 321-323
    Mesos Docker containerizer on, 224
    multinode, 135-138
    on Docker Network, 319-321
    Rancher to manage containers on, 213-216
    Spark, 323-327
    starting containers on, with Docker Swarm, 199-201
CMD instruction, 41
Collectd, 288-293
command line interface (CLI)
    AWS, 235-238
    cloud provider, 247-249
    GCE, 248
    gcloud CLI, 256
Conduit, 67
confd, 311-314
config.rb, 171
configuring, 118-128
    (see also development)
    changing storage driver, 126-128
    Docker daemon, 108, 118, 121-123
    docker-py, 123-126, 125
    Kubernetes, 165, 167
consul, 228
container images (see images)
container linking
    alternatives for large-scale systems, 74
    and networking, 73-75
container logs
    docker logs for obtaining, 279
    managing Logspout routes to store, 285
    using Elasticsearch and Kibana to store and visualize, 287
    using Logspout to collect, 282-285
container metrics