Fluentd.
Logstash is built by Elastic and is well integrated with Elasticsearch and Kibana. It has many plugins. Fluentd describes itself as an open source data collector for a unified logging layer. Docker provides a logging driver that pushes logs directly into Fluentd. Fluentd also has many plugins, including one that connects to Elasticsearch.
I've chosen Fluentd because Docker promotes it, and Kubernetes (an important project in the Docker ecosystem) uses it. Furthermore, in our example, Fluentd's Elasticsearch plugin plays well with Kibana.
We can use two types of infrastructure: either a classic architecture with servers or a cloud-based one. I've chosen the classic one for simplicity's sake.
Therefore two servers are needed, one for our application and Fluentd, and one for our Elasticsearch database and Kibana.
The process can be described in 6 steps.
As the application is a simple nginx, I've packaged a new image since the official one uses a custom logger that is not appropriate for our purpose. We can run the app using docker-compose up with the following configuration:
$ cat ./docker-compose.yml
nginx:
  image: thibautgery/docker-nginx
  ports:
    - 8080:80
Fluentd is a middleware to collect logs which flow through steps and are identified with tags. Here is a simple configuration with two steps to receive logs through HTTP and print them to stdout:
$ cat ./fluentd/fluentd.sample
<source>
@type http
port 8888
</source>
<match myapp.access>
@type stdout
</match>
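In this sample, each step defines how the data is processed: the source block receives events over HTTP on port 8888, and the match block prints events tagged myapp.access to stdout.
The data is streamed through Fluentd. Each chunk of data carries a tag, which is used to route it between the different steps.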
In the previous example, the tag is specified after the match keyword: myapp.access. The tag of incoming data is taken from the path of the request URL. For example, running curl -X POST -d 'json={"event":"data"}' http://localhost:8888/myapp.access prints the record {"event":"data"} to stdout.
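To make the routing idea concrete, here is a minimal Python sketch of how a match pattern could be compared with a tag. It assumes the wildcard rules described in the Fluentd documentation (`*` stands for exactly one tag part, `**` for zero or more parts). The function name `tag_matches` is hypothetical, and this is an illustration, not Fluentd's actual implementation.

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if a dot-separated tag fits a Fluentd-style match pattern.

    Assumed rules: '*' matches exactly one tag part,
    '**' matches zero or more tag parts.
    """
    def match(pattern_parts, tag_parts):
        if not pattern_parts:
            # The pattern is exhausted: the tag must be exhausted too.
            return not tag_parts
        head, rest = pattern_parts[0], pattern_parts[1:]
        if head == "**":
            # Try letting '**' swallow 0, 1, 2, ... remaining tag parts.
            return any(match(rest, tag_parts[i:])
                       for i in range(len(tag_parts) + 1))
        if not tag_parts:
            return False
        if head == "*" or head == tag_parts[0]:
            return match(rest, tag_parts[1:])
        return False

    return match(pattern.split("."), tag.split("."))
```

With this sketch, `tag_matches("myapp.access", "myapp.access")` is true, while an event tagged `myapp.error` would not reach the stdout step of the sample configuration.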
This slideshare explains the basics of Fluentd.
Each step is a plugin. There are more than 150 plugins divided into 6 categories. The most important ones are:
First of all, the Fluentd agent can run anywhere, but for simplicity's sake we run it on the same node as the application. The official image can be found on Docker Hub.
We need to configure the Fluentd agent:
$ cat ./conf/Fluentd
<source>
@type forward
</source>
<match nginx.docker.**>
@type stdout
</match>
Fluentd accepts connections on port 24224 and prints logs to stdout thanks to two default plugins, in_forward and out_stdout.
We can run Fluentd with docker-compose -f docker-compose-fluentd.yml up
$ cat ./docker-compose-fluentd.yml
fluentd:
  image: fluent/fluentd
  restart: always
  ports:
    - 24224:24224
  volumes:
    - ./conf:/fluentd/etc
The default logging driver for Docker is json-file. We can use the log-driver option to select fluentd instead. By default, it connects to localhost on port 24224.
$ cat ./docker-compose.yml
nginx:
  image: thibautgery/docker-nginx
  ports:
    - 8080:80
  log_driver: fluentd
  log_opt:
    fluentd-tag: "nginx.docker.{{.Name }}"
The Docker driver uses a default tag for Fluentd: docker._container-id_. We override it to be nginx.docker._container-name_ with the log_opt entry fluentd-tag: "nginx.docker.{{.Name }}". The tag set by the Docker driver must match the pattern in the Fluentd configuration. We should now be able to see the nginx logs in the Fluentd container's log.
Right now, our system is useless. We need to send logs to a distant database, Elasticsearch.
Fluentd needs the fluent-plugin-elasticsearch plugin in order to send data to Elasticsearch. I have packaged an image that includes it.
We need to update the Fluentd agent configuration:
$ cat ./conf/Fluentd
<source>
@type forward
</source>
<match nginx.docker.**>
@type elasticsearch
hosts http://elasticsearch.host.com:9200
logstash_format true
</match>
Don't forget to change hosts so that it points to your Elasticsearch instance.
The logstash_format true setting writes data into Elasticsearch in a Logstash-compliant format, which lets us use Kibana on top of it.
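As an illustration of what "Logstash-compliant" means in practice, the sketch below computes a daily index name and adds an `@timestamp` field to a record. It assumes the plugin's usual defaults (a `logstash` prefix, a `%Y.%m.%d` date suffix, dates in UTC); the function names are hypothetical and the plugin's real behaviour may differ in its details.

```python
from datetime import datetime, timezone


def logstash_index(event_time: datetime, prefix: str = "logstash") -> str:
    """Build a daily index name such as 'logstash-2015.11.23'.

    Assumption: the date suffix is computed in UTC.
    event_time must be timezone-aware.
    """
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time.strftime('%Y.%m.%d')}"


def to_logstash_document(record: dict, event_time: datetime) -> dict:
    """Return a copy of the record with the @timestamp field Kibana relies on."""
    utc_time = event_time.astimezone(timezone.utc)
    return {**record, "@timestamp": utc_time.isoformat()}
```

Kibana can then use an index pattern such as `logstash-*` with `@timestamp` as the time field.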
We can run Fluentd with the following Compose file:
$ cat ./docker-compose-fluentd.yml
fluentd:
  image: thibautgery/fluent.d-es
  ports:
    - 24224:24224
  volumes:
    - ./conf:/fluentd/etc
Then we can run the application and query it with our favorite browser to fetch some lines from Elasticsearch in the Logstash index. Since Fluentd buffers the data before sending it in batches, we might have to wait a minute or two.
Unfortunately, only the Docker metadata (such as the container name, label, id...) is sent as structured fields. The log field contains the raw nginx log line, which is not structured. For example, we cannot query all failed HTTP requests (status code >= 400).
This log line needs to be parsed.
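To show what parsing buys us, here is a small Python sketch that splits an nginx access-log line into named fields and keeps the existing Docker metadata. The regular expression is a simplified approximation written for this example, not the pattern Fluentd ships, and the names `NGINX_LINE`, `parse_log_field` and `is_failed` are hypothetical.

```python
import re

# Simplified approximation of an nginx access-log line (an assumption made
# for this example; the pattern used by Fluentd is more permissive).
NGINX_LINE = re.compile(
    r'^(?P<remote>\S+) (?P<host>\S+) (?P<user>\S+) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def parse_log_field(record: dict) -> dict:
    """Return a copy of the record enriched with fields parsed from 'log'.

    The original keys (Docker metadata, raw line) are kept. Records whose
    'log' field does not look like an access-log line are returned unchanged.
    """
    match = NGINX_LINE.match(record["log"].strip())
    if match is None:
        return dict(record)
    parsed = match.groupdict()
    # Convert the status code so it can be compared numerically.
    parsed["code"] = int(parsed["code"])
    return {**record, **parsed}


def is_failed(record: dict) -> bool:
    """True for parsed records with an HTTP status code >= 400."""
    return record.get("code", 0) >= 400
```

Once the status code is a field of its own, "all failed requests" becomes a simple filter instead of a text search on the raw line.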
Fluentd needs the fluent-plugin-parser plugin in order to parse a specific field of an event that has already been received. I have packaged an image that includes it.
We need to update the configuration:
$ cat ./conf/Fluentd
<source>
@type forward
</source>
<match nginx.docker.**>
@type parser
key_name log
format nginx
remove_prefix nginx
reserve_data yes
</match>
<match docker.**>
@type elasticsearch
hosts http://elasticsearch.host.com:9200
logstash_format true
</match>
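The second block of the configuration parses the log field using the nginx format, on events whose tag matches nginx.docker.**, and removes the nginx prefix so that the resulting tag matches docker.**.
Here we use the tag concept to route the data through the correct steps: the Docker driver emits events tagged nginx.docker._container-name_, the parser re-emits them tagged docker._container-name_, and the Elasticsearch output picks them up.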
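The retagging can be sketched in a few lines of Python. This is a simplified model of the configuration above, under the assumption that remove_prefix strips the leading `nginx.` from the tag. The functions `remove_prefix` and `route` are hypothetical names, and the prefix checks only approximate the `**` patterns.

```python
def remove_prefix(tag: str, prefix: str) -> str:
    """Strip 'prefix.' from the start of the tag, if present."""
    head = prefix + "."
    return tag[len(head):] if tag.startswith(head) else tag


def route(tag: str):
    """Return the final tag and the list of steps an event goes through.

    The startswith() checks approximate <match nginx.docker.**> and
    <match docker.**> for tags that carry a container name.
    """
    steps = []
    if tag.startswith("nginx.docker."):
        steps.append("parser")
        tag = remove_prefix(tag, "nginx")
    if tag.startswith("docker."):
        steps.append("elasticsearch")
    return tag, steps
```

An event tagged `nginx.docker.web_1` therefore goes through the parser first and reaches the Elasticsearch step as `docker.web_1`. Without the prefix removal, the re-emitted event would match the parser block again.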
Here is the structured data we can now use to create diagrams:
We can then run the application and query it with our favorite browser to see the data correctly formatted in Kibana.
Our system collects logs from our application and sends them to Elasticsearch. The Docker engine requires Fluentd to be up and running before it can start our container.
However, if Fluentd dies afterwards, the containers that use it continue to work properly. Furthermore, if Fluentd stops for short periods of time, we do not lose any log entries, because the Docker engine buffers unsent messages so that they can be sent later, when Fluentd is back online.
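The buffering behaviour can be pictured with a toy model. The sketch below is not Docker's implementation. It only illustrates the idea of keeping unsent messages in a bounded queue and retrying later; the class name `BufferedSender`, the `deliver` callback and the `max_buffered` limit are assumptions made for this example.

```python
from collections import deque


class BufferedSender:
    """Toy model of a log sender that survives a short collector outage."""

    def __init__(self, deliver, max_buffered=1000):
        # deliver(message) is expected to raise ConnectionError
        # while the collector is unreachable.
        self._deliver = deliver
        # Bounded queue: once full, the oldest messages are dropped.
        self._pending = deque(maxlen=max_buffered)

    def send(self, message):
        """Queue the message, then try to flush everything that is pending."""
        self._pending.append(message)
        self.flush()

    def flush(self):
        """Deliver pending messages in order; stop at the first failure."""
        while self._pending:
            try:
                self._deliver(self._pending[0])
            except ConnectionError:
                return  # collector is down: keep the messages for later
            self._pending.popleft()

    @property
    def pending(self):
        return list(self._pending)
```

The bound matters: a buffer like this protects against short outages only, which is consistent with the "short periods of time" caveat above.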
Finally, since Docker 1.9, the logging driver can also forward container labels and environment variables. In our example, we added service: nginx, and it shows up in Kibana.
We are now able to create graphs such as this one:
So far, we have seen how to collect and structure logs from Docker and push them to Elasticsearch. We can easily swap the Elasticsearch plugin for the Mongo or HDFS plugin and push logs to the database of our choice. We can also add an alerting system like Zabbix.
We can add nodes to our infrastructure and run several containers on one node. Keep in mind that this article doesn't cover everything. For instance, we have not answered the following questions:
Run everything in two commands with the Ansible scripted repository