I’ve been working on an EFK stack for the past couple of days, and figured I’d set up Fluentd on my Swarm to forward logs to the aggregator.
Of course, I didn’t want to install it on the hosts themselves, as I already had Docker up and running; why bother with a manual installation?
The goal was to get my Swarm’s Traefik containers’ logs to Elasticsearch, and Docker has a Fluentd logging driver built in. Perfect! I just need to redirect Traefik’s access logs (which are configured to go to stdout, where Docker picks them up) to a Fluentd container, and off they go.
The catch is that, to do this, Docker needs to be able to reach Fluentd somehow. I did not really want to use a TCP socket: I did not especially need one, and I did not want to spend time securing it against accidental internet exposure.
So I figured: if I’m going to run Fluentd on every node anyway, why not simply expose a Unix socket through a bind mount and call it a day?
… and that’s exactly what this stack file does:
```yaml
version: "3.7"

services:
  fluentd:
    image: fluent/fluentd:v1.10.4-1.0
    user: root
    volumes:
      - type: bind
        source: /var/run/fluentd-docker/
        target: /var/run
    configs:
      - source: fluent-forwarder
        target: /fluentd/etc/fluent.conf
    deploy:
      mode: global
      restart_policy:
        condition: on-failure

configs:
  fluent-forwarder:
    file: ./fluent-forwarder.conf
```
As you can see, I’m forcing the Fluentd container to run as root. This is because I am unsure how to secure the socket properly: I do not know whether changing the ownership of `/var/run/fluentd-docker/` to Fluentd’s default UID is a good idea or not. For the time being, I decided to go with the root solution, since the container is not directly exposed to the internet, and it is easy enough to change later anyway.
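If you would rather avoid running the container as root, one route I have not tested is to hand the host-side socket directory to the UID that the `fluent` user has inside the image, then drop the `user: root` line from the stack file. A rough sketch, with the UID left as a placeholder:

```sh
# Untested alternative to `user: root`: find out which UID the fluent
# user has inside the image...
docker run --rm --entrypoint id fluent/fluentd:v1.10.4-1.0 fluent

# ...then, on each node, give that UID the socket directory:
sudo mkdir -p /var/run/fluentd-docker
sudo chown <fluent-uid> /var/run/fluentd-docker
```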
This has to be coupled with a Fluentd configuration, which I named `fluent-forwarder.conf`. Here’s the template I use for Traefik logging purposes:
```
<source>
  @type unix
  path /var/run/td-agent.sock
</source>

<filter docker.swarm.traefik>
  @type parser
  key_name log
  <parse>
    @type json
    time_type string
  </parse>
</filter>

<match docker.*.*>
  @type forward
  transport tls
  <server>
    host <collector.example.com>
    port <24224>
  </server>
  <security>
    self_hostname <swarm.example.com>
    shared_key <bigkey>
  </security>
</match>
```
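The aggregator itself is out of scope for this post, but for context, the `<security>` block above pairs with a matching `in_forward` source on the collector. A minimal sketch, with the certificate paths, hostname and key as placeholders:

```
<source>
  @type forward
  port 24224
  <transport tls>
    cert_path </path/to/cert.pem>
    private_key_path </path/to/key.pem>
  </transport>
  <security>
    self_hostname <collector.example.com>
    shared_key <bigkey>
  </security>
</source>
```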
To boot up a stack using these, you’ll need to:

- Put the two files in the same directory (`fluentd-stack.yml` and `fluent-forwarder.conf`).
- Edit the Fluentd config according to your environment (particularly the values between `<>`).
- Create the `/var/run/fluentd-docker/` directory manually on your hosts (somewhat annoying, I know; for some reason, Docker doesn’t create bind mount points automatically when they are used on a Swarm).
- Deploy the stack (i.e. `docker stack deploy -c fluentd-stack.yml fluentd` on a manager node), as shown below.
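In shell form, those last two steps look like this:

```sh
# On every node: create the bind mount source (Docker won't create it for you on a Swarm)
sudo mkdir -p /var/run/fluentd-docker

# On a manager node, from the directory containing both files:
docker stack deploy -c fluentd-stack.yml fluentd
```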
Once your stack is up and running, you can check that a new socket is available on each node running Fluentd, at `/var/run/fluentd-docker/td-agent.sock`.
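A quick way to check, on any node:

```sh
# td-agent.sock should be listed, with an `s` (socket) in the mode column
ls -l /var/run/fluentd-docker/
```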
You will then need to tell Docker to redirect the logs of the containers you are interested in to that Unix socket. To do that, all you need to do is add a `logging` section to your stack’s YAML:
```yaml
logging:
  driver: fluentd
  options:
    tag: docker.swarm.traefik
    fluentd-address: unix:///var/run/fluentd-docker/td-agent.sock
    fluentd-async-connect: "true"
```
Change the `tag` to your liking (keeping it consistent with your Fluentd filter configuration), and you’re good to deploy.
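If you want to sanity-check the plumbing before touching a real service, a throwaway container using the same logging options should do; the tag here is made up, but it still matches the `docker.*.*` pattern in the forwarder config:

```sh
# One-off container logging through the same Unix socket
docker run --rm \
  --log-driver fluentd \
  --log-opt fluentd-address=unix:///var/run/fluentd-docker/td-agent.sock \
  --log-opt tag=docker.swarm.test \
  alpine echo "hello fluentd"
```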
For those who may be interested, here’s my current (as of writing) full Traefik stack file:
```yaml
version: "3.7"

services:
  traefik:
    image: traefik:v2.2.1
    environment:
      - "TZ=Europe/Paris"
    command:
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web-secure.address=:443"
      - "--providers.docker.endpoint=tcp://socketproxy:2375"
      - "--certificatesResolvers.le-ssl.acme.tlsChallenge=true"
      - "--certificatesResolvers.le-ssl.acme.email=-snip-"
      - "--certificatesResolvers.le-ssl.acme.storage=/etc/traefik/acme.json"
      - "--log.format=json"
      - "--accesslog=true"
      - "--accesslog.format=json"
      - "--providers.docker.swarmMode=true"
      - "--providers.docker.exposedbydefault=false"
      - "--providers.docker.swarmModeRefreshSeconds=5"
      - "--providers.docker.network=traefik-net"
    volumes:
      - type: volume
        source: traefik-data
        target: /etc/traefik
        volume:
          nocopy: true
    networks:
      - traefik-net
      - socket-private
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    logging:
      driver: fluentd
      options:
        tag: docker.swarm.traefik
        fluentd-address: unix:///var/run/fluentd-docker/td-agent.sock
        fluentd-async-connect: "true"
    deploy:
      mode: global
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: any

  socketproxy:
    image: tecnativa/docker-socket-proxy:latest
    networks:
      - socket-private
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      NETWORKS: 1
      SERVICES: 1
      TASKS: 1
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      restart_policy:
        condition: any

volumes:
  traefik-data:
    driver_opts:
      type: nfs
      o: "-snip-"
      device: ":-snip-"

networks:
  traefik-net:
    external: true
  socket-private:
    driver: overlay
    internal: true
```
I’m using an NFS share for shared storage and a socket proxy so I can run Traefik on each of my nodes, but neither is required for Fluentd, of course.
And that’s it for Fluentd on a Swarm. I may write another post on making an aggregator for Elastic someday. In the meantime, happy logging!