Moving a Single Server App To Docker on AWS


Last fall, we moved the Qwaya Facebook ads tool from a single host on Rackspace to a clustered, Docker-based environment on AWS. This is the first of what will hopefully be several blog posts documenting what we did. I've planned two more: one on how we handle CI and one on container orchestration.


Why Docker?

The task at hand was not "move to Docker", it was "get out of single-host hosting". The old deployment consisted of some quite complicated Fabric scripts that were run from an arbitrary developer's laptop. The source code repository did not match the deployment setup, so there were a lot of remote copy commands involved. I know this is a very common way of deploying applications, but it brings a number of problems. For example, there are no build artifacts, making repeatable builds harder, and if the build fails halfway through you are left with a broken system.

To get away from this we saw three options:

  1. Build a local .tar.gz archive, copy that to the server and extract it.
  2. Build a Debian package and get that to the server - either by copying the .deb file directly or by using a local deb repo.
  3. Build a Docker image and run it on the servers.

Alternative 1 still involves a lot of manual file copying when extracting the archive on the server. It also requires a separate server configuration step to set up all the OS dependencies, which is usually quite a hassle when upgrading a package.

Using a .deb for deployment declaratively takes care of the file copying, as well as stating the dependencies of your application. This is a big improvement on the .tar.gz solution, but we're still dependent on Debian as the host distro.

Docker improves on the .deb alternative by providing encapsulation - whatever our application needs is declaratively specified and can be run in the same form on development machines and servers alike, as it is largely ignorant of the host distro it runs on. Where the .deb states a dependency declaratively, the Docker image includes it in full.
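
To make the encapsulation point concrete, here is a minimal Dockerfile sketch for a Django app like ours. The base image, package list and entry point are made up for illustration and are not our actual setup:

# Hypothetical Dockerfile sketch - not our real one.
FROM python:2.7

# OS-level dependencies are baked into the image, not installed on the host.
RUN apt-get update && apt-get install -y libpq-dev && rm -rf /var/lib/apt/lists/*

# Application code and Python dependencies.
COPY django /app
WORKDIR /app
RUN pip install -r requirements.txt

EXPOSE 8000
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]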

Even though Docker was a young technology, it had passed 1.0, and we felt the momentum and adoption were large enough for us to go for it.

Building the Docker Containers Using A Subset Of The Files

Building Docker containers is fairly simple using Dockerfiles. One problem we did have was that our repo contains quite a lot of extra files - ext4 (don't ask), build tools and so on. These were numerous enough to choke the Docker daemon at build time, since docker build sends the whole directory to the daemon as build context.

To make this work we copy the files that should go into the image to a separate build directory, along with the Dockerfile, and build the Docker image from there. The Makefile target looks like this:

$(DOCKER_BUILD): $(DJANGO_FILES) Dockerfile
    rm -rf $(DOCKER_BUILD_DIR)
    cp -r django $(DOCKER_BUILD_DIR)
    cp Dockerfile $(DOCKER_BUILD_DIR)
    cd $(DOCKER_BUILD_DIR) && docker build --rm -t $(IMAGE_NAME) .
    docker tag -f $(IMAGE_NAME) $(IMAGE_NAME):${BUILD_TAG}
    docker tag -f $(IMAGE_NAME) $(IMAGE_NAME):$(TIMESTAMP)
    echo $(IMAGE_NAME):$(TIMESTAMP) > $(DOCKER_BUILD)
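
For reference, the target above assumes variable definitions along these lines. The values are illustrative, not our actual ones; $(DOCKER_BUILD) acts as a stamp file recording the tagged image name so later steps can pick it up:

# Illustrative values only - the real Makefile differs.
IMAGE_NAME       := qwaya/qwaya-backend
DOCKER_BUILD_DIR := build/docker
DOCKER_BUILD     := docker-image.txt
DJANGO_FILES     := $(shell find django -type f)
TIMESTAMP        := $(shell date +%Y%m%d-%H%M%S)
# Normally provided by the CI server; falls back to "local" for dev builds.
BUILD_TAG        ?= local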

Choosing CentOS As Host OS

We tried out a number of alternatives. CoreOS was a very attractive option, but we were concerned about what would happen if the etcd cluster lost consensus. We also felt that the actual deploy process with systemd service files was a bit immature at the time. I've heard from people I trust that CoreOS has evolved since then.

We finally decided to run our Docker containers on top of standard Linux hosts. After trying out a few distros we chose to go with CentOS 7 for our Docker hosts. Given the choice today, we would probably have stayed with Debian, as Jessie's kernel improvements and move to systemd make it more suitable for Docker hosting than Wheezy.

The hosts are built with Packer, as described in a previous post.

Why Not Kubernetes, Deis, etc.?

First of all, these platforms weren't as viable an alternative nine months ago as they are now; things move really fast in the Docker space.

Second, we wanted to limit the number of new things we introduced at the same time. Moving to Docker was a big step, and just running them as standard processes made the transition easier.

I have no doubt we will move to something more PaaS-like in the future, but for now this works fine for us.

Running Docker As systemd Services

Our old server ran Debian and managed processes with runit, so our initial approach was to use the same setup for Docker. Unfortunately, we ran into problems trying to get runit and Docker to play together.

Instead we looked to systemd. As an [Arch Linux][arch_linux] user I have been using systemd for quite a while now, and I am generally happy with it and its declarative way of specifying services. Docker works really well with systemd, and there are some well-documented examples over at the CoreOS site.

Services are declared like this:

[Unit]
Description=Dryleaf Backend
Requires=docker.service
After=docker.service

[Service]
Type=simple
User=qwaya-user
Group=qwaya-user
# Never time out waiting for the container to start.
TimeoutStartSec=0
Restart=always
# The leading "-" tells systemd to ignore failures from these commands,
# e.g. when there is no old container to kill or remove.
ExecStartPre=-/usr/bin/docker kill backend
ExecStartPre=-/usr/bin/docker rm -v backend
ExecStart=/usr/bin/docker run --name backend \
-p 8000:8000 \
-e "DB_HOST=<db_host>" \
-e "DB_USER=<db_user>" \
-e "DB_PWD=<yeah_right>" \
-e "DB_NAME=<db_name>" \
qwaya/qwaya-backend:latest
ExecStop=/usr/bin/docker stop backend

[Install]
WantedBy=multi-user.target
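
Assuming the unit is installed as /etc/systemd/system/backend.service (the path and unit name here are just examples), day-to-day handling is plain systemd housekeeping:

# Pick up new or changed unit files.
sudo systemctl daemon-reload

# Start the backend now and on every boot.
sudo systemctl enable backend.service
sudo systemctl start backend.service

# Rolling out a new image: pull it, then restart the unit so the
# ExecStartPre lines kill and remove the old container first.
sudo docker pull qwaya/qwaya-backend:latest
sudo systemctl restart backend.service

# The container's stdout/stderr ends up in the journal.
sudo journalctl -u backend.service -f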

As can be seen, we're still handling settings with environment variables, but as a next step I would really like to see us adopt something like Vault and Consul for service discovery and settings.
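
In the meantime, one small improvement we could make - a sketch, not something we actually run today - would be to move the credentials out of the unit file into an environment file that docker run picks up with --env-file, so the unit itself contains no secrets:

# /etc/qwaya/backend.env - hypothetical path, readable by root only.
DB_HOST=db.example.internal
DB_USER=backend
DB_PWD=definitely_not_this
DB_NAME=qwaya

# In the [Service] section, the -e flags are then replaced by a single option:
ExecStart=/usr/bin/docker run --name backend \
  -p 8000:8000 \
  --env-file /etc/qwaya/backend.env \
  qwaya/qwaya-backend:latest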

Summary

We’ve been running this setup with a total of 60 containers on six machines for four months now, and have had very few hiccups - we’ve had to restart a machine once, and twice force restart a process. All in all it’s a great improvement over our old setup.