Dockerizing a Rails app, part I

As a consultant it's vital to be able to switch contexts between different application environments seamlessly to maximize productive time. In the age of Docker this has, thankfully, become easier than ever.

Part I of this series covers creating a Dockerfile for a Rails application. This is the essential first step in dockerizing the whole application stack.

The Application Dockerfile

The Rails stack I'm dealing with here is reasonably straightforward:

  • Rails 4.2.x using the asset pipeline
  • MySQL
  • Sphinx (full-text search)

Let's look at the Dockerfile layer by layer:

1 - Base image and preliminary setup

FROM seocahill/ruby:2.3.1-alpine3.6
LABEL author="seo.cahill@gmail.com"
RUN mkdir -p /app
WORKDIR /app

The base image uses Alpine Linux, which is a very small distribution (~4 MB).

Generally your Dockerfile should use an official Ruby image (ruby:alpine for example), but in this case I needed an older version of Ruby together with the latest version of Alpine at the time of writing, 3.6, so I'm using my own custom image.

The WORKDIR instruction sets the working directory for the instructions that follow it (RUN, COPY, CMD and so on), both at build time and at runtime. If the directory doesn't exist, WORKDIR creates it, so the explicit mkdir above is, strictly speaking, redundant.

2 - Install application environment dependencies

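# --no-cache fetches a fresh package index without storing it in the image,
# so a separate apk update isn't needed.
RUN \
  apk add --no-cache \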
  less \
  ncurses \
  nodejs \
  openssh \
  mysql-client \
  mysql-dev \
  sphinx \
  tzdata \
  && mkdir -p \
  /var/lib/sphinx \
  /var/lib/sphinx/data \
  /var/log/sphinx \
  /var/run/sphinx

The shell form of RUN is the equivalent of /bin/sh -c "args" (Alpine doesn't ship bash by default), and apk is Alpine's package manager, the counterpart of apt-get on Debian-based distributions.
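To make the shell-form behaviour concrete, here is a minimal sketch contrasting the two forms of RUN. It is not part of the application's Dockerfile; the alpine:3.6 base image and the packages chosen are assumptions for illustration only.

```dockerfile
# Illustrative only: any image that provides /bin/sh behaves the same way.
FROM alpine:3.6

# Shell form: Docker runs this as /bin/sh -c "...", so shell features
# such as && and variable expansion are available.
RUN echo "home is $HOME" && apk add --no-cache less

# Exec form: the JSON array is executed directly, without a shell,
# so && chaining and $VARIABLE expansion are not available here.
RUN ["apk", "add", "--no-cache", "ncurses"]
```

The long chained commands in this walk-through rely on the shell form, since && is a shell feature.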

The format of this command might appear a little ungainly at first, but it is guided by Docker best practices, both in terms of readability and of optimizing caching. I will discuss caching after I finish the Dockerfile walk-through.

Of the less obvious packages here, less and ncurses are required for the pry console to work correctly in Alpine Linux, tzdata is required by Active Record for datetime operations, and openssh is required by Capistrano.

In general the official ruby alpine image should cover most of the required dependencies out of the box for running a ruby app.

3 - Install build dependencies and gems

COPY Gemfile Gemfile.lock ./

RUN \
  apk add --no-cache --virtual build-deps \
  build-base \
  && gem install bundler \
  && bundle install --jobs 20 --retry 5 \
  && apk del build-deps

The COPY instruction copies files from the build context (the source directory passed to docker build) into the image at build time. The last argument is the destination path inside the image; a relative destination is resolved against WORKDIR, which defaults to the root directory if it hasn't been set.

The build dependencies are installed with --no-cache, so no package index is left behind, and they are removed again as soon as the app's gems have been installed.

The build-base package is Alpine's equivalent of Debian's build-essential and is required for compiling native code for gems like Nokogiri.

The --virtual flag groups the packages that follow it under a single virtual package name, so that they can all be removed later with one apk del. The name used here, "build-deps", is purely arbitrary.

As we have only one build dependency there is no need to use the --virtual flag, but you will see this convention a lot in Alpine-based Dockerfiles, so I left it in for illustrative purposes.
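The flag becomes more useful once there are several build-time packages. The fragment below is a hypothetical variant of the RUN instruction in step 3; linux-headers is an assumed extra dependency chosen only for illustration, not something this application needs.

```dockerfile
# Hypothetical variant of step 3 with more than one build dependency.
# linux-headers is an assumed example of a compile-time-only package.
RUN \
  apk add --no-cache --virtual build-deps \
  build-base \
  linux-headers \
  && gem install bundler \
  && bundle install --jobs 20 --retry 5 \
  # A single apk del removes every package grouped under build-deps.
  && apk del build-deps
```

Only packages needed purely at compile time belong in the virtual group; a library that a gem links against at runtime would have to be installed in step 2 instead.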

4 - Source code and command


COPY . ./

EXPOSE 3000

CMD [ "bin/rails", "server", "-b", "0.0.0.0" ]

In many Dockerfiles you will see ENTRYPOINT defined as well as CMD.

In general the ENTRYPOINT references a binary or a script that performs some initialization, and the CMD, if defined, supplies the default arguments for that entrypoint.

That doesn't fit our use case, as we will be running containers from this image with different binaries depending on the context, e.g. bin/rails, bin/rake or bin/spring.

The EXPOSE instruction documents which port(s) a container listens on for connections. It doesn't publish the port by itself; that is done with the -p (or -P) flag of docker run.

CMD is the default command executed by a container if one isn't specified at runtime.
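For comparison, the ENTRYPOINT plus CMD pattern described above could look like the sketch below. It is not used in this application's image, and docker-entrypoint.sh is a hypothetical script name.

```dockerfile
# Sketch of the ENTRYPOINT + CMD pattern (not used in this image).
# docker-entrypoint.sh is a hypothetical, executable script that performs
# some initialization and then hands over to its arguments with: exec "$@"
COPY docker-entrypoint.sh /usr/local/bin/
ENTRYPOINT ["docker-entrypoint.sh"]

# CMD now only supplies the default arguments passed to the entrypoint.
CMD ["bin/rails", "server", "-b", "0.0.0.0"]
```

With only CMD defined, as in step 4, whatever follows the image name on docker run replaces the default command entirely, for example docker run myapp bin/rake -T, where myapp is a placeholder image name.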

Docker image caching

A docker image is split into different layers to aid caching and to reduce build times.

An image is built from top to bottom and when one layer's cache is busted all subsequent layers are rebuilt from scratch.

Each instruction is a separate cache step (RUN, COPY and ADD each add a filesystem layer as well), and different instructions have different ways of deciding whether to bust the cache.

The COPY instruction computes a checksum of the contents of the files it copies, while the RUN instruction simply compares the command string passed to it. If no match is found in the cache, the current layer and all layers after it are rebuilt from scratch.

This is why you should never run apk update or apt-get update in a RUN instruction of their own. If you do, the package index will be updated only once and the cached layer will be used thereafter, as the command string never changes.
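A minimal sketch of the difference, using a Debian-based image because that is where the pitfall is most commonly seen. The base image and the package are assumptions chosen for illustration.

```dockerfile
# Illustrative Debian-based example; image tag and package are assumptions.
FROM debian:bookworm-slim

# Avoid: the update layer is cached after the first build, so later
# builds keep installing from a stale package index.
# RUN apt-get update
# RUN apt-get install -y less

# Prefer: update and install in one RUN. Changing the package list
# changes the command string, which re-runs the update as well.
RUN apt-get update && apt-get install -y --no-install-recommends \
    less \
    && rm -rf /var/lib/apt/lists/*
```

On Alpine, apk add --no-cache achieves the same thing in one step, since it fetches a fresh index every time the instruction actually runs.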

So how do the aforementioned caching considerations affect the structure of our Dockerfile?

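The reason for the long shell commands in our Dockerfile is to minimize the number of cached layers. Every RUN adds a layer, and files deleted in a later layer still take up space in the earlier one, so installing and removing the build dependencies within a single RUN is what actually keeps them out of the final image.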
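As a sketch of the effect on image size, compare the two ways of writing step 3 below. The commented-out variant is a hypothetical counter-example, not something from the original Dockerfile.

```dockerfile
# Avoid: build-base is stored in the first layer. The later apk del
# only hides it, so the image does not get any smaller.
# RUN apk add --no-cache --virtual build-deps build-base
# RUN bundle install --jobs 20 --retry 5
# RUN apk del build-deps

# Prefer: install, use and remove the build tools within one layer.
RUN apk add --no-cache --virtual build-deps build-base \
  && bundle install --jobs 20 --retry 5 \
  && apk del build-deps
```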

The order of our Dockerfile is also important. We don't want to have to install all our application environment dependencies each time we run bundle install, so we install them in a layer before the one that installs the app's gems.

We do want to bust the cache each time we change the Gemfile, but not each time we change a line of application code. Therefore we copy the Gemfile and Gemfile.lock separately, before we copy the application code, and run bundle install straight afterwards, so that it is rerun only when the Gemfile changes.

You can read more about Docker image caching conventions in the official documentation.

Wrap up and part II

In part II we'll be setting up the test environment.