Docker for Web Science

Containerization is one of the most significant movements to have emerged in software deployment in recent years. Tools such as Docker permit the rapid and easy composition of Web-based applications, in much the same way that scripting languages once eased the composition of traditional software. Increasingly, applications used by Web Science researchers are complemented by a Docker image or stack, which removes much of the need for manual configuration. An example of this is the Google TensorFlow image, which allows a machine learning environment to be set up without detailed knowledge of the underlying software package. In this tutorial we will introduce Docker and the associated Docker Compose tool, which together facilitate the containerization and deployment of any research environment.

A significant advantage of Docker for Web Science is that it makes deploying a research environment far easier, even for someone with little prior programming or infrastructure experience. Because “containerized” software ships together with its runtime environment, the configuration steps have already been performed, and difficulties arising from differences between host systems are minimized. Any customisations of containerized software can also be recorded explicitly, enabling easy and rapid redeployment of developed environments. This makes Docker an ideal approach both for quickly setting up a research environment and for setting up a publicly accessible product, since little additional configuration is required to move between the two.

Our tutorial will demonstrate the advantages of using a microservices architecture, and how this can be accomplished using Docker and Docker Compose. We will introduce the most important features of the Docker technology for Web scientists, and give in-depth insight into how Docker can be used to implement certain common processes such as initial setup of a research environment, customisation, data management, or Continuous Integration for developing Web applications. All presentations will be illustrated with real-world examples from our experience in developing a complex Web application for the SlideWiki project.

All presentations will be followed by hands-on activities in which participants practise the concepts explained in the presentation. We will provide participants with a Virtual Machine (VM) for this purpose.

By the end of the tutorial, participants should be able to do the following:

Workshop Leaders

The people who will be leading this workshop are as follows:

Schedule

  1. Motivating Example

    For the SlideWiki EU project, we needed to deliver a complex Web application comprising a wide variety of services, and a design decision was made to use a microservice architecture. In this presentation, we will explain in some detail why we chose this architecture to ensure maximum scalability and ease of deployment. We will demonstrate SlideWiki, explain how the use of microservices simplified both our development process and our deployment, and discuss in more depth why a Web Scientist might wish to use Docker.

  2. What is Docker and why should we care?

    Having established our use case, this presentation will introduce real-world applications of Docker in Web Science research, such as Jupyter notebooks, TensorFlow and Hadoop. With the motivation in place, we will then introduce microservice architectures, give a brief history of Docker, and show how it provides a solution for these use cases.

    Activity: Participants will install Docker and Docker Compose. Although we will not limit participants’ choice of machine, we will distribute an Ubuntu VM before the tutorial to provide a common environment for learning; a short sequence for verifying the installation is sketched below.
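
    A minimal verification sequence might look like the following (this assumes Docker and Docker Compose have already been installed on the Ubuntu VM; the exact installation commands depend on the distribution and Docker release):

    ```
    # Check that the Docker client and daemon are available
    docker --version
    docker info

    # Check that Docker Compose is installed
    docker-compose --version

    # Run a throwaway container to confirm everything works end to end
    docker run --rm hello-world
    ```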

    “Gotchas”: there are a few things to be wary of when deploying a microservice architecture. This brief presentation will warn participants of some common pitfalls before they proceed to further activities. A “cheat sheet” will also be provided to participants.

  3. Setting up a Docker Research Environment

    One of the strengths of Docker is the ability to reuse a complex environment setup without repeating the individual installation steps every time. This is made possible through Docker Hub, a public registry of ready-made Docker images. In addition, we will introduce the Docker “run” operation and the different options available for starting and managing containers. In particular, we will ensure that participants can run commands inside a container, view the logs of a container, and persist data, so that they are able to use the container as a research environment.

    Activity: Participants will download an image from the Jupyter notebook stacks and set up a research environment. They will familiarise themselves with the layout of a Docker container and the associated actions using the image they have downloaded; a possible sequence of commands is sketched below.
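
    As an illustration, the activity might proceed roughly as follows (the jupyter/scipy-notebook image is one of the community-maintained Jupyter Docker stacks; the container name and mounted directory are arbitrary choices for this sketch):

    ```
    # Download the image from Docker Hub
    docker pull jupyter/scipy-notebook

    # Start a notebook server, publishing port 8888 and mounting a host
    # directory so that notebooks survive removal of the container
    docker run -d --name research-nb -p 8888:8888 \
        -v "$PWD/notebooks":/home/jovyan/work \
        jupyter/scipy-notebook

    # View the container's logs (they include the notebook access token)
    docker logs research-nb

    # Run a command inside the running container
    docker exec -it research-nb bash

    # Stop and remove the container; the notebooks remain on the host
    docker stop research-nb && docker rm research-nb
    ```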

  4. Break - 30 minutes

  5. Customising for research requirements

    Although pre-existing Docker images are powerful, they will inevitably need to be modified to satisfy individual use cases. A “Dockerfile” can be used to build an image with a given set of customisations while inheriting from an existing image, giving a clear record of the modifications, which can be helpful in recording the provenance of research results. In this presentation, we introduce the most common Dockerfile instructions, show how they work, and walk through some of the SlideWiki Dockerfiles as real-world examples.

    Activity: The participants will create a new image by extending the image that they used in the previous step, and will redeploy their application with the new image; a minimal example is sketched below. They will have the option to push the image to Docker Hub, although this is not required since it involves registering an account.
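
    A minimal Dockerfile for this exercise might look as follows (this assumes the Jupyter image from the previous activity is being extended; the extra Python packages and the maintainer label are purely illustrative):

    ```
    # Extend the notebook image used in the previous activity
    FROM jupyter/scipy-notebook

    # Record who maintains this customised image
    LABEL maintainer="researcher@example.org"

    # Add Python packages needed for the research task
    RUN pip install --no-cache-dir networkx tweepy
    ```

    The image could then be built with a command such as “docker build -t my-research-env .”, started with “docker run -p 8888:8888 my-research-env”, and, after being tagged with a Docker Hub account name, optionally published with “docker push”.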

  6. Managing Data

    Once a research environment has been configured, it must be given access to the relevant data in a manner which persists across the creation and destruction of containers. There are several options for managing data in Docker, such as mounting a host directory, using a named Docker volume, and using a “volume container”. This presentation will provide an overview of these options and give recommendations on when each is appropriate. We will also demonstrate the way we manage data for SlideWiki.

    Activity: The participants will learn how to manage (named) volumes with Docker and how they can be accessed from the host system. In a second exercise, participants will set up a volume container used by two or more containers to share data; example commands for these approaches are sketched below.
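
    Example commands for the different data-management approaches might look like the following (the volume, container and path names are arbitrary choices for this sketch):

    ```
    # Option 1: bind-mount a host directory into a container
    docker run -d --name app1 -v "$PWD/data":/data jupyter/scipy-notebook

    # Option 2: create a named volume and attach it to a container
    docker volume create research-data
    docker run -d --name app2 -v research-data:/data jupyter/scipy-notebook
    docker volume inspect research-data   # shows where the data lives on the host

    # Option 3: a dedicated "volume container" shared by several containers
    docker create -v /shared --name datastore alpine
    docker run -d --volumes-from datastore --name worker1 jupyter/scipy-notebook
    docker run -d --volumes-from datastore --name worker2 jupyter/scipy-notebook
    ```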

  7. Deploying a Microservice Infrastructure in Minutes

    Docker encourages a separation of concerns in which each individual container has a single responsibility, whereas complex research setups often require multiple components working together to achieve a goal. The deployment of such complex choreographies can be achieved with Docker Compose, which this presentation will demonstrate, using SlideWiki as an example of a complex application.

    Activity: Participants will deploy a local instance of SlideWiki on their VMs. This exercise shows how to use docker-compose to set up an infrastructure comprising several microservices that access a shared MongoDB database and a NodeJS-based front end; a simplified compose file is sketched below. The focus will be on how to implement components of a production deployment such as logging, data backup and maintenance tools.
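
    The real SlideWiki deployment involves many more services, but a heavily simplified compose file in the same spirit might look like this (the service names, image tags, ports and environment variables are illustrative, not the actual SlideWiki configuration):

    ```
    version: "3"

    services:
      mongodb:
        image: mongo:3.4
        volumes:
          - mongo-data:/data/db      # named volume so data survives restarts

      webapp:
        build: .                     # hypothetical NodeJS front end built from
                                     # a Dockerfile in the current directory
        ports:
          - "8880:80"
        depends_on:
          - mongodb
        environment:
          - DATABASE_URL=mongodb://mongodb:27017

    volumes:
      mongo-data:
    ```

    Running “docker-compose up -d” then starts both services on a shared network, and “docker-compose logs -f” streams their combined output.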

  8. Future

    This presentation will summarise the tutorial and signpost participants towards more advanced topics such as Docker networks and Docker Swarm; a brief taste of both is sketched below.
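
    As a taste of these topics (illustrative commands only; the network, service and image names are arbitrary):

    ```
    # User-defined networks give containers DNS-based service discovery
    docker network create research-net
    docker run -d --network research-net --name db mongo:3.4

    # Docker Swarm turns a group of hosts into a single cluster that can
    # schedule replicated services across nodes
    docker swarm init
    docker service create --replicas 3 --name web nginx
    ```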

Resources

Resources for the workshop will be added closer to the date.