COMPAS and Docker

Docker has been added to COMPAS to reduce the time and effort required to set up the COMPAS deployment environment.

Instead of having to install and configure several libraries and tools (e.g. python/pip, numpy, g++, boost), which can vary considerably between operating systems and existing toolchains, users can opt to install Docker and run COMPAS with a single command.

This also gives users the ability to run COMPAS on cloud solutions like AWS EC2 or Google Compute Engine where hundreds of cores can be provisioned without having to manually configure the environment.

Docker works by creating an isolated and standalone environment known as a container. Containers can be created or destroyed without affecting the host machine or other containers*.

Containers are instances of images. An image is a pre-defined setup/environment that is instantiated when started as a container (containers are to images what objects are to classes). More here on the relationship between images and containers.
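The class/object analogy can be made concrete with a small Python sketch. The Image and Container classes below are purely illustrative and are not part of Docker or COMPAS; they only model the idea that many independent containers can be started from one unchanging image.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """A read-only template, analogous to a class definition."""
    name: str
    tag: str = "latest"

    def start(self) -> "Container":
        # Starting an image yields a new, independent container.
        return Container(image=self)


@dataclass
class Container:
    """A running instance of an image, analogous to an object."""
    image: Image
    files: dict = field(default_factory=dict)  # container-local state


image = Image("teamcompas/compas")
first = image.start()
second = image.start()

# Writing inside one container does not affect the other or the image.
first.files["/app/COMPAS/logs/run.log"] = "done"
print(second.files)  # {}
```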

Containers are (almost) always run as a Linux environment. A major benefit of this is the ability to run Linux applications in a Windows or MacOS environment without having to jump through hoops or have a diminished experience.

Images can be defined by users (e.g. via Dockerfiles); standard images are also publicly available on Docker Hub.

All that is required to start using COMPAS with Docker is the "Usage" section (the "CI/CD" section is also highly recommended). The other sections are provided for extra info.

* Containers can still interact with each other and the host machine through mounted directories/files or exposed ports.

Usage

N.B. This section assumes Docker has been installed and is running. For Windows and MacOS users, see here.

Installing

The latest compiled version of COMPAS (dev branch) can be retrieved by running docker pull teamcompas/compas

Other versions can be used by adding a version tag. For example, COMPAS version 2.12.0 would be teamcompas/compas:2.12.0. To see all available versions, go to the TeamCOMPAS docker hub page here.

Running

COMPAS can still be configured via command line arguments passed to the COMPAS executable or via a runSubmit.py file.

Running runSubmit.py

To run COMPAS via a runSubmit.py file, the command is a little more complex.

docker run                                                  \
--rm                                                    \
-it                                                     \
-v $(pwd)/compas-logs:/app/COMPAS/logs                  \
-v $(pwd)/runSubmit.py:/app/starts/runSubmit.py         \
-e COMPAS_EXECUTABLE_PATH=/app/COMPAS/bin/COMPAS        \
-e COMPAS_LOGS_OUTPUT_DIR_PATH=/app/COMPAS/logs         \
teamcompas/compas                                       \
python3 /app/starts/runSubmit.py

Breaking down this command:

docker run creates a container

--rm Clean up: destroy the container once it finishes running the command

-it short for -i and -t - provides an interactive terminal

-v <path-on-host>:<path-in-container> Bind mounts: mount <path-on-host> to <path-in-container>. This time we not only want to get the output from COMPAS on the host machine, we also want to supply a runSubmit.py to the container from the host machine.

NOTE: if you decide to execute using runSubmit.py, you will need a compasConfigDefault.yaml file in the same directory. This file can be found in the same directory as the runSubmit.py, and contains the default COMPAS choices for stellar and binary physics. These choices can be changed by modifying the options available in the compasConfigDefault.yaml file.

-e VAR_NAME=value Environment variables: set the environment variable VAR_NAME to value

teamcompas/compas the image to run

python3 /app/starts/runSubmit.py the command to run when the container starts
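For scripted runs, the same command can be assembled programmatically. The following is a minimal sketch, assuming the container paths and environment variable names shown above; the helper name build_run_submit_command is hypothetical, and the command is only printed here, not executed.

```python
import shlex
from pathlib import Path


def build_run_submit_command(workdir: Path, image: str = "teamcompas/compas") -> list:
    """Assemble the docker run argument list for a runSubmit.py run."""
    logs_in_container = "/app/COMPAS/logs"
    script_in_container = "/app/starts/runSubmit.py"
    return [
        "docker", "run",
        "--rm",   # remove the container when the command finishes
        "-it",    # interactive terminal
        # bind mounts: <path-on-host>:<path-in-container>
        "-v", f"{workdir / 'compas-logs'}:{logs_in_container}",
        "-v", f"{workdir / 'runSubmit.py'}:{script_in_container}",
        # environment variables read by runSubmit.py
        "-e", "COMPAS_EXECUTABLE_PATH=/app/COMPAS/bin/COMPAS",
        "-e", f"COMPAS_LOGS_OUTPUT_DIR_PATH={logs_in_container}",
        image,
        "python3", script_in_container,
    ]


command = build_run_submit_command(Path.cwd())
print(shlex.join(command))
# To actually launch the container: subprocess.run(command, check=True)
```

Keeping the arguments in a list rather than a single string avoids shell quoting problems if the working directory contains spaces.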

Run the COMPAS executable

To run the COMPAS executable directly (i.e. without runSubmit.py):

docker run                                  \
--rm                                    \
-it                                     \
-v $(pwd)/compas-logs:/app/COMPAS/logs  \
teamcompas/compas                       \
bin/COMPAS                              \
--number-of-binaries=5                  \
--outputPath=/app/COMPAS/logs

Breaking down this command:

docker run creates a container

--rm Clean up: destroy the container once it finishes running the command

-it short for -i and -t - provides an interactive terminal

-v <path-on-host>:<path-in-container> Bind mounts: mount <path-on-host> to <path-in-container>. In this instance, make it so $(pwd)/compas-logs on my machine is the same as /app/COMPAS/logs inside the container

teamcompas/compas the image to run

bin/COMPAS the command to run when the container starts

--number-of-binaries anything after the given start command is passed to that command, in this case, the flag to set the number of binaries

--outputPath /app/COMPAS/logs same as above, anything after the start command is passed to that command; here it forces logs to go to the directory that is mapped to the host machine

More info on docker run here

NOTE 1:

Two new environment variables have been added. Both apply to runSubmit.py only and are non-breaking changes.

COMPAS_EXECUTABLE_PATH is an addition to the default runSubmit.py that overrides where runSubmit.py looks for the compiled COMPAS. This override exists purely for ease-of-use from the command line.

COMPAS_LOGS_OUTPUT_DIR_PATH is also an addition to the default runSubmit.py that overrides where logs are placed. The override exists because the mounted directory (option -v) is created before COMPAS runs. COMPAS sees that the directory where it's supposed to put logs already exists, so it creates a different (i.e. non-mapped) directory to deposit logs in.
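The override pattern described here can be sketched as follows. This is not the actual runSubmit.py code: the function name and the fallback defaults are assumptions for illustration, and only the two environment variable names come from the text above.

```python
import os


def resolve_compas_paths(default_executable: str, default_output_dir: str):
    """Return (executable, output_dir), preferring environment overrides."""
    executable = os.environ.get("COMPAS_EXECUTABLE_PATH", default_executable)
    output_dir = os.environ.get("COMPAS_LOGS_OUTPUT_DIR_PATH", default_output_dir)
    return executable, output_dir


# Hypothetical defaults, used only when the variables are not set.
executable, output_dir = resolve_compas_paths("bin/COMPAS", ".")
print(executable, output_dir)
```

Because each variable falls back to a default when unset, a script written this way behaves the same as before for users who do not set them, which is what makes the change non-breaking.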

NOTE 2:

The docker run ... examples above both use the -it options. If you want to run multiple instances of COMPAS, I would highly recommend using detached mode (-d) instead, in which containers run in the background and their output is not printed to the terminal.

An example where this would be useful is if you were running 4 instances of COMPAS at once. You could copy/paste the following into the terminal...

docker run --rm -d -v $(pwd)/compas-logs/run_0:/app/COMPAS/logs -v $(pwd)/runSubmitMMsolar_01.py:/app/starts/runSubmit.py teamcompas/compas python3 /app/starts/runSubmit.py &

docker run --rm -d -v $(pwd)/compas-logs/run_1:/app/COMPAS/logs -v $(pwd)/runSubmitMMsolar_02.py:/app/starts/runSubmit.py teamcompas/compas python3 /app/starts/runSubmit.py &

docker run --rm -d -v $(pwd)/compas-logs/run_2:/app/COMPAS/logs -v $(pwd)/runSubmitMMsolar_03.py:/app/starts/runSubmit.py teamcompas/compas python3 /app/starts/runSubmit.py &

docker run --rm -d -v $(pwd)/compas-logs/run_3:/app/COMPAS/logs -v $(pwd)/runSubmitMMsolar_04.py:/app/starts/runSubmit.py teamcompas/compas python3 /app/starts/runSubmit.py

...which would run 4 separate instances of COMPAS, each with its own runSubmit.py file and logging directory, and all console output suppressed.

You may want to check the console output to see how far into the run COMPAS is. The command for this is docker logs <container_id>. You can get the container id by running docker ps.
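Rather than copy/pasting one line per run, the detached commands can be generated in a loop. This is a sketch, assuming the runSubmitMMsolar_0N.py file names and run_N log directories from the example above; detached_command is a hypothetical helper, and the commands are printed rather than executed.

```python
import shlex
from pathlib import Path


def detached_command(workdir: Path, index: int) -> list:
    """Build a detached (-d) docker run command for one COMPAS instance."""
    logs = workdir / "compas-logs" / f"run_{index}"
    script = workdir / f"runSubmitMMsolar_{index + 1:02d}.py"
    return [
        "docker", "run", "--rm", "-d",
        "-v", f"{logs}:/app/COMPAS/logs",
        "-v", f"{script}:/app/starts/runSubmit.py",
        "teamcompas/compas",
        "python3", "/app/starts/runSubmit.py",
    ]


commands = [detached_command(Path.cwd(), i) for i in range(4)]
for command in commands:
    print(shlex.join(command))
    # To launch: subprocess.run(command, check=True) prints the container id,
    # which can then be passed to `docker logs <container_id>`.
```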

CI/CD

The latest version of COMPAS (dev branch) is available at teamcompas/compas. This is provided automatically by CI/CD.

Whenever a push is made to TeamCOMPAS/dev, a continuous deployment process automatically builds a new image and deploys it to DockerHub with a tag that corresponds to the value of VERSION_STRING in constants.h.
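The tagging step can be pictured with the sketch below. It is not the actual workflow code: the exact form of the VERSION_STRING declaration in constants.h and the helper name are assumptions, and the sample version is a placeholder.

```python
import re

# Assumed shape of the declaration in constants.h (illustrative only).
SAMPLE_CONSTANTS_H = 'const std::string VERSION_STRING = "2.12.0";\n'


def image_tag_from_constants(source: str, repository: str = "teamcompas/compas") -> str:
    """Extract VERSION_STRING from constants.h text and build an image reference."""
    match = re.search(r'VERSION_STRING\s*=\s*"([^"]+)"', source)
    if match is None:
        raise ValueError("VERSION_STRING not found")
    return f"{repository}:{match.group(1)}"


print(image_tag_from_constants(SAMPLE_CONSTANTS_H))  # teamcompas/compas:2.12.0
```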

At time of writing, GitHub Actions is facilitating the above process. While this is convenient (because it's free and well supported) it is quite slow. I have plans to create a runner locally with a high core count that can be used to compile COMPAS quickly, but haven't gotten around to it yet.

You can realistically expect the latest COMPAS docker image to be available 5 - 10 minutes after pushing/merging.

The GitHub Actions configuration is in /.github/workflows/dockerhub-ci.yml.

Atlassian has a good writeup about what CI/CD is.

Bonus Info

Dockerfile

The Dockerfile defines how the docker image is constructed.

Images are created as a combination of layers. During the build process each layer is cached and only updated on subsequent builds if that layer would change.

The Dockerfile for COMPAS is made up of 8 layers.

FROM ubuntu:18.04 Use Ubuntu 18.04 as a base (provided by Docker Hub). FROM docs: https://docs.docker.com/engine/reference/builder/#from

WORKDIR /app/COMPAS Effectively cd /app/COMPAS within the container. WORKDIR docs

RUN apt-get update && apt-get install -y ... Install the required dependencies. -y so there's no prompt to install any of the packages. update and install are in the same layer so that whenever that layer is rebuilt, the package index is refreshed and all of the dependencies are re-installed together. RUN docs

RUN pip3 install numpy Install numpy. RUN docs

COPY src/ src/ Copy ./src/ directory from the local machine to ./src in the container (remembering that WORKDIR changes the cwd). COPY docs

RUN mkdir obj bin logs Create the directories required by COMPAS. RUN docs

ENV COMPAS_ROOT_DIR /app/COMPAS Set the required environment variable(s). ENV docs

RUN cd src && make -f Makefile.docker -j $(nproc) Make COMPAS using a specific makefile (more below) and as many cores as possible. RUN docs
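The layer caching described above can be illustrated with a toy model. This is a simplification and not how Docker is implemented: each layer key is a hash of the previous key plus the instruction (plus, for COPY, a stand-in for the copied content), so a change to one layer forces that layer and every later one to rebuild. The instruction strings are taken from the list above, including its elided package list.

```python
import hashlib

INSTRUCTIONS = [
    "FROM ubuntu:18.04",
    "WORKDIR /app/COMPAS",
    "RUN apt-get update && apt-get install -y ...",
    "RUN pip3 install numpy",
    "COPY src/ src/",
    "RUN mkdir obj bin logs",
    "ENV COMPAS_ROOT_DIR /app/COMPAS",
    "RUN cd src && make -f Makefile.docker -j $(nproc)",
]


def layer_keys(instructions, src_content):
    """Chain a hash through the instructions to get one cache key per layer."""
    keys, parent = [], ""
    for instruction in instructions:
        payload = parent + instruction
        if instruction.startswith("COPY"):
            payload += src_content  # copied files are part of the layer identity
        parent = hashlib.sha256(payload.encode()).hexdigest()
        keys.append(parent)
    return keys


def layers_to_rebuild(old_keys, new_keys):
    """Return the 1-based layer numbers whose cache key changed."""
    return [i + 1 for i, (old, new) in enumerate(zip(old_keys, new_keys)) if old != new]


before = layer_keys(INSTRUCTIONS, src_content="version 1 of the source")
after = layer_keys(INSTRUCTIONS, src_content="version 2 of the source")
print(layers_to_rebuild(before, after))  # [5, 6, 7, 8]
```

In this model, editing the source leaves the first four layers cached, so the dependency installation steps are not repeated when only the COMPAS source changes.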

Dockerfiles will usually end with a CMD directive that specifies what command should run when the container is started. COMPAS doesn't have a CMD directive because some users will want to run the executable directly and some will want to use runSubmit.py. CMD docs

Makefile.docker

A separate makefile is required for Docker in this scenario for two reasons.

  1. To separate compiled files from source files

  2. To prevent the usage of -march=native

-march=native is a fantastic optimisation for users who compile and run COMPAS on the same machine, however it causes fatal errors when running COMPAS on a machine that it was not compiled for. Docs for -march.

This selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine.

Using -march=native enables all instruction subsets supported by the local machine (hence the result might not run on different machines).