Organising Docker projects
I’ve been toying with Docker for a little while now, and after my brief learning stint, I’ve begun to move all of my Linux-based services to Docker containers. After overcoming some of the beginner pitfalls, I decided it was best to standardise on a single layout for all of my Docker projects.
Firstly, I ensure that the containers are always defined and run via a docker-compose.yml, so that my projects are portable, recreatable and self-documenting. Secondly, I always follow a specific directory layout for the volumes that are attached.
Here is an example of a Docker project directory:
.
├── docker-compose.yml
├── build [OPTIONAL]
│   ├── Dockerfile
│   └── ...
└── volumes
    ├── service1
    │   ├── etc/...
    │   ├── var/...
    │   └── ...
    └── service2
        ├── etc/...
        ├── var/...
        └── ...
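As a rough sketch of how this layout could be scaffolded, the snippet below creates the directories and a minimal docker-compose.yml whose bind mounts point into volumes/. The service names, image name and container-side paths are placeholders I made up for illustration, not anything from a real project.

```shell
set -eu
# Hypothetical project root; a temp dir keeps the sketch free of side effects.
project=$(mktemp -d)
cd "$project"

# One directory per service under volumes/, mirroring paths inside the container.
mkdir -p build \
         volumes/service1/etc volumes/service1/var \
         volumes/service2/etc volumes/service2/var

# Minimal compose file. Image names and container paths are placeholders.
cat > docker-compose.yml <<'EOF'
services:
  service1:
    image: example/service1:latest
    volumes:
      - ./volumes/service1/etc:/etc/service1
      - ./volumes/service1/var:/var/lib/service1
  service2:
    build: ./build
    volumes:
      - ./volumes/service2/etc:/etc/service2
      - ./volumes/service2/var:/var/lib/service2
EOF

# List what was created.
find . | sort
```

Because every host path is relative to the compose file, the whole directory can be moved or cloned elsewhere without editing the mounts.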
Now that I’ve decided on a standard layout, I can throw it into version control and I always know what to expect. But what about the volumes that are mapped to the containers?
This is where I started looking into git submodules. This useful feature provides a way to embed related repositories into a parent repository (a ‘superproject’), while still keeping their contents and history isolated in separate repositories.
Despite the fact that our container configuration and our data are all required to launch our functioning container, they are essentially separate entities and can change without affecting each other.
For example, I have a local Docker project that launches a container running Hugo, the static site generator, so I can edit my blog locally before committing my atrocities to the internet. If I change the container configuration (e.g. adding a new volume), it doesn’t affect the underlying blog content, and the same goes for editing the blog content; it doesn’t need to affect the overlying container config.
By using submodules, I can have a blog repository and a docker-blog repository. The blog repository just contains the content of my blog (and is what GitLab uses to build and host my blog), and the docker-blog repository contains the Docker configuration as well as a reference to the blog repository for the content. If I move to another machine, I simply have to run
git clone --recurse-submodules email@example.com:absolutejam/blog-docker
and it will pull down the whole lot. Alternatively, I can omit the --recurse-submodules part and I have just the container config, ready to create a new blog.
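To make the difference between the two clones concrete, here is a small offline sketch. It builds two throwaway local repositories that stand in for the hosted blog and docker-blog repos, then clones the second one both ways. The volumes/blog path, the file names and the demo identity are assumptions of mine; the protocol.file.allow override is only there because the "remotes" are local directories, which newer Git versions refuse to use for submodules by default.

```shell
set -eu
work=$(mktemp -d); cd "$work"
# Throwaway identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Only needed because the stand-in remotes are local paths.
allow='protocol.file.allow=always'

# Stand-in for the content repository.
git init -q blog
echo '# Hello' > blog/post.md
git -C blog add post.md
git -C blog commit -q -m 'Add first post'

# Stand-in for the Docker project that embeds the content as a submodule.
git init -q docker-blog
touch docker-blog/docker-compose.yml
git -C docker-blog add docker-compose.yml
git -C docker-blog -c "$allow" submodule --quiet add "$work/blog" volumes/blog
git -C docker-blog commit -q -m 'Add compose file and blog submodule'

# Full checkout: container config plus blog content.
git -c "$allow" clone -q --recurse-submodules "$work/docker-blog" full

# Config only: the submodule directory exists but stays empty...
git clone -q "$work/docker-blog" config-only
# ...until it is initialised explicitly, for example with:
#   git -C config-only -c "$allow" submodule update --init

ls full/volumes/blog config-only/volumes/blog
```

The commented-out git submodule update --init is the way to fetch the content later if you started from the config-only clone.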
Setting up a submodule - A demonstration
Firstly, create two folders named superproject and submodule, and make separate git repos for these folders locally and on your favourite hosted git service, GitLab.
Now, create some files in the superproject folder:
superproject
├── apple.md
├── banana.md
└── pear.md
…And add some files to the submodule folder:
submodule
├── one.md
└── two.md
Commit these files to the relevant repos.
Now, go back into the superproject folder and add the submodule repo we just created as a submodule:
git submodule add firstname.lastname@example.org:user/submodule.git submodule
If the repositories are stored on the same GitLab server, we can use a relative path instead. Git resolves it against the superproject’s own remote URL, so the submodule is fetched over the same protocol as the superproject (for example HTTPS rather than SSH).
For example, ../submodule.git if both repos are under the same account.
Or ../../some-other-user/submodule.git if the repo belongs to another user/group.
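The relative form can be tried without a server. In this sketch two sibling directories under hosting/ play the part of two repositories under the same account; the directory names and the demo identity are my own inventions, and protocol.file.allow is only needed because the "server" is a local directory.

```shell
set -eu
work=$(mktemp -d); cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Two sibling repositories standing in for projects on the same account.
mkdir hosting
for name in superproject submodule; do
    git init -q "hosting/$name"
    echo "$name" > "hosting/$name/README.md"
    git -C "hosting/$name" add README.md
    git -C "hosting/$name" commit -q -m 'Initial commit'
done

git clone -q hosting/superproject checkout
cd checkout

# The relative URL is resolved against this clone's origin URL.
git -c protocol.file.allow=always submodule --quiet add ../submodule submodule

# .gitmodules keeps the relative form as written.
git config -f .gitmodules --get submodule.submodule.url
```

Since only the relative path is stored in .gitmodules, the same file should work for someone who cloned the superproject over a different protocol.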
Now, we should have the following folder structure:
superproject
├── .gitmodules
├── apple.md
├── banana.md
├── pear.md
└── submodule
    ├── one.md
    └── two.md
A .gitmodules file has been created, which contains the metadata of the newly added submodule. You can also see that the submodule directory contains the files we committed earlier.
Now that we have our submodule set up and pulled down, we need to commit both the .gitmodules file, as this defines the persistent links to the submodule remotes, and the submodule folder. However, when we commit the submodule folder we’re not actually committing the folder and its contents; we’re committing a gitlink to the currently active commit in the submodule.
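One way to see the gitlink for yourself is to look at the superproject's tree after committing. The sketch below recreates the demonstration offline, with a local submodule-origin directory standing in for the hosted submodule repo (that name, the demo identity and the protocol.file.allow override are assumptions needed only for the local setup).

```shell
set -eu
work=$(mktemp -d); cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Local stand-in for the hosted submodule repository.
git init -q submodule-origin
echo one > submodule-origin/one.md
git -C submodule-origin add one.md
git -C submodule-origin commit -q -m 'Add one.md'

git init -q superproject
cd superproject
git -c protocol.file.allow=always submodule --quiet add "$work/submodule-origin" submodule
# "submodule add" has already staged .gitmodules and the submodule entry.
git commit -q -m 'Add submodule'

# Mode 160000 marks a gitlink: the tree stores a commit ID, not the files.
git ls-tree HEAD submodule

mode=$(git ls-tree HEAD submodule | awk '{print $1}')
recorded=$(git rev-parse HEAD:submodule)
checked_out=$(git -C submodule rev-parse HEAD)
echo "mode=$mode recorded=$recorded checked_out=$checked_out"
```

The recorded ID and the submodule's own HEAD match, which is all the superproject knows about the submodule's contents.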
This means that if we enter the submodule, check out a different commit via git checkout <sha>, then commit that change in the superproject, from then on the superproject will reference that version of the submodule.
Alternatively, if we want to get the latest commit (HEAD) of the submodule (defaults to the master branch), we can simply run git submodule update --remote (or git pull from within the submodule directory).
And as you can see, throughout all of this, the superproject only ever references the submodule as a complete entity (the current commit of the submodule), not the individual files within.
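The pin-then-bump cycle described above can be sketched end to end with local repositories. As before, submodule-origin is a made-up local stand-in for the hosted repo, and the protocol.file.allow override and demo identity exist only to make the offline setup work.

```shell
set -eu
work=$(mktemp -d); cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
allow='protocol.file.allow=always'

# Stand-in remote with two commits.
git init -q submodule-origin
for n in one two; do
    echo "$n" > "submodule-origin/$n.md"
    git -C submodule-origin add "$n.md"
    git -C submodule-origin commit -q -m "Add $n.md"
done
first=$(git -C submodule-origin rev-parse HEAD~1)
latest=$(git -C submodule-origin rev-parse HEAD)

git init -q superproject
cd superproject
git -c "$allow" submodule --quiet add "$work/submodule-origin" submodule
git commit -q -m 'Add submodule'

# Pin the superproject to an older submodule commit.
git -C submodule checkout -q "$first"
git add submodule
git commit -q -m 'Pin submodule to its first commit'
pinned=$(git rev-parse HEAD:submodule)

# Move the working copy to the tip of the remote branch, then record it.
git -c "$allow" submodule --quiet update --remote
git add submodule
git commit -q -m 'Bump submodule to latest'
bumped=$(git rev-parse HEAD:submodule)

echo "pinned=$pinned bumped=$bumped"
```

Note that git submodule update --remote only changes the working copy; the superproject does not reference the new commit until the git add and git commit that follow.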
Changes in submodules
If anything has changed in the submodule and we run git status from the superproject, we might notice that the submodule is dirty. The two changes you’ll most likely see for your submodule are new commits and untracked content:
- new commits: The submodule’s SHA has changed, meaning that the commit the submodule currently has checked out is different to the one the superproject expects. This could be because it’s ahead, behind, on a different branch, etc.
- untracked content: Files have been added to the submodule and not committed yet. (Edits to files that are already tracked show up as modified content instead.)
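Both states can be provoked on purpose in a scratch setup. This sketch uses the same kind of local stand-in repository as a remote (the names, demo identity and protocol.file.allow override are assumptions for the offline setup) and forces the C locale so the status text is not translated.

```shell
set -eu
work=$(mktemp -d); cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

git init -q submodule-origin
echo one > submodule-origin/one.md
git -C submodule-origin add one.md
git -C submodule-origin commit -q -m 'Add one.md'

git init -q superproject
cd superproject
git -c protocol.file.allow=always submodule --quiet add "$work/submodule-origin" submodule
git commit -q -m 'Add submodule'

# 1. A commit the superproject does not reference yet -> "new commits".
echo two > submodule/two.md
git -C submodule add two.md
git -C submodule commit -q -m 'Add two.md'

# 2. A file that was never added -> "untracked content".
echo scratch > submodule/scratch.txt

# The submodule line in the output should mention both states.
status=$(LC_ALL=C git status)
printf '%s\n' "$status"
```

Running git add submodule followed by a commit in the superproject clears the first state; the second goes away once the stray file is committed, ignored or removed inside the submodule.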
While submodules are great, there are a few things to take into consideration - and they all stem from the fact that submodules point to a specific commit.
- The one that trips me up the most is that submodules start in a detached HEAD state. This means that they’re not currently on a branch, and we need to git checkout <branch> before we commit any changes, or we risk those commits being lost.
- There’s no way to always have the latest version of a submodule automatically. You must run git submodule update --remote or git pull from within the submodule.
- This also means that if you want your project to point to the latest version of the submodule, you have to push the submodule’s latest commits to its remote and then commit the new submodule version in your superproject.
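Putting these caveats together, a change made inside a freshly cloned submodule might go roughly like this. Everything runs against local stand-in repositories (seed, submodule.git and superproject-origin are names I invented), the submodule remote is a bare repo so that it accepts pushes, and the protocol.file.allow override is only needed for the local paths.

```shell
set -eu
work=$(mktemp -d); cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
allow='protocol.file.allow=always'

# Local stand-ins for the hosted repositories.
git init -q seed
echo one > seed/one.md
git -C seed add one.md
git -C seed commit -q -m 'Add one.md'
branch=$(git -C seed symbolic-ref --short HEAD)  # whatever the default branch is here
git clone -q --bare seed submodule.git

git init -q superproject-origin
git -C superproject-origin -c "$allow" submodule --quiet add "$work/submodule.git" submodule
git -C superproject-origin commit -q -m 'Add submodule'

# A fresh recursive clone leaves the submodule on a detached HEAD.
git -c "$allow" clone -q --recurse-submodules superproject-origin superproject
cd superproject/submodule
if git symbolic-ref -q HEAD >/dev/null; then state=on-branch; else state=detached; fi
echo "submodule HEAD is $state"

# Switch to a branch before committing, then publish the new commit.
git checkout -q "$branch"
echo two > two.md
git add two.md
git commit -q -m 'Add two.md'
git push -q origin HEAD

# Record the new submodule commit in the superproject.
cd ..
git add submodule
git commit -q -m 'Bump submodule'
git submodule status
```

In a real project the last step would be followed by pushing the superproject as well; that is left out here because the stand-in superproject remote is an ordinary working repository rather than a bare one.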