Using Git submodules for Docker projects (and more!)

December 29, 2016

Organising Docker projects

I’ve been toying with Docker for a little while now, and after my brief learning stint, I’ve begun to move all of my Linux-based services to Docker containers. After overcoming some of the beginner pitfalls, I decided it was best to standardise on a single layout for all of my Docker projects.

Firstly, I ensure that the containers are always defined and run via a docker-compose.yml, so that my projects are portable, recreatable and self-documenting. Secondly, I always follow a specific directory layout for the volumes that are attached (./volumes/<servicename>/<location/on/container/filesystem>).

Here is an example of a docker project directory:

├── docker-compose.yml
├── build [OPTIONAL]
│   ├── Dockerfile
│   └── ...
└── volumes
    ├── service1
    │   ├── etc/...
    │   ├── var/...
    │   └── ...
    └── service2
        ├── etc/...
        ├── var/...
        └── ...

Now that I’ve decided on a standard layout, I can throw it into version control and I always know what to expect. But what about the volumes that are mapped to the containers?

This is where I started looking into git submodules. This useful feature provides a way to embed related repositories into a parent repository (a ‘superproject’), all the while keeping their contents and history isolated in a separate repository.

The theory

Despite the fact that our container configuration and our data are all required to launch our functioning container, they are essentially separate entities and can change without affecting each other.

For example, I have a local Docker project that launches a container running Hugo, the static site generator, so I can edit my blog locally before committing my atrocities to the internet. If I change the container configuration (e.g. adding a new volume), it doesn’t affect the underlying blog content, and the same goes for editing the blog content; it doesn’t need to affect the overlying container config.

By using submodules, I can have a blog repository and a docker-blog repository. The blog repository just contains the content of my blog (and is what GitLab uses to build and host my blog), while docker-blog contains the Docker configuration as well as a reference to the blog repository for the content. If I move to another machine, I simply have to run git clone --recurse (the abbreviated form of --recurse-submodules) and it will pull down the whole lot. Alternatively, I can omit the --recurse part and I have just the container config, ready to create a new blog.
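A minimal local sketch of those two kinds of clone follows. It uses throwaway repositories in a temporary directory rather than real remotes. The volumes/blog path, the file names and the demo identity are assumptions made for illustration. The protocol.file.allow override is only there because newer Git versions refuse to clone submodules from local paths by default; it would not be needed with a hosted remote.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo="$(mktemp -d)" && cd "$demo"

# Stand-in for the content repository.
git init -q blog
echo "Hello" > blog/post.md
git -C blog add post.md
git -C blog commit -q -m "First post"

# Stand-in for the Docker repository, which embeds the content as a submodule.
git init -q docker-blog
touch docker-blog/docker-compose.yml
git -C docker-blog add docker-compose.yml
git -C docker-blog -c protocol.file.allow=always \
    submodule --quiet add "$demo/blog" volumes/blog
git -C docker-blog commit -q -m "Container config plus blog submodule"

# Container config and blog content in one step.
git -c protocol.file.allow=always clone -q --recurse-submodules docker-blog full

# Container config only; the submodule path is left as an empty directory.
git clone -q docker-blog config-only

ls -A full/volumes/blog config-only/volumes/blog
```

If the content is wanted later in the config-only clone, running git submodule update --init inside it fetches the submodule after the fact.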

Setting up a submodule - A demonstration

Firstly, create two folders named superproject & submodule, and make separate git repos for these folders locally and on your favourite hosted git service, GitLab.

Now, create some files in the superproject folder.


…And add some files to the submodule folder.


Commit these files to the relevant repos.
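The steps so far might look something like this on the command line. The file names and the demo identity are hypothetical placeholders, and the hosted remotes are left as comments because their URLs depend on your own account.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo="$(mktemp -d)" && cd "$demo"

# Two folders, each its own repository.
mkdir superproject submodule
git -C superproject init -q
git -C submodule init -q

# Placeholder files so that each repository has something to commit.
echo "superproject file" > superproject/super.txt
echo "submodule file" > submodule/sub.txt

git -C superproject add super.txt
git -C superproject commit -q -m "Initial superproject commit"
git -C submodule add sub.txt
git -C submodule commit -q -m "Initial submodule commit"

# Publishing to a hosted service would follow this pattern (placeholders):
#   git -C submodule remote add origin <repository URL>
#   git -C submodule push -u origin <branch>
```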

Now, go back into the superproject folder and add the submodule repo we just created as a submodule.

git submodule add <repository URL> submodule

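If the repositories are stored on the same GitLab server, we can use a relative path in place of the full URL. Git resolves it against the superproject’s own remote, so the submodule is fetched over the same protocol as the superproject (for example HTTPS rather than SSH).
For example, ../submodule.git if both repos are under the same account.
Or ../../some-other-user/submodule.git if the repo belongs to another user/group.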
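Here is a small sketch of the relative form, with local bare repositories standing in for two projects under the same GitLab account. The remote/ directory, the file name and the demo identity are assumptions. The protocol.file.allow override is only needed because the stand-in remotes are local paths.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo="$(mktemp -d)" && cd "$demo"

# Stand-ins for the two hosted repositories, side by side.
mkdir remote
git init -q --bare remote/superproject.git
git init -q work
echo "submodule file" > work/sub.txt
git -C work add sub.txt
git -C work commit -q -m "Initial submodule commit"
git clone -q --bare work remote/submodule.git

# The superproject's origin is what the relative URL is resolved against.
git init -q superproject
git -C superproject remote add origin "$demo/remote/superproject.git"
git -C superproject -c protocol.file.allow=always \
    submodule --quiet add ../submodule.git submodule

# .gitmodules keeps the relative form...
git -C superproject config -f .gitmodules submodule.submodule.url
# ...while the submodule's own remote holds the resolved location.
git -C superproject/submodule config remote.origin.url
```

Keeping the relative form in .gitmodules means the reference keeps working however the superproject itself was cloned.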

Now, we should have the following folder structure.

├── .gitmodules
└── submodule

The .gitmodules file has been created, which contains the metadata of the newly added submodule. And you can also see the submodule directory contains the files we committed earlier.


Now that we have our submodule set up and pulled down, we need to commit two things: the .gitmodules file, which defines the persistent links to the submodule remotes, and the submodule folder. However, when we commit the submodule folder we’re not actually committing the folder and its contents; we’re committing a gitlink to the currently active commit in the submodule.
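To see the gitlink for yourself, a sketch along these lines should work. It builds a throwaway pair of repositories; the file name and the demo identity are placeholders, and protocol.file.allow is only overridden because the submodule comes from a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo="$(mktemp -d)" && cd "$demo"

# Throwaway repository to embed.
git init -q submodule
echo "submodule file" > submodule/sub.txt
git -C submodule add sub.txt
git -C submodule commit -q -m "Initial submodule commit"

git init -q superproject
cd superproject
git -c protocol.file.allow=always submodule --quiet add "$demo/submodule" submodule

# `git submodule add` has already staged .gitmodules and the submodule path.
git commit -q -m "Add submodule"

# Name, path and URL of the submodule as recorded in .gitmodules.
cat .gitmodules

# The submodule is stored as one tree entry of type "commit" (mode 160000),
# not as a directory of files.
git ls-tree HEAD

# The hash in that entry is the commit currently checked out in the submodule.
git -C submodule rev-parse HEAD
```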

This means that if we enter the submodule, check out a different commit via git checkout <sha> and then commit that change in the superproject, from then on the superproject references that version of the submodule.

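Alternatively, if we want to get the latest commit (HEAD) of the submodule’s remote branch (master by default, unless a branch is set for the submodule in .gitmodules), we can simply run git submodule update --remote (or git pull from within the submodule directory, provided a branch is checked out there).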
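Both ways of moving the reference can be tried out locally with a sketch like this one. The upstream branch is forced to master to match the default described above. The repository contents and the demo identity are placeholders, and protocol.file.allow is only overridden because the upstream is a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo="$(mktemp -d)" && cd "$demo"

# Throwaway upstream repository with one commit on master.
git init -q submodule
git -C submodule symbolic-ref HEAD refs/heads/master
echo "one" > submodule/sub.txt
git -C submodule add sub.txt
git -C submodule commit -q -m "First"
first="$(git -C submodule rev-parse HEAD)"

# Superproject that embeds it.
git init -q superproject
git -C superproject -c protocol.file.allow=always \
    submodule --quiet add "$demo/submodule" submodule
git -C superproject commit -q -m "Add submodule"

# Upstream gains a second commit.
echo "two" >> submodule/sub.txt
git -C submodule commit -q -a -m "Second"
latest="$(git -C submodule rev-parse HEAD)"

cd superproject

# Option 1: move to the newest upstream commit and record it.
git -c protocol.file.allow=always submodule --quiet update --remote
git add submodule
git commit -q -m "Update submodule to latest"
after_update="$(git ls-tree HEAD submodule | awk '{print $3}')"

# Option 2: check out an explicit commit instead and record that.
git -C submodule checkout -q "$first"
git add submodule
git commit -q -m "Pin submodule to first commit"
after_pin="$(git ls-tree HEAD submodule | awk '{print $3}')"

echo "after update: $after_update"
echo "after pin:    $after_pin"
```

In both cases the superproject only changes once git add submodule and a commit record the new gitlink.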

And as you can see, throughout all of this, the superproject only ever references the submodule as a complete entity (the current commit of the submodule), not the individual files within.

Changes in submodules

If anything has changed in the submodule and we run git status from the superproject, we might notice that the submodule is dirty. The two changes you’ll most likely see for your submodule are new commits and untracked content.

  • new commits - The submodule’s SHA has changed, meaning that the commit checked out in the submodule differs from the one the superproject has recorded. It could be ahead, behind, on a different branch, etc.
  • untracked content - Files have been added to the submodule’s working tree but are not yet tracked by it. (Edits to files it already tracks are reported as modified content instead.)
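Both states can be reproduced in a throwaway setup such as the following sketch. The file names and the demo identity are placeholders. LC_ALL=C is set so that the status messages come out in English, and protocol.file.allow is only overridden because the submodule comes from a local path.

```shell
set -eu
export LC_ALL=C
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo="$(mktemp -d)" && cd "$demo"

# Throwaway superproject with one committed submodule.
git init -q submodule
echo "submodule file" > submodule/sub.txt
git -C submodule add sub.txt
git -C submodule commit -q -m "Initial submodule commit"
git init -q superproject
cd superproject
git -c protocol.file.allow=always submodule --quiet add "$demo/submodule" submodule
git commit -q -m "Add submodule"

# A commit made inside the submodule moves it away from the recorded gitlink.
echo "change" >> submodule/sub.txt
git -C submodule commit -q -a -m "Local change"
status_new_commits="$(git status)"
echo "$status_new_commits"

# A file that the submodule does not track yet.
touch submodule/scratch.txt
status_untracked="$(git status)"
echo "$status_untracked"
```

Running git diff --submodule in the superproject at this point lists which submodule commits differ from the recorded one.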


While submodules are great, there are a few things to take into consideration - and they all stem from the fact that submodules point to a specific commit.

  • The one that trips me up the most is that submodules start in a detached HEAD state. This means that they’re not currently on a branch, so we need to git checkout <branch> before we commit any changes; otherwise the new commits belong to no branch and could be lost before they ever reach upstream.
  • There’s no way to always have the latest version of a submodule automatically. You must run git submodule update --remote, or git pull from within the submodule once a branch is checked out.
  • This also means that if you want your project to point to the latest version of the submodule, you have to push the submodule’s latest commit to its remote and then commit the updated reference in your superproject.
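Putting those caveats together, the round trip might look like this sketch. A local bare repository stands in for the submodule’s remote and the branch is assumed to be master. The file names and the demo identity are placeholders, and protocol.file.allow is only overridden because everything lives on local paths.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
demo="$(mktemp -d)" && cd "$demo"

# Bare repository standing in for the submodule's hosted remote.
git init -q work
git -C work symbolic-ref HEAD refs/heads/master
echo "one" > work/sub.txt
git -C work add sub.txt
git -C work commit -q -m "First"
git clone -q --bare work submodule.git

# Superproject that embeds it, then a fresh recursive clone of the superproject.
git init -q superproject
git -C superproject -c protocol.file.allow=always \
    submodule --quiet add "$demo/submodule.git" submodule
git -C superproject commit -q -m "Add submodule"
git -c protocol.file.allow=always clone -q --recurse-submodules superproject clone

cd clone/submodule
# In a fresh recursive clone the submodule is not on any branch.
if git symbolic-ref -q HEAD >/dev/null; then
  echo "on a branch"
else
  echo "detached HEAD"
fi

# 1. Get onto a branch before changing anything.
git checkout -q master
echo "two" >> sub.txt
git commit -q -a -m "Second"

# 2. Push the new commit to the submodule's remote.
git push -q origin master

# 3. Record the new commit in the superproject.
cd ..
git add submodule
git commit -q -m "Point at latest submodule commit"
# Pushing the superproject itself is left out of this local demo.
```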

© James Booth 2017