Copy a large directory from a Docker image to the host

When using Docker for development, it’s often necessary to bind mount directories from your host into your container. However, you’ll run into problems if your image contains files your host doesn’t (such as a node_modules directory), because the bind mount hides whatever the image has at that path. When that happens, you’ll need to get the files from your image onto your host before you start your container with the mount.

If you are tl;dr prone, here’s the type of script you’ll end up with:

rm -fr .tmp-node_modules.tar .tmp-docker-container-id node_modules
docker run --rm --cidfile .tmp-docker-container-id -d image-namespace/image-name:image-tag
xargs -I {} docker container cp -a "{}:/usr/src/app/node_modules" - < .tmp-docker-container-id > .tmp-node_modules.tar
tar -xf .tmp-node_modules.tar
rm .tmp-node_modules.tar .tmp-docker-container-id

I developed this technique so that I could give my team members a single, fast script they could run to get these files onto their host. When we first started working with Node.js in Docker, I had a make dependencies command that would:

start the container with the mount
run NODE_ENV=development npm install
have the user wait for all their dependencies to install into their host (and wait they did)

Note: while I used a Makefile target, a bash script (or the like) works equally well.

I didn’t like this for two reasons:

developers had literally just downloaded these exact same Node.js modules in their image build and waiting through it again is painful
there is always a chance that, by the time the script is run, the downloaded dependencies differ from those in the image, creating unpredictability

Docker image

For the purpose of this guide, we’ll pretend we have a Docker image named “image-namespace/image-name:image-tag” that is based on the official Node.js image and has its WORKDIR set to /usr/src/app, with node_modules installed at /usr/src/app/node_modules.

Start a Docker container

The docker cp command gives us exactly what we need to copy files from the Docker image to the host, except that it operates on a container (running or stopped); it can’t copy directly from an image.

To work around this we’ll run our Docker image, and we’ll run it in detached mode (-d), since we need to run subsequent commands against it. We’ll also want to use --rm to ensure our container is removed when we stop it.

docker run --rm -d image-namespace/image-name:image-tag

So this is great, except we don’t know how to access this new container. We could do some fancy container listing and grepping, or we can make use of the --cidfile option to store the container ID. We’ll store the container ID in a file named .tmp-docker-container-id to use with our copy command.

docker run --rm --cidfile .tmp-docker-container-id -d image-namespace/image-name:image-tag
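As an aside, docker cp also accepts a container that has been created but never started, so a variant built on docker create avoids depending on the image having a long-running command. The following is only a sketch under that assumption; the function name copy_from_created and its two parameters are hypothetical, not part of the original script.

```shell
# Hypothetical helper: copy a directory out of an image through a created
# (never started) container, then remove that container.
copy_from_created() {
  image="$1"      # e.g. image-namespace/image-name:image-tag
  src_path="$2"   # e.g. /usr/src/app/node_modules

  # docker create prints the new container's ID on stdout.
  cid=$(docker create "$image") || return 1

  # With "-" as the destination, docker cp writes a tar stream to stdout,
  # which tar extracts into the current directory.
  docker cp -a "$cid:$src_path" - | tar -xf -
  status=$?

  # The container was never started, so a plain rm is enough.
  docker rm "$cid" >/dev/null
  return "$status"
}
```

It would be called as copy_from_created image-namespace/image-name:image-tag /usr/src/app/node_modules from the project root.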

Copy directory from the container

Now that we have our container running, we can use the container ID from the .tmp-docker-container-id file to copy the directory to our host.

We’ll use redirection to feed the contents of .tmp-docker-container-id into the docker container cp command. Since the container ID doesn’t belong at the end of the command, we’ll use the xargs replace-str option (-I) to insert it at a specific spot in our command (where the {} appears within the quotes):

xargs -I {} docker container cp "{}:/usr/src/app/node_modules" DESTINATION < .tmp-docker-container-id
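To see what the -I substitution does without touching Docker, echo can stand in for the real command. This is just an illustration; the file name .tmp-demo-container-id and the ID abc123 are made up for the demo.

```shell
# Stand-in for the file that --cidfile would write (the ID is made up).
printf '%s\n' 'abc123' > .tmp-demo-container-id

# xargs reads the ID from stdin and substitutes it wherever {} appears;
# echo prints the command that would have been run.
demo_cmd=$(xargs -I {} echo docker container cp "{}:/usr/src/app/node_modules" DESTINATION < .tmp-demo-container-id)
echo "$demo_cmd"
# prints: docker container cp abc123:/usr/src/app/node_modules DESTINATION

rm .tmp-demo-container-id
```

Command substitution would get the same result without xargs: docker container cp "$(cat .tmp-docker-container-id):/usr/src/app/node_modules" DESTINATION.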

For our destination, we could specify a location on the host. However, I’ve run into problems with running out of inodes in a PHP project with lots and lots of files. Instead, I’ve found that having Docker write to stdout, by using - as the destination, lets me get around this issue. When docker container cp writes to stdout, it sends the content in the tar archive format.

xargs -I {} docker container cp "{}:/usr/src/app/node_modules" - < .tmp-docker-container-id

Lastly, we’ll use redirection again, this time to write the output from stdout to a tar file named .tmp-node_modules.tar. We’ll also throw in the -a (archive) option, which copies all uid/gid information, for good measure, since we’ll be mounting this back into the container.

xargs -I {} docker container cp -a "{}:/usr/src/app/node_modules" - < .tmp-docker-container-id > .tmp-node_modules.tar
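Because the archive is written through a redirect, a failed copy can leave behind an empty or truncated file. One way to guard against that is to check the archive before extracting it. The helper below is a hypothetical sketch (the name archive_has_dir is not from the original script); it assumes the copied directory appears as the top-level entry of the archive.

```shell
# Hypothetical check: succeed only if the archive's first entry is the
# expected top-level directory (or something inside it).
archive_has_dir() {
  archive="$1"   # e.g. .tmp-node_modules.tar
  dir="$2"       # e.g. node_modules

  # tar -t lists entries without extracting; errors are silenced so an
  # unreadable archive simply yields an empty listing.
  first_entry=$(tar -tf "$archive" 2>/dev/null | head -n 1)

  case "$first_entry" in
    "$dir"|"$dir"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

In the script this could sit between the copy and the extraction, for example: archive_has_dir .tmp-node_modules.tar node_modules || exit 1.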

Extract contents of archive

Now that we have a tar file containing our directory, we can extract it, using the -x (extract) and -f (use archive file) options.

tar -xf .tmp-node_modules.tar
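If the script might be run from somewhere other than the project root, tar’s -C option extracts into a chosen directory instead of the current one. The example below builds a tiny stand-in archive so it runs without Docker; demo-src, demo-project and .tmp-demo.tar are throwaway names used only for this illustration.

```shell
# Build a tiny stand-in archive so the example runs without Docker.
mkdir -p demo-src/node_modules/example-pkg
echo 'module.exports = {}' > demo-src/node_modules/example-pkg/index.js
tar -cf .tmp-demo.tar -C demo-src node_modules

# -x extracts, -f names the archive, -C changes into the target directory first.
mkdir -p demo-project
tar -xf .tmp-demo.tar -C demo-project

ls demo-project/node_modules
# prints: example-pkg
```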

Cleanup and complete script

We’ve written a couple of temporary files, .tmp-node_modules.tar and .tmp-docker-container-id, that we should now clean up at the end of our script with the rm command.

docker run --rm --cidfile .tmp-docker-container-id -d image-namespace/image-name:image-tag
xargs -I {} docker container cp -a "{}:/usr/src/app/node_modules" - < .tmp-docker-container-id > .tmp-node_modules.tar
tar -xf .tmp-node_modules.tar
rm .tmp-node_modules.tar .tmp-docker-container-id

We should also remove these files at the start of our script, using the -f (force) option, in case a previous execution ran into an error before cleaning up. To avoid problems extracting our node_modules directory across multiple executions of the command, we should also remove any existing node_modules directory with the -f and -r (recursive) options. That gives us our completed script:

rm -fr .tmp-node_modules.tar .tmp-docker-container-id node_modules
docker run --rm --cidfile .tmp-docker-container-id -d image-namespace/image-name:image-tag
xargs -I {} docker container cp -a "{}:/usr/src/app/node_modules" - < .tmp-docker-container-id > .tmp-node_modules.tar
tar -xf .tmp-node_modules.tar
rm .tmp-node_modules.tar .tmp-docker-container-id
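One thing the script above leaves open is the container itself: --rm only removes it once it stops, and nothing stops it. A sturdier variant might stop the container and clean up the temporary files even when the copy or extraction fails. This is a sketch under the same assumptions as the rest of the guide (image name, /usr/src/app/node_modules path); the function name copy_node_modules is hypothetical.

```shell
# Hypothetical wrapper around the steps above. It stops the container and
# removes the temp files whether or not the copy succeeds.
copy_node_modules() {
  image="$1"
  cidfile=.tmp-docker-container-id
  archive=.tmp-node_modules.tar

  rm -fr "$archive" "$cidfile" node_modules
  docker run --rm --cidfile "$cidfile" -d "$image" >/dev/null || return 1

  # Remember the first failure but keep going so cleanup still happens.
  status=0
  docker container cp -a "$(cat "$cidfile"):/usr/src/app/node_modules" - > "$archive" \
    && tar -xf "$archive" \
    || status=$?

  # Stopping the container lets --rm remove it; ignore the error if it
  # has already exited.
  docker stop "$(cat "$cidfile")" >/dev/null 2>&1 || true
  rm -f "$archive" "$cidfile"
  return "$status"
}
```

With the function defined, the whole flow becomes a single call: copy_node_modules image-namespace/image-name:image-tag.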