Introduction

Link to GitHub with Dockerfile, requirements.txt, and the instructions

Docker lets you run containers that package an application together with its dependencies. The main problem it solves is the familiar one where code that works on one computer fails on another because the installed libraries or system packages differ. Because the application is isolated, it is also less likely to break anything on the host system.

Docker Host: The machine (either physical or virtual) where Docker is installed and runs.
Docker Guest: The containers managed by the Docker host, each providing a separate, isolated environment for running applications.

These instructions apply specifically to the Dockerfile I provided and may differ for other Dockerfiles. The provided Dockerfile is tailored for training and inference of Large Language Models (LLMs) with Hugging Face transformers, but it is not limited to this use case. The container includes Hugging Face transformers and PyTorch.

Installation

To install Docker on Ubuntu, follow these instructions:

If the installation succeeded, running docker run hello-world prints a message starting with “Hello from Docker!”

To make Docker work with Nvidia GPUs, follow these instructions:

If the setup succeeded, running sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi prints a table listing the GPUs.
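
The two checks above can be combined into one shell function. This is only a sketch: the name check_docker_setup is made up for illustration, and it calls docker without sudo, which assumes your user is allowed to run Docker (otherwise prefix the docker commands with sudo).

```shell
# Hypothetical helper: runs the two smoke tests from this section.
check_docker_setup() {
  # Basic check: the hello-world image prints "Hello from Docker!" on success.
  if docker run --rm hello-world | grep "Hello from Docker!" >/dev/null; then
    echo "docker: ok"
  else
    echo "docker: failed"
    return 1
  fi

  # GPU check: nvidia-smi inside a container should list the GPUs.
  if docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi >/dev/null; then
    echo "gpu: ok"
  else
    echo "gpu: failed"
    return 1
  fi
}
```

After pasting the function into a terminal, run check_docker_setup. It stops at the first check that fails.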

Build the Image

To build the image, run the following command from the folder containing the Dockerfile:

docker build -t train_and_inference_llms .

If you need a newer version of the Hugging Face base image or of a library listed in requirements.txt, run:

docker build --no-cache -t train_and_inference_llms .

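This command rebuilds every layer without using the build cache, so the libraries in requirements.txt are installed again. To make Docker also check the registry for a newer base image, add the --pull option.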
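
Both build variants can be kept in one small helper. This is a sketch under assumptions: the function name build_image and its "fresh" argument are invented for illustration. It uses --pull, a standard docker build option that asks Docker to look for a newer version of the base image, together with --no-cache.

```shell
# Hypothetical helper: builds the image from the current folder.
# Pass "fresh" as the first argument to ignore the build cache and to ask
# Docker to check for a newer base image.
build_image() {
  if [ "${1:-}" = "fresh" ]; then
    docker build --pull --no-cache -t train_and_inference_llms .
  else
    docker build -t train_and_inference_llms .
  fi
}
```

Run build_image for a normal cached build, or build_image fresh when you want everything rebuilt. Both must be run from the folder containing the Dockerfile.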

Start the Container

docker run --name train_and_inference_llms_container \
  --interactive --tty --detach --restart=unless-stopped \
  --runtime=nvidia --gpus all \
  --mount type=bind,source=/home/my_user/code,target=/code \
  train_and_inference_llms

train_and_inference_llms_container: Name of the container (you can choose any name you like).
/home/my_user/code: A folder on the host. /code is the path where this folder appears inside the guest. Changes made in /code on the guest show up in /home/my_user/code on the host, and vice versa.
You can mount additional folders by adding more --mount options in the same format.
train_and_inference_llms: The name of the image (from the “Build the Image” section).
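
Mounting a second folder only takes another --mount option. The sketch below shows one way to write it. The helper names bind_mount and start_container are invented for illustration, and /home/my_user/data is an example path that is not part of the original setup.

```shell
# Hypothetical helper: prints a --mount value for a host folder and the
# path where it should appear inside the container.
bind_mount() {
  printf 'type=bind,source=%s,target=%s\n' "$1" "$2"
}

# Hypothetical wrapper: starts the container with the code folder from above
# plus a second folder (/home/my_user/data is an example path).
start_container() {
  docker run --name train_and_inference_llms_container \
    --interactive --tty --detach --restart=unless-stopped \
    --runtime=nvidia --gpus all \
    --mount "$(bind_mount /home/my_user/code /code)" \
    --mount "$(bind_mount /home/my_user/data /data)" \
    train_and_inference_llms
}
```

With this in place, files under /home/my_user/data on the host would be visible under /data in the guest, in the same way as the code folder.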

VS Code

Install the Docker and Dev Containers extensions by Microsoft (the attach command below is provided by Dev Containers).
To open a container in VS Code, open the Command Palette and run Dev Containers: Attach to Running Container…

Remember, the code will be located in the /code folder.

Open the Terminal of the Guest from the Host Terminal

docker attach train_and_inference_llms_container
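
docker attach connects your terminal to the main process of the container. If you would rather open an additional, independent shell, docker exec can start one. A minimal sketch, assuming bash is available in the image. The function name open_guest_shell is made up for illustration.

```shell
# Hypothetical helper: opens an extra shell in the running container without
# attaching to its main process. Assumes bash exists in the image.
# The optional first argument overrides the container name.
open_guest_shell() {
  docker exec --interactive --tty "${1:-train_and_inference_llms_container}" bash
}
```

Leaving this extra shell with exit does not touch the main process. To leave a docker attach session without stopping the main process, press Ctrl-P followed by Ctrl-Q.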

Modify the Image

To modify the image, you can either:

Modify requirements.txt for Python library changes.
Modify the Dockerfile for Ubuntu package changes, e.g., add the line RUN apt-get update && apt-get install -y vim.

After rebuilding the image (see “Build the Image”), you need to stop and remove the previous container before you can start a new one with the same name from the new image.

To stop the container:

docker stop train_and_inference_llms_container

To remove the container:

docker rm train_and_inference_llms_container
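
The whole update cycle (rebuild, stop, remove, start again) can be put into one function. This is a sketch, not a definitive script: the name recreate_container is invented, and the docker run options are copied from the “Start the Container” section, so adjust the mounts to your own folders.

```shell
# Hypothetical helper: replaces the running container with one created from a
# freshly built image. Run it from the folder containing the Dockerfile.
recreate_container() {
  name=train_and_inference_llms_container
  image=train_and_inference_llms

  # Rebuild first, so the old container keeps running if the build fails.
  docker build -t "$image" . || return 1

  # Stop and remove the container that still uses the old image.
  docker stop "$name" || return 1
  docker rm "$name" || return 1

  # Start a new container with the same name from the new image.
  docker run --name "$name" \
    --interactive --tty --detach --restart=unless-stopped \
    --runtime=nvidia --gpus all \
    --mount type=bind,source=/home/my_user/code,target=/code \
    "$image"
}
```

Files in /home/my_user/code survive this cycle because they live on the host. Anything stored only inside the old container is lost when it is removed.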