How to leave my training process running after I exit the container and ssh?

Time:09-14

I'm training a model on AWS, and my workflow is:

  1. Connect to the EC2 instance via ssh
  2. Start the Docker container if not running already
  3. docker exec -it <container_name> bash
  4. python train.py

I can use Ctrl+Z to suspend the Python process, but I cannot exit the container shell, because the training process is attached to it. I assume it will also exit if I disconnect from ssh entirely (laptop shuts down, I close the terminal, etc.).

I thought that running python train.py & would fix it, but the training process is still stopped.

What's the best/most common way of accomplishing this?

Answer:

That approach won't work: the Python process is tied to the container shell, so it is killed when you exit that shell.

You can either:

  1. Run docker exec in detached mode with -d (drop -it, since a detached process has no terminal to attach to):

    docker exec -d <container_name> python train.py
    
  2. Set the Python script as the entrypoint of your image in the Dockerfile:

    ENTRYPOINT ["python", "train.py"]

    Then start the container with docker run -d (detached mode), so that training runs as the container's main process and keeps going after you log out.
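
As a rough sketch of both options, the shell functions below wrap the commands involved. The log path /tmp/train.log and the function names are hypothetical, and the sketch assumes the image provides sh and that train.py sits in the container's working directory. The redirect to a file is there because a detached exec has no terminal attached to show the script's output.

```shell
# Option 1: start training detached inside an already-running container.
# Output goes to a file inside the container (the path is an assumption).
start_training() {
  docker exec -d "$1" sh -c 'python train.py > /tmp/train.log 2>&1'
}

# Follow that log file later, e.g. from a new ssh session.
follow_training_log() {
  docker exec "$1" tail -f /tmp/train.log
}

# Option 2: the image's ENTRYPOINT runs train.py, so start the whole
# container detached. $1 is the container name to assign, $2 the image.
run_training_container() {
  docker run -d --name "$1" "$2"
}

# With option 2 the script is the container's main process, so its
# output is available through docker logs.
follow_container_logs() {
  docker logs -f "$1"
}
```

For example, run start_training <container_name>, log out, and later call follow_training_log <container_name> from a fresh ssh session to check progress.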
