2022: docker push fails after multiple retries on a specific layer

I'm stuck again, unable to push an image fully to Docker Hub. I've experienced this in the past, and running the push in a loop eventually succeeds, but it's a very annoying process when I need to share an image with students in a timely manner.

docker login succeeds
docker push completes some layers, completes others after a few retries, and reuses layers that already exist
then there is always one layer (not necessarily the same one each time a new image is built) that gets stuck in a retry loop
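
For reference, the "run it in a loop" workaround mentioned above can be sketched as a small shell function. This is only an illustration: the function name `retry` and its arguments (maximum attempts, delay in seconds) are made up here and are not part of Docker, and the attempt count and delay in the usage comment are arbitrary.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used.
# The name and arguments are illustrative; nothing here is a Docker feature.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (assumed values): retry 20 30 docker push s4lab/gipsy-json-u18
```

Layers that were already uploaded show up as "Layer already exists" on the next attempt, so each retry only has to deal with the layer that is still failing.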

The output looks something like this:

The push refers to repository [docker.io/s4lab/gipsy-json-u18]
e17debe53e58: Pushing [==================================================>]  239.7MB/239.7MB
ebe677197a5c: Layer already exists 
962812a24e35: Layer already exists 
37850ad767c2: Layer already exists 
0269df6e94f5: Layer already exists 
e722d396f503: Layer already exists 
write tcp 10.0.2.15:52930->34.205.13.154:443: write: connection reset by peer
The push refers to repository [docker.io/s4lab/gipsy-json-u18]
e17debe53e58: Retrying in 7 seconds 
ebe677197a5c: Layer already exists 
962812a24e35: Layer already exists 
37850ad767c2: Layer already exists 
0269df6e94f5: Layer already exists 
e722d396f503: Layer already exists

With each attempt, the destination IP address/endpoint changes.

In the log it's always something along these lines:

Oct 07 17:56:02 ub18 dockerd[1254]: time="2022-10-07T17:56:02.418642397-04:00" level=info msg="Attempting next endpoint for push after error: write tcp 10.0.2.15:57000->3.216.34.172:443: use of closed network connection"
Oct 07 17:56:03 ub18 dockerd[1254]: time="2022-10-07T17:56:03.550102562-04:00" level=error msg="Upload failed, retrying: write tcp 10.0.2.15:57026->3.216.34.172:443: use of closed network connection"
Oct 07 17:56:09 ub18 dockerd[1254]: time="2022-10-07T17:56:09.259048555-04:00" level=error msg="Upload failed, retrying: write tcp 10.0.2.15:52918->34.205.13.154:443: use of closed network connection"
Oct 07 17:56:27 ub18 dockerd[1254]: time="2022-10-07T17:56:27.499017532-04:00" level=error msg="Upload failed, retrying: write tcp 10.0.2.15:52922->34.205.13.154:443: use of closed network connection"
Oct 07 17:56:48 ub18 dockerd[1254]: time="2022-10-07T17:56:48.198701189-04:00" level=error msg="Upload failed, retrying: write tcp 10.0.2.15:52926->34.205.13.154:443: use of closed network connection"
Oct 07 17:57:12 ub18 dockerd[1254]: time="2022-10-07T17:57:12.688987070-04:00" level=error msg="Upload failed: write tcp 10.0.2.15:52930->34.205.13.154:443: write: connection reset by peer"
Oct 07 17:57:12 ub18 dockerd[1254]: time="2022-10-07T17:57:12.693009463-04:00" level=info msg="Attempting next endpoint for push after error: write tcp 10.0.2.15:52930->34.205.13.154:443: write: connection reset by peer"
Oct 07 17:57:14 ub18 dockerd[1254]: time="2022-10-07T17:57:14.146027151-04:00" level=error msg="Upload failed, retrying: write tcp 10.0.2.15:37338->44.205.64.79:443: write: broken pipe"
Oct 07 17:57:20 ub18 dockerd[1254]: time="2022-10-07T17:57:20.589127385-04:00" level=error msg="Upload failed, retrying: write tcp 10.0.2.15:37344->44.205.64.79:443: use of closed network connection"
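
To check whether the failures cluster on one registry endpoint or are spread across all of them, log lines like the ones above can be tallied by destination address. The following is a rough sketch: the helper name `count_failed_endpoints` is hypothetical, and the `journalctl -u docker` invocation in the usage comment assumes the daemon runs as a systemd unit named `docker`.

```shell
#!/bin/sh
# count_failed_endpoints: read dockerd log lines on stdin and print how many
# "Upload failed" messages mention each destination IP:port (hypothetical helper).
count_failed_endpoints() {
    grep 'Upload failed' \
        | grep -oE '>[0-9]+(\.[0-9]+){3}:443' \
        | tr -d '>' \
        | sort \
        | uniq -c \
        | sort -rn
}

# Example (assumed unit name): journalctl -u docker --since today | count_failed_endpoints
```

If the counts are spread across several addresses, as in the excerpt above, that points away from a single bad registry node and toward something on the client side of the connection.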

This is on:

# uname -a
Linux ub18 5.4.0-70-generic #78~18.04.1-Ubuntu SMP Sat Mar 20 14:10:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
# docker version
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.7-0ubuntu5~18.04.3
 Built:             Mon Nov  1 01:04:14 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0ubuntu5~18.04.3
  Built:            Fri Oct 22 00:57:37 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.5-0ubuntu3~18.04.2
  GitCommit:        
 runc:
  Version:          1.0.1-0ubuntu2~18.04.1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:

These related questions/answers do not help:

Docker push fails after multiple retries on a specific layer
Docker push, repeats pushing a layer for ever
Various issues on GitHub and elsewhere that discuss private and ECR repositories, etc., which are not really applicable

So what is a more reliable / robust solution in 2022 to get past this problem? It seems arbitrary and hard to troubleshoot unless I am missing something. This is especially annoying as I was able to push without issues in the past from the same environment.

Answer:

This appears to be network related: something on your network is corrupting or dropping the connection. I base that on the regctl debug logs, which show that it switched to a chunked upload with PATCH requests. That only happens when the connection is dropped during the normal push. The lack of other errors rules out anything corrupting the digest on your host.

The default chunk size in regctl is 1MB. For large images that means many connections, which is slow but more reliable on flaky networks. To improve the speed, you can adjust the chunk size with a registry set command, e.g.:

regctl registry set --blob-chunk 20971520 --blob-max 104857600 docker.io

This makes the chunk size 20MB and automatically uses a chunked upload, instead of trying a normal push, for any layer over 100MB when pushing to Docker Hub (docker.io).
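
The byte values in that command are 20 and 100 multiplied by 1024 × 1024. For trying other sizes, the arithmetic can be sketched in shell as below. The helper name `mib_to_bytes` is made up for illustration, and the script only prints the resulting command (using the same flags shown above) rather than running it.

```shell
#!/bin/sh
# mib_to_bytes N: convert N mebibytes to bytes (hypothetical helper).
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

chunk=$(mib_to_bytes 20)    # 20971520
max=$(mib_to_bytes 100)     # 104857600

# Print the command instead of running it, so the values can be checked first.
echo "regctl registry set --blob-chunk $chunk --blob-max $max docker.io"
```

A larger chunk means fewer requests per layer, but more data to resend whenever a connection drops, so on an unreliable network a smaller value may still end up faster overall.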

Disclaimer: I'm the author of regclient/regctl.