Ansible downloading file from remote host without knowing the name

Time:08-03

I am trying to download a huge file from a remote host, but I do not know the name or format of the file on the server. I only have the URL. My requirements:

  1. Preserve the downloaded file name.
  2. Prevent repeated downloads to save time.
  3. Get the local file name in both cases, whether the file is downloaded or skipped.

1st playbook: The following code works and downloads the image to the /tmp/images directory. However, it downloads the image on every run (which takes about 2 minutes). How can I prevent the repeated download?

---
- hosts: localhost
  tasks:
  - name: "Download the Image"
    ansible.builtin.get_url:
      url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
      dest: "/tmp/images/"
      mode: '0755'
    register: image_download_stats

  - name: "Print the downloaded image name"
    debug:
      msg: "{{ image_download_stats.dest|basename }}"

2nd playbook: The workaround I came up with:

---
- hosts: localhost
  tasks:
  - name: "Download the image"
    shell: wget --show-progress=off   --content-disposition -N https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img --force-directories -P /tmp/images/
    register: image_download

  - name: "Print the name of the image"
    debug:
      msg: "{% if image_download.stdout_lines |length > 0  %}{{ image_download.stdout_lines |regex_search('(?<=‘)(.*?)(?=’)')}}{%else%}{{image_download.stderr_lines |regex_search('(?<=‘)(.*?)(?=’)')}}{%endif%}"

If you run each playbook twice, you will notice that the 2nd one saves time by not downloading the image again while still returning the file name. Any suggestions on the 2nd playbook? It relies heavily on wget doing the heavy lifting rather than on an Ansible-native approach. Do Ansible users/experts think it is OK to use it? It works fine for me, but are there edge cases where this method will fail? Or is there a way to make the get_url module smarter?

Note: I know the basename technique for extracting the file name from the URL. However, my URL is sometimes not in a standard format from which the file name can be derived, so I cannot rely on taking the last part of the URL after the final / character.

Edit: I tried the 2nd playbook with an Arch Linux download and it did not work: the download is repeated every time, so no time is saved. Any suggestion is welcome. E.g.:

wget --show-progress=off   --content-disposition -N 'https://gitlab.archlinux.org/archlinux/arch-boxes/-/jobs/69793/artifacts/raw/output/Arch-Linux-x86_64-basic-20220721.69793.qcow2?inline=false' --force-directories -P /tmp/images

Should I give up on the idea of skipping the download to save time?

Answer:

Both the checksum attribute and a dest that is a full path including the file name are needed to skip the download (which happens only if the checksum matches). For example:

  - name: "Download the Image"
    ansible.builtin.get_url:
      url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
      dest: "/tmp/images/jammy-server-cloudimg-amd64.img"
      mode: '0755'
      checksum: "{{ lookup('file', '/tmp/images/jammy-server-cloudimg-amd64.md5') }}"
    register: image_download_stats

Quoting from checksum:

Additionally, if a checksum is passed to this parameter, and the file exists under the dest location, the destination_checksum would be calculated, and if checksum equals destination_checksum, the file download would be skipped ...

Quoting from dest:

If dest is a directory, the file will always be downloaded (regardless of the force and checksum option), but replaced only if the contents changed.


(Yes. The module does not parse the URL to get the file name. Hence, without the file name, it cannot calculate and compare the checksum if *dest* is a directory.)

See #73185
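
A note on the checksum value: the module expects it in the form <algorithm>:<checksum|url>, so the lookup in the example above only works if the local .md5 file already contains such a string (for example md5:<hash>). As a minimal sketch of a variant that needs no local checksum file, the checksum can point at a checksum list published by the server. The SHA256SUMS URL below is an assumption that is not part of the original answer; it should be verified for the image in question.

```yaml
---
- hosts: localhost
  tasks:
    - name: "Download the image unless the local copy already matches the published checksum"
      ansible.builtin.get_url:
        url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
        # dest must be a full file path, not a directory, for the checksum comparison to apply
        dest: "/tmp/images/jammy-server-cloudimg-amd64.img"
        mode: '0755'
        # Assumption: a SHA256SUMS file listing this image is published at this URL
        checksum: "sha256:https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS"
      register: image_download_stats

    - name: "Print the local file name (available whether the download ran or was skipped)"
      ansible.builtin.debug:
        msg: "{{ image_download_stats.dest | basename }}"
```

The small checksum list is still fetched on every run, but the large image should only be transferred again when the published checksum no longer matches the local file.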


Q: "Download file from remote host without knowing the name."

A: Parse the URL and build the dest and checksum attributes from it. For example,

    - name: "Download the Image"
      ansible.builtin.get_url:
        url: "{{ my_url }}"
        dest: "{{ my_dest }}/{{ my_file }}"
        mode: '0755'
        checksum: "{{ lookup('file', my_checksum) }}"
      register: image_download_stats
      vars:
        my_dest: /tmp/images
        my_url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
        my_file: "{{ my_url.split('/')|last }}"
        my_file_name: "{{ my_file|splitext|first }}"
        my_checksum: "{{ my_dest }}/{{ my_file_name }}.md5"
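
When the last segment of the URL cannot be trusted as the file name, one possible approach is to ask the server for the response headers first and read the name from Content-Disposition, which is what wget --content-disposition relies on. The following is only a sketch under several assumptions: the server answers HEAD requests, the header (if present) uses a plain filename= parameter rather than the encoded filename*= form, and an existing local file with the same name is considered up to date. The variable names image_url, image_dir, image_head, image_file and image_stat are hypothetical.

```yaml
---
- hosts: localhost
  vars:
    image_url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
    image_dir: /tmp/images
  tasks:
    - name: "Request the headers only (no file body is transferred)"
      ansible.builtin.uri:
        url: "{{ image_url }}"
        method: HEAD
      register: image_head

    - name: "Start with the last path segment of the final URL (after redirects)"
      ansible.builtin.set_fact:
        image_file: "{{ image_head.url | urlsplit('path') | basename }}"

    - name: "Prefer the name from Content-Disposition when the server sends one"
      ansible.builtin.set_fact:
        image_file: >-
          {{ image_head.content_disposition
             | regex_replace('^.*filename="?([^";]+)"?.*$', '\\1') }}
      when: "'filename=' in (image_head.content_disposition | default(''))"

    - name: "Check whether the file is already present locally"
      ansible.builtin.stat:
        path: "{{ image_dir }}/{{ image_file }}"
      register: image_stat

    - name: "Download the image only if it is missing"
      ansible.builtin.get_url:
        url: "{{ image_url }}"
        dest: "{{ image_dir }}/{{ image_file }}"
        mode: '0755'
      when: not image_stat.stat.exists

    - name: "Print the local file name (known whether the download ran or was skipped)"
      ansible.builtin.debug:
        msg: "{{ image_file }}"
```

This variant skips the download purely on the existence of the file, so it will not notice that the remote content has changed under the same name. Combining it with the checksum attribute, where a checksum is available, closes that gap.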