Slow Ansible performance when using loop with large YAML var


Hello Developer Community!

I have been working on some Ansible playbooks to manage Citrix NetScaler configuration and would like some help with the following. I have this data structure defined in a variable named nsapp_lb_server:

nsapp_lb_server:
    - name:                      "SRV-1"
      ipaddress:                 "10.102.102.1"
      comment:                   "Chewbacca"

    - name:                      "SRV-2"
      ipaddress:                 "10.102.102.2"
      comment:                   "C-3PO"

    - name:                      "SRV-3"
      ipaddress:                 "10.102.102.3"
      comment:                   "Obi-Wan Kenobi"
...

[ ... another 1200 items ... ]

and I have the following task:

  - name: "Check variables (loop)"
    ansible.builtin.assert:
        that:
            - ( (item.name is defined) and (item.name | length > 0) )
            - ( (item.ipaddress is defined) and (item.ipaddress | ipaddr() == item.ipaddress) )
            - ( (item.comment | length > 0) if (item.comment is defined) else omit )
    loop: "{{ nsapp_lb_server }}"

My problem is that when I have thousands of records in the nsapp_lb_server variable, the loop is incredibly slow. The task takes about 30 minutes to finish, which is far too long... :-(

After some digging on the Internet, it seems the issue is caused by Ansible's "loop" keyword, so I would like to check whether there are any other methods I can use instead of loop.

Are there any alternatives to Ansible's "loop" that can provide the same result (iterating over the entries of the variable)? I was thinking about using json_query, but I still do not know how to implement it in this specific case.

My environment:

$ ansible --version
ansible [core 2.12.6]
  config file = /home/ansible/.ansible.cfg
  configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ansible/ansible/lib/python3.9/site-packages/ansible
  ansible collection location = /home/ansible/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/ansible/ansible/bin/ansible
  python version = 3.9.7 (default, Sep 21 2021, 00:13:39) [GCC 8.5.0 20210514 (Red Hat 8.5.0-3)]
  jinja version = 3.0.3
  libyaml = True

Could anyone please point me in the right direction? I have been working on this code base for a very long time, and after testing it with a large data set it seems unusable because of the running time. I have also checked the hardware resources allocated to the VM that the Ansible controller runs on, and there is nothing problematic there.

Many thanks in advance!

CodePudding user response:

Running this validation as thousands of individual task iterations is very slow, because every iteration adds executor and callback overhead. You can instead do it in a single task, with the caveat that it will be harder to track down the invalid list item(s):

- hosts: localhost
  gather_facts: false
  vars:
    nsapp_lb_server: "{{ nsapp_lb_samples * 10000 }}"
    nsapp_lb_samples:
        - name:                      "SRV-1"
          ipaddress:                 "10.102.102.1"
          comment:                   "Chewbacca"
        - name:                      "SRV-2"
          ipaddress:                 "10.102.102.2"
          comment:                   "C-3PO"
        - name:                      "SRV-3"
          ipaddress:                 "10.102.102.3"
          comment:                   "Obi-Wan Kenobi"
  tasks:
    - assert:
        that:
          # rejectattr with no test filters out items whose attribute is truthy, so what
          # remains are entries with a missing or empty name; this covers both the
          # "defined" and the "length > 0" checks from the original loop
          - nsapp_lb_server | rejectattr('name') | list | length == 0
          - (nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr') | list) == (nsapp_lb_server | map(attribute='ipaddress') | list)
          - nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | list | length == 0

This runs in ~5 seconds for the 30,000 test entries I fed it.
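As a side note: if you want to measure the difference in your own environment, one option (my suggestion, not part of the original setup) is the profile_tasks callback from the ansible.posix collection, enabled in ansible.cfg. This assumes the ansible.posix collection is installed:

# ansible.cfg -- minimal sketch, assumes the ansible.posix collection is installed
[defaults]
# print wall-clock timing for each task at the end of the run
callbacks_enabled = ansible.posix.profile_tasks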

To make it easier to find the bad values without making the task extremely ugly, you can split it up into a series of tasks:

- hosts: localhost
  gather_facts: false
  vars:
    nsapp_lb_server: "{{ nsapp_lb_samples * 10000 }}"
    nsapp_lb_samples:
        - name:                      "SRV-1"
          ipaddress:                 "10.102.102.1"
          comment:                   "Chewbacca"
        - name:                      "SRV-2"
          ipaddress:                 "10.102.102.2"
          comment:                   "C-3PO"
        - name:                      "SRV-3"
          ipaddress:                 "10.102.102.3"
          comment:                   "Obi-Wan Kenobi"
  tasks:
    - name: Check for missing names
      assert:
        that: nsapp_lb_server | rejectattr('name', 'defined') | list | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('name', 'defined') | list }}"

    - name: Check for bad names
      assert:
        that: nsapp_lb_server | rejectattr('name') | list | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('name') | list }}"

    - name: Check for missing IP addresses
      assert:
        that: nsapp_lb_server | rejectattr('ipaddress', 'defined') | list | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | rejectattr('ipaddress', 'defined') | list }}"

    - name: Check for bad IP addresses
      assert:
        that: (nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr') | list) == (nsapp_lb_server | map(attribute='ipaddress') | list)
        fail_msg: "Suspicious values: {{ nsapp_lb_server | map(attribute='ipaddress') | map('ipaddr') | symmetric_difference(nsapp_lb_server | map(attribute='ipaddress')) }}"

    - name: Check for bad comments
      assert:
        that: nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | list | length == 0
        fail_msg: "Bad entries: {{ nsapp_lb_server | selectattr('comment', 'defined') | rejectattr('comment') | list }}"

This runs in ~12 seconds for the same list of 30,000 test entries.
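If you still need per-item detail for the failures, a hybrid approach is possible: keep the fast whole-list assertions above and only loop over the entries that the bulk checks flag. The task below is a minimal sketch of that idea (the exact filter chain is my own assumption, not part of the answer above):

    - name: Report entries with a missing or empty name
      ansible.builtin.debug:
        var: item
      # the filter chain keeps only the offending entries, so the loop stays
      # empty (and costs nothing) when the data is clean
      loop: >-
        {{ (nsapp_lb_server | rejectattr('name', 'defined') | list)
           + (nsapp_lb_server | selectattr('name', 'defined') | rejectattr('name') | list) }}

Because the loop list is empty when the data is valid, this adds almost no runtime in the normal case.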
