Looping over lists of lists to count the appearances of each pair of elements (in sublists)

Time:11-14

I have the two lists of lists below, and I need to count how many times each pair of elements (one from the sublists of 'staffs', one from the sublists of 'processes') appears.

For example, William_Delta appears 4 times.

The result should be written to a .txt file.

processes = [['Iota', 'Gamma', 'Kappa'], ['Delta', 'Zeta', 'Beta'], ['Alpha', 'Zeta'], ['Alpha', 'Epsilon', 'Delta', 'Beta']]
staffs = [['William', 'James', 'Noah', 'Oliver'], ['Benjamin', 'Oliver', 'William'],['Oliver', 'Benjamin']]


list_output = []

for each_p in processes:
    for p in each_p:
        for each_s in staffs:
            for s in each_s:
                output = s + '_' + p
                list_output.append(output)

uniques = set(list_output)

with open('c:\\temp\\outfile.txt', 'a') as outfile:
    for ox in uniques:
        outfile.write(ox + '@' + str(list_output.count(ox)) + "\n")

Both 'processes' and 'staffs' are very long, so this takes a long time to complete.

What's a better way to make it run faster?

Thank you.

CodePudding user response:

Use collections.Counter to count how many times each element appears across all the sublists, then use itertools.product to form every staff/process combination. The total count for a pair is the product of the two individual counts. For example, "William" appears 2 times and "Delta" appears 2 times, so the pair "William_Delta" appears 4 times (2 * 2).

from collections import Counter
from itertools import product

processes = [['Iota', 'Gamma', 'Kappa'], ['Delta', 'Zeta', 'Beta'], ['Alpha', 'Zeta'], ['Alpha', 'Epsilon', 'Delta', 'Beta']]
staffs = [['William', 'James', 'Noah', 'Oliver'], ['Benjamin', 'Oliver', 'William'],['Oliver', 'Benjamin']]


# Total occurrences of each staff member and each process across all sublists
count_staffs = Counter(st for staff in staffs for st in staff)
count_processes = Counter(pr for process in processes for pr in process)

with open('outfile.txt', 'a') as outfile:
    # Every distinct (staff, process) combination; its pair count is cs * cp
    for (staff, cs), (process, cp) in product(count_staffs.items(), count_processes.items()):
        outfile.write(f"{staff}_{process}@{cs * cp}\n")

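This solution should be faster than building all the pairs and counting them, because it never materializes the full list of pairs: it counts each element once and then does one multiplication per distinct staff/process combination.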
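To check that multiplying the per-element counts gives the same result as counting every pair explicitly, here is a minimal sketch that computes both on the question's sample data and compares them. The function names `brute_force_counts` and `product_counts` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import chain, product

processes = [['Iota', 'Gamma', 'Kappa'], ['Delta', 'Zeta', 'Beta'], ['Alpha', 'Zeta'], ['Alpha', 'Epsilon', 'Delta', 'Beta']]
staffs = [['William', 'James', 'Noah', 'Oliver'], ['Benjamin', 'Oliver', 'William'], ['Oliver', 'Benjamin']]


def brute_force_counts(staffs, processes):
    """Build every staff/process pair explicitly and count them (what the question's loops do)."""
    all_staffs = list(chain.from_iterable(staffs))
    all_processes = list(chain.from_iterable(processes))
    return Counter(f"{s}_{p}" for s in all_staffs for p in all_processes)


def product_counts(staffs, processes):
    """Count each element once, then multiply the counts for each distinct pair."""
    count_staffs = Counter(chain.from_iterable(staffs))
    count_processes = Counter(chain.from_iterable(processes))
    return {
        f"{s}_{p}": cs * cp
        for (s, cs), (p, cp) in product(count_staffs.items(), count_processes.items())
    }


brute = brute_force_counts(staffs, processes)
fast = product_counts(staffs, processes)
print(brute["William_Delta"], fast["William_Delta"])  # 4 4
print(dict(brute) == fast)  # True
```

The explicit version creates len(all_staffs) * len(all_processes) strings (108 here), while the counting version creates one string per distinct combination (40 here), which is where the savings come from on long inputs.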

CodePudding user response:

You can use collections.Counter together with chain.from_iterable and product from itertools:

from collections import Counter
from itertools import product, chain

output = Counter(
    f"{s}_{p}" for p, s in 
    product(*map(chain.from_iterable, [processes, staffs]))
)

with open('outfile.txt', 'a') as outfile:
    for name, count in output.items():
        outfile.write(f"{name}@{count}\n")

A little more verbose version would be:

all_processes = chain.from_iterable(processes)
all_staffs = chain.from_iterable(staffs)

name_counts = Counter(f"{s}_{p}" for p, s in product(all_processes, all_staffs))

with open('outfile.txt', 'a') as outfile:
    for name, count in name_counts.items():
        outfile.write(f"{name}@{count}\n")
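One detail worth checking in both versions: itertools.product yields tuples in the order its arguments are given, so the names in the for clause must follow that same order. A small sketch on two tiny hypothetical lists makes this visible:

```python
from itertools import chain, product

# Tiny hypothetical inputs, just to make the ordering visible.
processes = [['Delta'], ['Zeta']]
staffs = [['William']]

# chain.from_iterable flattens one level: [['Delta'], ['Zeta']] -> 'Delta', 'Zeta'
flat_processes = list(chain.from_iterable(processes))
flat_staffs = list(chain.from_iterable(staffs))

# product(A, B) yields (a, b) tuples, so unpack in the same order as the arguments.
pairs = [f"{s}_{p}" for p, s in product(flat_processes, flat_staffs)]
print(pairs)  # ['William_Delta', 'William_Zeta']

# Swapping the names in the for clause silently swaps the halves of each key.
swapped = [f"{s}_{p}" for s, p in product(flat_processes, flat_staffs)]
print(swapped)  # ['Delta_William', 'Zeta_William']
```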