Iterating Through List of Lists, Maintaining List Structure

Say I have the following list of lists of names:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

I want to return only the "Matts" in the list, but I also want to maintain the list of list structure. So I want to return:

[['Matt', 'Matt'], ['Matt']]

I have something like this, but it flattens everything into one big list:

matts = [name for namelist in names for name in namelist if name=="Matt"]

I know something like the following works, but I want to avoid iterating through the lists and appending. Is this possible?

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
matts = []
for namelist in names:
    matts_namelist = []
    for name in namelist:
        if name=="Matt":
            matts_namelist.append(name)
        else:
            pass
    matts.append(matts_namelist)

CodePudding user response：

Use a nested list comprehension, as below:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
res = [[name for name in lst if name == "Matt"] for lst in names]
print(res)

Output

[['Matt', 'Matt'], ['Matt']]

The above nested list comprehension is equivalent to the following for-loop:

res = []
for lst in names:
    res.append([name for name in lst if name == "Matt"])
print(res)
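
As a quick sanity check, the sketch below compares the nested comprehension with the explicit loop-and-append version from the question. The helper names keep_matches and keep_matches_loop are made up for illustration. Note that a sublist with no match becomes an empty list rather than disappearing, so the outer structure is preserved.

```python
def keep_matches(names, needle="Matt"):
    # Nested comprehension: the outer loop yields one sublist per input sublist.
    return [[name for name in lst if name == needle] for lst in names]


def keep_matches_loop(names, needle="Matt"):
    # Explicit loop-and-append version, as in the question.
    result = []
    for lst in names:
        kept = []
        for name in lst:
            if name == needle:
                kept.append(name)
        result.append(kept)
    return result


names = [['Matt', 'Matt', 'Paul'], ['Paul'], ['Matt']]
print(keep_matches(names))  # [['Matt', 'Matt'], [], ['Matt']]
assert keep_matches(names) == keep_matches_loop(names)
```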

A third, functional alternative using filter and partial is:

from operator import eq
from functools import partial

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

eq_matt = partial(eq, "Matt")
res = [[*filter(eq_matt, lst)] for lst in names]
print(res)

Micro-Benchmark

%timeit [[*filter(eq_matt, lst)] for lst in names]
56.3 µs ± 519 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [[name for name in lst if "Matt" == name] for lst in names]
26.9 µs ± 355 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Setup (of micro-benchmarks)

import random
population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]

Full Benchmark

Candidates

def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_matt = partial(eq, needle)
    return [[*filter(eq_matt, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    return [[needle] * lst.count(needle) for lst in names]

[Plot of alternative solutions]

The results above were obtained for an outer list ranging from 100 to 1000 elements, where each element is a list of 10 strings chosen at random from a population of 14 strings (names). The code for reproducing the results can be found here. As the plot shows, the most performant solution is the one from @rv.kvetch.
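
For readers who want to try a comparison of this kind themselves, here is a rough sketch using the standard timeit module. It is not the original benchmark code: the 14-name population, the fixed seed, the three sizes, and the number of repetitions are all assumptions chosen for illustration, and absolute timings will differ by machine.

```python
import random
import timeit
from functools import partial
from operator import eq

# Hypothetical population; the original benchmark's 14 names are not given.
population = ["Matt", "James", "William", "Charles", "Paul", "John", "Anna",
              "Maria", "Lucy", "Peter", "Karen", "David", "Sarah", "Tom"]


def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_needle = partial(eq, needle)
    return [[*filter(eq_needle, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    return [[needle] * lst.count(needle) for lst in names]


candidates = [nested_list_comprehension, functional_approach, count_approach]
repeats = 100  # assumed repetition count, kept small so the script runs quickly

random.seed(0)  # fixed seed so repeated runs use the same data
for size in (100, 500, 1000):
    names = [random.choices(population, k=10) for _ in range(size)]
    # All candidates should agree before their timings are compared.
    expected = nested_list_comprehension(names)
    assert all(f(names) == expected for f in candidates)
    for f in candidates:
        seconds = timeit.timeit(lambda: f(names), number=repeats)
        print(f"{f.__name__:<28} n={size:<5} {seconds / repeats * 1e6:8.1f} µs per call")
```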

CodePudding user response：

An alternate way using list.count:

>>> names = [['Matt', 'Matt', 'Paul'], [], ['Matt']]
>>> [name.count('Matt') * ['Matt'] for name in names]
[['Matt', 'Matt'], [], ['Matt']]

You could also try with itertools.repeat:

>>> import itertools
>>> [[*itertools.repeat('Matt', name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]

Lastly, as suggested by @DaniMensejo, you could also use the range iterator within a nested list comprehension:

>>> [['Matt' for _ in range(name.count('Matt'))] for name in names]
[['Matt', 'Matt'], [], ['Matt']]
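
One caveat worth keeping in mind: the count-based variants rebuild each sublist from the search value itself, so they only fit exact-equality matches. The sketch below uses a hypothetical case-insensitive match (not part of the question) to show a condition where a filtering comprehension keeps the original elements, which the count-based form cannot express.

```python
names = [['Matt', 'matt', 'Paul'], [], ['MATT']]

# Filtering keeps the original elements that satisfy the condition.
filtered = [[name for name in lst if name.lower() == 'matt'] for lst in names]
print(filtered)  # [['Matt', 'matt'], [], ['MATT']]

# Counting only finds exact matches and emits copies of the search value.
counted = [lst.count('Matt') * ['Matt'] for lst in names]
print(counted)  # [['Matt'], [], []]
```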

CodePudding user response：

IIUC, you can do this with a nested list comprehension, as below:

>>> names = [['Matt', 'Matt', 'Paul'], ['Matt']]
>>> [[name for name in lst_name if name=='Matt'] for lst_name in names]
[['Matt', 'Matt'], ['Matt']]

CodePudding user response：

Use the filter function:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
matts = [list(filter(lambda x: x == 'Matt', namelist)) for namelist in names]