How do you exclude sub-lists from a list based on whether or not a specific element of each sub-list

Time:10-14

(Python novice here:)

I have been trying to filter a list of sub-lists (all of the same length) based on the presence of certain strings within the elements of the sub-lists. To create criteria for inclusion, I have done the following, which has worked fine:

lines = [['Bob','Risk Manager','Company1'],
         ['Bill','Senior Quality Control Manager','Company1'],
         ['Jill','Accreditation Specialist','Company2'],
         ['Jane','Administrator','Company3'],
         ['Joe','IT Specialist','Company4']]

filtered_lines = []

inclusion_criteria = [['Risk',1],['Quality',1],['Accred',1]]

for line in lines:
    for criterion in inclusion_criteria:
        if criterion[0] in line[criterion[1]]:
            filtered_lines.append(line)

The above code filled the filtered_lines list with sub-lists whose second element contained 'Risk', 'Quality' or 'Accred', i.e. 'Jane' and 'Joe' were filtered out - this worked as planned.

However, if I instead want to define criteria for exclusion from the filtered_lines list, then the following does not work:

exclusion_criteria = [['Company1',2],['Company2',2]]

for line in lines:
    for criterion in exclusion_criteria:
        if criterion[0] not in line[criterion[1]]:
            filtered_lines.append(line)

When I run the above code, I want every sub-list whose third element does not contain 'Company1' or 'Company2' to be added to filtered_lines, i.e. filtered_lines should contain only 'Jane' and 'Joe', but this does not happen. Instead, no filtration occurs, and filtered_lines still contains every sub-list from the original lines list.

How would you go about excluding items from a list based on a set of exclusion criteria? Furthermore, is there a better way of approaching inclusion criteria?

P.S.: The lines list and criteria given here are just examples; the real lines is around 25,000 sub-lists long, and there are over a dozen inclusion and - if I can get it working - exclusion criteria. I'm not sure if/how the size of these objects affects any possible solutions.

CodePudding user response:

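Your loop checks each element of exclusion_criteria separately and appends the line every time a single criterion fails to match. A line would be left out only if it matched every exclusion criterion at once, which cannot happen here because no third element contains both 'Company1' and 'Company2'. So every line ends up in filtered_lines, and lines that match neither criterion are appended twice.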
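To see the effect concretely, here is a minimal sketch that reruns the loop from the question on the sample data and prints only the names that end up in the result. The names variable and the tuple unpacking into text and index are additions for illustration, not part of the original code.

```python
lines = [
    ["Bob", "Risk Manager", "Company1"],
    ["Bill", "Senior Quality Control Manager", "Company1"],
    ["Jill", "Accreditation Specialist", "Company2"],
    ["Jane", "Administrator", "Company3"],
    ["Joe", "IT Specialist", "Company4"],
]

exclusion_criteria = [["Company1", 2], ["Company2", 2]]
filtered_lines = []

# Same logic as the loop in the question: one append per criterion
# that does NOT match, rather than one decision per line.
for line in lines:
    for text, index in exclusion_criteria:
        if text not in line[index]:
            filtered_lines.append(line)

# Illustrative helper variable: just the first element of each kept line.
names = [line[0] for line in filtered_lines]
print(names)
# ['Bob', 'Bill', 'Jill', 'Jane', 'Jane', 'Joe', 'Joe']
```

Nobody is filtered out, and 'Jane' and 'Joe' appear twice because they fail both criteria.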

Try to use all() and/or any() to get your match:

lines = [
    ["Bob", "Risk Manager", "Company1"],
    ["Bill", "Senior Quality Control Manager", "Company1"],
    ["Jill", "Accreditation Specialist", "Company2"],
    ["Jane", "Administrator", "Company3"],
    ["Joe", "IT Specialist", "Company4"],
]

exclusion_criteria = [["Company1", 2], ["Company2", 2]]
filtered_lines = []

for line in lines:
    if all(
        criterion[0] not in line[criterion[1]]
        for criterion in exclusion_criteria
    ):
        filtered_lines.append(line)

print(filtered_lines)

Prints:

[['Jane', 'Administrator', 'Company3'], ['Joe', 'IT Specialist', 'Company4']]
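On the second part of the question (a better way to handle inclusion), the same idea can be sketched with any(). The helper name matches_any and the variable names included and kept are hypothetical, chosen only for this example. Note that the inclusion loop in the question would append a line once per matching criterion, so a title containing both 'Risk' and 'Quality' would appear twice; any() makes one decision per line.

```python
def matches_any(line, criteria):
    """Return True if at least one [text, index] criterion is found in the line."""
    return any(text in line[index] for text, index in criteria)


lines = [
    ["Bob", "Risk Manager", "Company1"],
    ["Bill", "Senior Quality Control Manager", "Company1"],
    ["Jill", "Accreditation Specialist", "Company2"],
    ["Jane", "Administrator", "Company3"],
    ["Joe", "IT Specialist", "Company4"],
]

inclusion_criteria = [["Risk", 1], ["Quality", 1], ["Accred", 1]]
exclusion_criteria = [["Company1", 2], ["Company2", 2]]

# Inclusion: keep a line if ANY inclusion criterion matches.
included = [line for line in lines if matches_any(line, inclusion_criteria)]

# Exclusion: keep a line only if NO exclusion criterion matches.
# "not any(x in ...)" is equivalent to "all(x not in ...)".
kept = [line for line in lines if not matches_any(line, exclusion_criteria)]

print([line[0] for line in included])  # ['Bob', 'Bill', 'Jill']
print([line[0] for line in kept])      # ['Jane', 'Joe']
```

Both any() and all() stop at the first deciding criterion, so each line is scanned at most once per criterion. For roughly 25,000 lines and a dozen or so criteria that should be a small amount of work, though this has not been measured on the real data.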