Filter elements in a list not containing multiple substrings in Python

Time:04-28

For a list of file names file_names, I am trying to use the code below to keep only the file names that contain neither foo nor bar:

file_names = ['foo_data.xlsx', 'bar_data.xlsx', 'data.xlsx']
subs = ['foo', 'bar']

for file_name in file_names:
    for sub in subs:
        if sub not in file_name:
            print(file_name)

Output:

foo_data.xlsx
bar_data.xlsx
data.xlsx
data.xlsx

But this is not working: it should print only data.xlsx.

Meanwhile, the same approach works for the containing case:

file_names = ['foo_data.xlsx', 'bar_data.xlsx', 'data.xlsx']
subs = ['foo', 'bar']

for file_name in file_names:
    for sub in subs:
        if sub in file_name:
            print(file_name)

Output:

foo_data.xlsx
bar_data.xlsx

Could someone explain what is wrong with my code and how to fix it? Thanks.

Reference:

Does Python have a string 'contains' substring method?

Answer:

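The nested loop prints a file name once for every substring it lacks, so foo_data.xlsx is printed because it lacks bar, and data.xlsx is printed twice because it lacks both. Since you don't want any sub to be in the file name, one way is to replace the inner loop with all: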

for file_name in file_names:
    if all(sub not in file_name for sub in subs):
        print(file_name)

Output:

data.xlsx
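The same check can also be written with not any(...), which is logically equivalent to all(... not in ...), and collected into a list rather than printed. This is only a sketch; the helper name exclude_containing is hypothetical and not part of the original code.

```python
def exclude_containing(names, subs):
    """Return the names that contain none of the given substrings."""
    # not any(sub in name ...) is equivalent to all(sub not in name ...)
    return [name for name in names if not any(sub in name for sub in subs)]


file_names = ['foo_data.xlsx', 'bar_data.xlsx', 'data.xlsx']
subs = ['foo', 'bar']

filtered = exclude_containing(file_names, subs)
print(filtered)  # ['data.xlsx']
```

With an empty subs list, any(...) is always False, so every name is kept.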

Answer:

One regex approach would be to form an alternation of the blacklisted substrings, then use re.search in a list comprehension to keep only the names that do not match.

import re

file_names = ['foo_data.xlsx', 'bar_data.xlsx', 'data.xlsx']
subs = ['foo', 'bar']
# Build the alternation (?:foo|bar) from the substrings
regex = r'(?:' + '|'.join(subs) + r')'
matches = [f for f in file_names if not re.search(regex, f)]
print(matches)  # ['data.xlsx']
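If the substrings might contain regex metacharacters such as . or (, it is probably safer to escape them so they match literally. The following is a sketch under that assumption; the extra file names and the a.b substring are made up for illustration.

```python
import re

# Hypothetical sample data: 'a.b' contains a regex metacharacter
file_names = ['foo_data.xlsx', 'bar_data.xlsx', 'data.xlsx',
              'a.b_data.xlsx', 'aXb_data.xlsx']
subs = ['foo', 'bar', 'a.b']

# re.escape makes each substring match literally, so '.' is not a wildcard
pattern = re.compile('|'.join(re.escape(sub) for sub in subs))

matches = [f for f in file_names if not pattern.search(f)]
print(matches)  # ['data.xlsx', 'aXb_data.xlsx']
```

Without re.escape, the pattern a.b would also match aXb_data.xlsx and wrongly exclude it. Note that an empty subs list yields an empty pattern, which matches every string, so that case would need separate handling.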