Let's assume there are two lists like:
list1 = ["num", "categ"]
all_names = ["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2", "col_bol1", "col_bol2", "num_extra_1", "num_extra_2", "categ_extra_1", "categ_extra_2"]
I am trying to create a new list by keeping only the elements that 1) do not contain "extra" and 2) contain at least one of the elements of list1.
For example, I expect to get something like this:
l=["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2"]
In PySpark this can be done using filter, map and reduce, but I am not sure what the equivalent is in Python. For now, I am doing this in two steps as shown below, but I think there might be a more straightforward way.
temp_list = [a for a in all_names if "extra" not in a]
print(temp_list)
['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2', 'col_bol1', 'col_bol2']
l = [b for b in temp_list for c in list1 if c in b]
print(l)
['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
CodePudding user response:
You got the first part right.
Next, you want to include only those elements of temp_list that contain any of the elements in list1:
result = [b for b in temp_list if any(c in b for c in list1)]
which gives:
['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
Now that you understand the two steps involved, you can combine them into one instead of creating an intermediate list. Since you want both conditions to be true, use a boolean and:
result = [a for a in all_names
          if "extra" not in a
          and any(c in a for c in list1)]
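If you need this in more than one place, the combined condition can be wrapped in a small function. This is only a sketch: the name filter_names and its parameters (names, include, exclude) are made up for illustration and are not part of the question.

```python
def filter_names(names, include, exclude="extra"):
    """Keep names that lack `exclude` and contain at least one item of `include`.

    Hypothetical helper: the function and parameter names are illustrative.
    """
    return [name for name in names
            if exclude not in name
            and any(token in name for token in include)]


list1 = ["num", "categ"]
all_names = ["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2",
             "col_bol1", "col_bol2", "num_extra_1", "num_extra_2",
             "categ_extra_1", "categ_extra_2"]

result = filter_names(all_names, list1)
print(result)
# ['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
```

Because any() stops at the first match, each name is tested against list1 only until one element is found.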
Note: The nested loop in for b in temp_list for c in list1 isn't quite right here, because you only want to select an item once, even if it contains both elements of list1. Consider, for example:
list1 = ["num", "categ"]
all_names = ["col_num1", "col_categ1", "col_num2_categ2", "categ_extra_1"]
# your code here
temp_list = [a for a in all_names if "extra" not in a]
l = [b for b in temp_list for c in list1 if c in b]
would give a list l that contains "col_num2_categ2" twice, because the condition if c in b is true for two values of c in list1 when b = 'col_num2_categ2':
['col_num1', 'col_categ1', 'col_num2_categ2', 'col_num2_categ2']
CodePudding user response:
You can use something like this:
l = list(filter(lambda x: "extra" not in x and any(c in x for c in list1), all_names))
print(l)
The output would be:
['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
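If you prefer to keep the two conditions as separate steps, closer to how chained filters read in PySpark, the filter calls can be stacked. This is a sketch; the intermediate names without_extra and matching are only illustrative. In Python 3, filter returns a lazy iterator, so no intermediate list is built.

```python
list1 = ["num", "categ"]
all_names = ["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2",
             "col_bol1", "col_bol2", "num_extra_1", "num_extra_2",
             "categ_extra_1", "categ_extra_2"]

# Step 1: drop every name that contains "extra" (lazy, nothing is evaluated yet).
without_extra = filter(lambda name: "extra" not in name, all_names)

# Step 2: keep names containing at least one element of list1.
matching = filter(lambda name: any(c in name for c in list1), without_extra)

# The work happens only when the iterator is consumed.
l = list(matching)
print(l)
# ['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
```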
CodePudding user response:
Nested list comprehension as an alternative (note that a name containing more than one element of list1 will appear once per match):
list1 = ["num", "categ"]
all_names = ["col_num1", "col_num2", "col_num3", "col_categ1", "col_categ2", "col_bol1", "col_bol2", "num_extra_1", "num_extra_2", "categ_extra_1", "categ_extra_2"]
result = [a for a in all_names for c in list1 if 'extra' not in a and c in a]
print(result)
# ['col_num1', 'col_num2', 'col_num3', 'col_categ1', 'col_categ2']
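If a name can contain more than one element of list1, the nested loop emits it once per match. One possible way to handle that, sketched here with a made-up input, is to de-duplicate afterwards with dict.fromkeys, which keeps first-seen order. Be aware that this also collapses names that were genuinely repeated in all_names.

```python
list1 = ["num", "categ"]
# Illustrative input: "col_num2_categ2" contains both "num" and "categ".
all_names = ["col_num1", "col_categ1", "col_num2_categ2", "categ_extra_1"]

nested = [a for a in all_names for c in list1 if "extra" not in a and c in a]
print(nested)
# ['col_num1', 'col_categ1', 'col_num2_categ2', 'col_num2_categ2']

# dict keys are unique and preserve insertion order (Python 3.7+).
deduped = list(dict.fromkeys(nested))
print(deduped)
# ['col_num1', 'col_categ1', 'col_num2_categ2']
```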