I'm trying to filter a DataFrame using loc and isin, and I'm looking for a result similar to any: a row should be kept if any of its tags is in the filter list.
Data:
column | tags |
---|---|
0 | A |
1 | [A] |
2 | [] |
3 | |
4 | [A,B] |
5 | C |
6 | [C] |
7 | B |
import pandas as pd
df = pd.DataFrame({"tags": ["A", ["A"], [], "", ["A", "B"], "C", ["C"], "B"]})
filter = ["A", "C"]
Filtering:
df.loc[df["tags"].isin(filter)]
Result:
column | tags |
---|---|
0 | A |
5 | C |
Desired Result:
column | tags |
---|---|
0 | A |
1 | [A] |
4 | [A,B] |
5 | C |
6 | [C] |
- I don't want to flatten the dataframe because it'll be costly for large dataframes.
Answer:
Use set.intersection in a list comprehension, with an if-else that wraps scalars in a list, because the column mixes lists and scalars. Then pass the resulting list of booleans to boolean indexing:
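import pandas as pd

df = pd.DataFrame({"tags": ["A", ["A"], [], "", ["A", "B"], "C", ["C"], "B"]})
f = ["A", "C"]
s = set(f)  # a set makes the membership test cheap
# Wrap scalars in a one-element list so every row can be intersected with the set
df = df[[bool(s.intersection(x if isinstance(x, list) else [x])) for x in df["tags"]]]
print(df)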
tags
0 A
1 [A]
4 [A, B]
5 C
6 [C]
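If the same check is needed in more than one place, the logic above could also be pulled into a small helper. This is only a sketch of one possible variant, not the only way to do it: the names matches_any, wanted, mask and result are made up for illustration, and it swaps set.intersection for set.isdisjoint, which avoids building an intermediate set. It still assumes each cell is either a list or a hashable scalar.

```python
import pandas as pd


def matches_any(value, wanted):
    """Return True if value (a list or a scalar) shares an element with wanted.

    matches_any is a hypothetical helper name, not a pandas function.
    """
    # Wrap scalars in a one-element list so both cases are handled alike
    items = value if isinstance(value, list) else [value]
    # isdisjoint accepts any iterable and returns True when nothing overlaps
    return not wanted.isdisjoint(items)


df = pd.DataFrame({"tags": ["A", ["A"], [], "", ["A", "B"], "C", ["C"], "B"]})
wanted = {"A", "C"}

# Build a boolean mask row by row, then use it for boolean indexing
mask = df["tags"].map(lambda v: matches_any(v, wanted))
result = df[mask]
print(result.index.tolist())  # [0, 1, 4, 5, 6]
```

Like the list comprehension, this never flattens the DataFrame; it still visits each row once in Python, so it is not vectorized.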