Filtering a dataframe that has lists as a row element

Time:09-29

I'm trying to filter a dataframe using loc and isin, but I want behavior similar to any: a row should match if it is one of the filter values or is a list containing at least one of them.

Data:

column tags
0 A
1 [A]
2 []
3
4 [A,B]
5 C
6 [C]
7 B
import pandas as pd

df = pd.DataFrame({"tags": ["A",["A"],[],"",["A","B"],"C",["C"],"B"]})
filter = ["A","C"]

Filtering:

df.loc[df["tags"].isin(filter)]

Result:

column tags
0 A
5 C

Desired Result:

column tags
0 A
1 [A]
4 [A,B]
5 C
6 [C]

I don't want to flatten the dataframe because that would be costly for large dataframes.

Answer:

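Because the column mixes lists and scalars, isin alone cannot match the list rows. Use set.intersection in a list comprehension, with an if-else that wraps each scalar in a list, and pass the resulting list of booleans to boolean indexing: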

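import pandas as pd

df = pd.DataFrame({"tags": ["A",["A"],[],"",["A","B"],"C",["C"],"B"]})

f = ["A","C"]
s = set(f)

# Wrap scalars in a list so every row can be intersected with the set;
# a non-empty intersection means the row holds at least one wanted tag.
df = df[[bool(s.intersection(x if isinstance(x, list) else [x])) for x in df["tags"]]]

print(df)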
     tags
0       A
1     [A]
4  [A, B]
5       C
6     [C]
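
If the same filter is needed in more than one place, the list comprehension above could be wrapped in a small helper. This is only a sketch: matches_any and wanted are names made up for illustration and are not part of pandas. It also swaps set.intersection for set.isdisjoint, which can stop at the first shared element instead of building the whole intersection.

```python
import pandas as pd


def matches_any(value, wanted):
    """Return True if value (a scalar or a list) shares an element with wanted.

    Hypothetical helper for illustration; `wanted` is assumed to be a set.
    """
    # Treat a scalar as a one-element list so both cases share one code path.
    items = value if isinstance(value, list) else [value]
    # isdisjoint is True when there is no common element, so negate it.
    return not wanted.isdisjoint(items)


df = pd.DataFrame({"tags": ["A", ["A"], [], "", ["A", "B"], "C", ["C"], "B"]})
wanted = {"A", "C"}

# Build a boolean mask row by row; the original dataframe is not flattened.
mask = df["tags"].map(lambda v: matches_any(v, wanted))
result = df[mask]

print(result.index.tolist())  # [0, 1, 4, 5, 6]
```

Like the list comprehension, this still visits every row in Python, so it avoids flattening but is not vectorized.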