Home > Software engineering >  Filter Data Frame which has list as a Column Value
Filter Data Frame which has list as a Column Value

Time:10-18

I am trying to filter a data frame which is saved in a csv file. The data frame is as follows : Sample Dataframe

A B C
x 3 ['a',1,2,'b']
y 2 ['c',4,6,'b']
z 1 ['d',5,7,'e']

The above data frame contains of three columns but the third column C has list a value. I know for a fact that I can use df.query('A == x') and it will print out the rows which contains values as 'x'. But how can I do the same for the column C. Suppose I want to print rows from column C which does not contain 'b'. Basically it should print

z 1 ['d',5,7,'e'].

CodePudding user response:

To filter out the rows that contain "b" as the third item:

out = df[df['C'].str[2].ne('b')]

For any item:

out = df[["b" not in l for l in df['C']]]

To remove rows matching both "a" AND "b":

out = df[[{'a', 'b'}.issubset(l) for l in df['C']]

For either "a" OR "b":

out = df[[not {'a', 'b'}.isdisjoint(l) for l in df['C']]

CodePudding user response:

You should be able to use an apply function df[df['C'].apply(lambda x: 'b' not in x)]

Edit: if you want to filter any row that contains 'b' or 3, you can do: df[df['C'].apply(lambda x: np.all(['b' not in x, 3 not in x]))]

Likewise, if you want to filter out rows that contain both 'b' and 3 (but either just 'b' or 3 is allowed) you can change the above to np.any instead of np.all

  • Related