I am trying to filter a data frame which is saved in a CSV file. The data frame is as follows:

Sample Dataframe
A B C
x 3 ['a',1,2,'b']
y 2 ['c',4,6,'b']
z 1 ['d',5,7,'e']
The above data frame contains three columns, but the third column C has a list as its value. I know for a fact that I can use df.query('A == "x"')
and it will print out the rows whose A value is 'x'. But how can I do the same for column C? Suppose I want to print the rows whose C list does not contain 'b'. Basically it should print
z 1 ['d',5,7,'e'].
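For reproducibility, here is a minimal sketch of such a frame (the exact CSV layout is an assumption; because a CSV stores the lists in column C as plain text, ast.literal_eval is one way to turn them back into real Python lists before filtering):

import ast
import io
import pandas as pd

# Hypothetical CSV contents matching the sample frame; column C is stored as text.
csv_data = '''A,B,C
x,3,"['a',1,2,'b']"
y,2,"['c',4,6,'b']"
z,1,"['d',5,7,'e']"
'''

# Parse column C back into real Python lists so membership tests like 'b' in l work.
df = pd.read_csv(io.StringIO(csv_data), converters={'C': ast.literal_eval})
print(df.query('A == "x"'))  # the filter on column A mentioned above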
CodePudding user response:
To filter out the rows whose third item (index 2) is "b":
out = df[df['C'].str[2].ne('b')]
For any item:
out = df[["b" not in l for l in df['C']]]
To remove the rows that contain both "a" AND "b":
out = df[[not {'a', 'b'}.issubset(l) for l in df['C']]]
To remove the rows that contain either "a" OR "b":
out = df[[{'a', 'b'}.isdisjoint(l) for l in df['C']]]
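Putting these together, a minimal runnable sketch (assuming column C already holds real Python lists, e.g. after a conversion like the one sketched in the question):

import pandas as pd

df = pd.DataFrame({
    'A': ['x', 'y', 'z'],
    'B': [3, 2, 1],
    'C': [['a', 1, 2, 'b'], ['c', 4, 6, 'b'], ['d', 5, 7, 'e']],
})

# Keep rows whose third element (index 2) is not 'b'; in this sample that element
# is always a number, so every row survives.
print(df[df['C'].str[2].ne('b')])

# Keep rows whose list does not contain 'b' anywhere -> only row z.
print(df[["b" not in l for l in df['C']]])

# Remove rows whose list contains both 'a' and 'b' -> rows y and z remain.
print(df[[not {'a', 'b'}.issubset(l) for l in df['C']]])

# Remove rows whose list contains either 'a' or 'b' -> only row z remains.
print(df[[{'a', 'b'}.isdisjoint(l) for l in df['C']]])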
CodePudding user response:
You should be able to use an apply function
df[df['C'].apply(lambda x: 'b' not in x)]
Edit: if you want to filter out any row that contains 'b' or 3, you can do (with numpy imported as np):
df[df['C'].apply(lambda x: np.all(['b' not in x, 3 not in x]))]
Likewise, if you want to filter out only the rows that contain both 'b' and 3 (rows with just 'b' or just 3 are still allowed), change np.all in the above to np.any.
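A quick sketch of this apply-based approach on the same sample frame (the frame construction here is an assumption for the demo):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['x', 'y', 'z'],
    'B': [3, 2, 1],
    'C': [['a', 1, 2, 'b'], ['c', 4, 6, 'b'], ['d', 5, 7, 'e']],
})

# Simple case: keep rows whose list does not contain 'b' -> only row z.
print(df[df['C'].apply(lambda x: 'b' not in x)])

# np.all: keep rows containing neither 'b' nor 3 -> only row z.
print(df[df['C'].apply(lambda x: np.all(['b' not in x, 3 not in x]))])

# np.any: drop only rows containing both 'b' and 3; no sample row does, so all rows remain.
print(df[df['C'].apply(lambda x: np.any(['b' not in x, 3 not in x]))])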