I have a dataframe:
col1 col2 col3
a b b
a b c
k l o
b l b
I want to keep only rows where col1 is "a", col2 is "b" and col3 is "b" or col1 is "k", col2 is "l" and col3 is "o". So desired result is:
col1 col2 col3
a b b
k l o
How to do that? i can write dt[(dt["col1"]=="a")&(dt["col2"]=="b")&(dt["col1"]=="b")]
but what about second case? should i put it with or?
CodePudding user response:
abb = (df["col1"]=="a") & (df["col2"]=="b") & (df["col3"]=="b")
klo = (df["col1"]=="k") & (df["col2"]=="l") & (df["col3"]=="o")
df[(abb) | (klo)]
col1 col2 col3
0 a b b
2 k l o
Alternatively, you could write something like this, just to avoid all those conditionals:
abb = 'abb'
klo = 'klo'
strings = [abb, klo]
def f(x):
if ''.join(x) in strings:
return True
return False
df[df.apply(lambda x: f(x), axis=1)]
col1 col2 col3
0 a b b
2 k l o
So, here we are applying a custom function to each row with df.apply
. Inside the function we turn the row into a single string with str.join
and we check if this string exists in our predefined list of strings. Finally, we use the resulting pd.Series with booleans
to select from our df.
CodePudding user response:
you can create a check_list
then use pandas.apply
with axis=1
and check in specific columns any list of check exist or not.
col = ['col1','col2','col3']
chk = [['a','b','b'],['k','l','o']]
msk = df.apply(lambda row : any((row[col]==c).all() for c in chk), axis=1)
df = df[msk]
print(df)
col1 col2 col3
0 a b b
2 k l o