I have this kind of dataset:
id  value  cond1      cond2
a   1      ['a','b']  [1,2]
b   1      ['a']      [1]
a   2      ['b']      [2]
a   3      ['a','b']  [1,2]
b   3      ['a','b']  [1,2]
I would like to extract all the rows whose list columns match given conditions, something like
df.loc[(df['cond1']==['a','b']) & (df['cond2']==[1,2])]
However, this syntax produces
ValueError: ('Lengths must match to compare', (100,), (1,))
or, if I use isin, this:
SystemError: <built-in method view of numpy.ndarray object at 0x7f1e4da064e0> returned a result with an error set
How can I do this correctly?
Thanks!
CodePudding user response:
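pandas treats a list on the right-hand side of == as an array-like and attempts an element-wise comparison against the whole column, which fails when the lengths differ, as seen. One way around this is to convert each cell to a tuple, because a tuple on the right-hand side is compared as a single value against each cell: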
df.loc[(df["cond1"].map(tuple) == ("a", "b")) & (df["cond2"].map(tuple) == (1, 2))]
  id  value   cond1   cond2
0  a      1  [a, b]  [1, 2]
3  a      3  [a, b]  [1, 2]
4  b      3  [a, b]  [1, 2]
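As an alternative sketch (not part of the original answer), each cell can be compared to the target list one at a time with apply, so pandas never tries to line the target list up against the column. The DataFrame below is rebuilt from the sample in the question, assuming the cells hold plain Python lists; the names mask and result are only illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question (assumes cells are plain Python lists).
df = pd.DataFrame({
    "id": ["a", "b", "a", "a", "b"],
    "value": [1, 1, 2, 3, 3],
    "cond1": [["a", "b"], ["a"], ["b"], ["a", "b"], ["a", "b"]],
    "cond2": [[1, 2], [1], [2], [1, 2], [1, 2]],
})

# Compare cell by cell: each lambda call is an ordinary list == list check
# that returns a single bool, so the result is a boolean Series.
mask = (
    df["cond1"].apply(lambda cell: cell == ["a", "b"])
    & df["cond2"].apply(lambda cell: cell == [1, 2])
)

result = df.loc[mask]
print(result)
```

This keeps the columns as lists and selects the same rows (index 0, 3 and 4) as the tuple version. It calls a Python function per row, so it may be slow on large frames.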