Filter list-valued columns

Time:01-26

I have this kind of dataset:

id   value   cond1     cond2
 a   1      ['a','b']  [1,2]
 b   1      ['a']      [1]
 a   2      ['b']      [2]
 a   3      ['a','b']  [1,2]
 b   3      ['a','b']  [1,2]

I would like to extract all the rows using the conditions, something like

df.loc[(df['cond1']==['a','b']) & (df['cond2']==[1,2])]

However, this syntax produces

ValueError: ('Lengths must match to compare', (100,), (1,))    

or this if I use isin:

SystemError: <built-in method view of numpy.ndarray object at 0x7f1e4da064e0> returned a result with an error set

How do I do this correctly?

Thanks!

CodePudding user response:

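pandas interprets the list on the right-hand side as an array-like, so it attempts an element-wise comparison against the whole column and fails because the lengths differ, as seen above. One way around this is to convert each cell to a tuple, which is compared as a single value: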

df.loc[(df["cond1"].map(tuple) == ("a", "b")) & (df["cond2"].map(tuple) == (1, 2))]

  id  value   cond1   cond2
0  a      1  [a, b]  [1, 2]
3  a      3  [a, b]  [1, 2]
4  b      3  [a, b]  [1, 2]
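
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the tuple comparison. The names `mask`, `result` and `mask_alt` are only illustrative, and the `apply`-based variant is an alternative not mentioned above, which should give the same rows by comparing each cell to the target list directly.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["a", "b", "a", "a", "b"],
    "value": [1, 1, 2, 3, 3],
    "cond1": [["a", "b"], ["a"], ["b"], ["a", "b"], ["a", "b"]],
    "cond2": [[1, 2], [1], [2], [1, 2], [1, 2]],
})

# Convert each list cell to a tuple so the right-hand side is
# compared as a single value rather than as an array-like.
mask = (df["cond1"].map(tuple) == ("a", "b")) & (df["cond2"].map(tuple) == (1, 2))
result = df.loc[mask]

# Alternative sketch: compare each cell to the target list in plain Python.
mask_alt = (
    df["cond1"].apply(lambda cell: cell == ["a", "b"])
    & df["cond2"].apply(lambda cell: cell == [1, 2])
)

print(result)
```

Both variants run a Python-level operation per row, so on very large frames it may be worth storing the conditions as tuples or strings up front instead of converting on every filter.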