Filter pandas df with .loc[]

Time:10-05

I have one data frame (df) and a list (posList).
I managed to "filter" my df with .loc[] thanks to this piece of code :

    df = df.loc[
                  (df['Pos'] == posList[0]) |
                  (df['Pos'] == posList[1])
               ]

But then I tried to write this instead (just in case I have to use a larger list in the future) :

    df = df.loc[(df['Pos'] in posList)]

But this is not working, and I got the following error :

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I learn Python by myself and I am relatively new to it so I apologize in advance if this is a stupid question ...

Thanks !

CodePudding user response:

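df['Pos'] returns a column, which is a pd.Series. If you test a pd.Series with in against a list, Python compares the Series with each list element and then needs a single True/False from the result, which a Series cannot give. That raises ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). For example: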

In [1]: import pandas as pd                                                     

In [2]: ser = pd.Series([1,2,3,4])                                              

In [3]: ser                                                                     
Out[3]: 
0    1
1    2
2    3
3    4
dtype: int64

In [4]: ser in [1,2,3,4]                                                        
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-090fa5ec1b46> in <module>
----> 1 ser in [1,2,3,4]

~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1477     def __nonzero__(self):
   1478         raise ValueError(
-> 1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1481         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

To avoid this, note that df.loc takes a boolean array: `df['col'] == 'anyval'` gives you a Series of True or False values, one per row. In the above example it'd be

In [6]: ser == 2
Out[6]: 
0    False
1     True
2    False
3    False
dtype: bool
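
The same idea carries over to a DataFrame: a boolean Series handed to .loc keeps only the rows where it is True. A small sketch, assuming made-up sample data (only the column name 'Pos' comes from the question; the values and the 'Val' column are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Pos' comes from the question.
df = pd.DataFrame({"Pos": ["A", "B", "C", "A"], "Val": [10, 20, 30, 40]})

# Comparing a column to a scalar yields a boolean Series, one entry per row.
mask = df["Pos"] == "A"

# .loc keeps the rows where the mask is True.
filtered = df.loc[mask]
print(filtered)
```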

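So you need a boolean Series of this kind for your problem. Essentially, what you can do is what @RJ Adriaansen suggested: call isin directly on the column (not through .str), which checks every value against the list and returns a boolean Series.

df['Pos'].isin(posList)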
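
Putting it together, here is a minimal sketch of the whole filter using Series.isin, called on the column itself rather than through the .str accessor. The sample data is hypothetical; df, posList and the 'Pos' column are the names from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({"Pos": ["A", "B", "C", "A"], "Val": [10, 20, 30, 40]})
posList = ["A", "B"]

# isin returns a boolean Series: True where the value appears in posList.
mask = df["Pos"].isin(posList)
filtered = df.loc[mask]

# Same result as the chained comparisons from the question,
# but it keeps working however long posList gets.
chained = df.loc[(df["Pos"] == posList[0]) | (df["Pos"] == posList[1])]

print(filtered)
```

To select the rows whose 'Pos' is not in the list, negate the mask: df.loc[~mask].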