Filter pandas dataframe by multiple columns, using tuple from list of tuples-CodePudding

So I have been referencing this previous question posted here Filter pandas dataframe from tuples.

But the problem I am trying to solve is slightly different.

I have a list of tuples. Each tuple represents a different set of filters I would like to apply to a dataframe accross multiple columns, so I can isolate the records to perform additional tasks.

Whenever I try and filter by a single tuple within the list of tuples, the dataframe I get back has no records.. If I break the tuple values out in a very long form way it works fine. Not sure what I am missing or not thinking about here..

Using the same example from the post I have been referencing....

AB_col = [(0,230), (10,215), (15, 200), (20, 185), (40, 177), 
                (0,237), (10,222), (15, 207), (20, 192), (40, 184)]

sales = [{'account': 'Jones LLC', 'A': 0, 'B': 230, 'C': 140},
         {'account': 'Alpha Co',  'A': 20, 'B': 192, 'C': 215},
         {'account': 'Blue Inc',  'A': 50,  'B': 90,  'C': 95 }]
df = pd.DataFrame(sales)

example dataframe

The answer from the other question

df = df[df[["A","B"]].apply(tuple, 1).isin(AB_col)]

Which returns

example results

However, I want to only get one record back, that matches the first tuple in the list of tuples. So I tried this

df[df[["A"]].apply(tuple,1).isin(AB_col[0])]

But get no records returned

my modification results

However, I can do this which gets me the results I want, but when I have essentially a list of tuples that is every combination of column values to use a filters for different levels of calculations, this seems like way too much code to have to use to product the desired results

df[(df['A']==AB_col[0][0]) & (df['B']==AB_col[0][1])]

Which gets me results I want

long form results but what i need

Is there a way to get to this same result more efficiently?

Clarification Update: I don't just need to select all rows by only ever using the first tuple in the list of tuples. I will need ultimately iterate through my list of tuples, and use each tuple to filter the target dataframe to perform additional actions. Also I'm not stuck on keeping a list of tuples as the filters I need to iterate through, I can change it to another form if easier/more performant.

Example:

AB_col = [(0,230),(20,192)]

Iteration 1: filter DF by (0,230) -> do stuff on returned results
Iteration 2: filter DF by (20,192) -> do stuff on returned results
Iteration 3: filter DF by ... and so on until I have iterated through my list of tuple filters.

Thanks!

CodePudding user response：

TL;DR: use df[df[["A","B"]].apply(tuple, 1) == AB_col[0]].

I think you might be overthinking the matter. Let's dissect the code a bit:

df[["A","B"]].apply(tuple, 1) # or: df[["A","B"]].apply(tuple, axis=1)
# meaning: create tuples for each row

0     (0, 230)
1    (20, 192)
2     (50, 90)
dtype: object

So this just gets us A and B as tuples. Next, applying df.isin is a way to look whether any of these tuples exist inside your list AB_col. It's a consecutive evaluation:

(0, 230) in AB_col # True
(20, 192) in AB_col # True
(50, 90) in AB_col # False

# no different than: 
1 in [1,2,3] # True

The resulting Series with booleans is then used to select from the df. Hence, the result (first two rows, not the third):

     account   A    B    C
0  Jones LLC   0  230  140
1   Alpha Co  20  192  215

All you want to do, is return the rows that match the first element from the list AB_col:

I want to only get one record back, that matches the first tuple in the list of tuples.

So, that's easy enough. We simply need ==:

first_elem = AB_col[0]

df[df[["A","B"]].apply(tuple, 1) == first_elem]

     account  A    B    C
0  Jones LLC  0  230  140

CodePudding user response：

Use idxmax to get the index of the first True.

df.loc[[df[['A', 'B']].apply(tuple, 1).isin(AB_col).idxmax()]]

Output:

     account  A    B    C
0  Jones LLC  0  230  140