So I have been referencing this previous question posted here Filter pandas dataframe from tuples.
But the problem I am trying to solve is slightly different.
I have a list of tuples. Each tuple represents a different set of filters I would like to apply to a dataframe accross multiple columns, so I can isolate the records to perform additional tasks.
Whenever I try and filter by a single tuple within the list of tuples, the dataframe I get back has no records.. If I break the tuple values out in a very long form way it works fine. Not sure what I am missing or not thinking about here..
Using the same example from the post I have been referencing....
AB_col = [(0,230), (10,215), (15, 200), (20, 185), (40, 177),
(0,237), (10,222), (15, 207), (20, 192), (40, 184)]
sales = [{'account': 'Jones LLC', 'A': 0, 'B': 230, 'C': 140},
{'account': 'Alpha Co', 'A': 20, 'B': 192, 'C': 215},
{'account': 'Blue Inc', 'A': 50, 'B': 90, 'C': 95 }]
df = pd.DataFrame(sales)
The answer from the other question
df = df[df[["A","B"]].apply(tuple, 1).isin(AB_col)]
Which returns
However, I want to only get one record back, that matches the first tuple in the list of tuples. So I tried this
df[df[["A"]].apply(tuple,1).isin(AB_col[0])]
But get no records returned
However, I can do this which gets me the results I want, but when I have essentially a list of tuples that is every combination of column values to use a filters for different levels of calculations, this seems like way too much code to have to use to product the desired results
df[(df['A']==AB_col[0][0]) & (df['B']==AB_col[0][1])]
Which gets me results I want
long form results but what i need
Is there a way to get to this same result more efficiently?
Clarification Update: I don't just need to select all rows by only ever using the first tuple in the list of tuples. I will need ultimately iterate through my list of tuples, and use each tuple to filter the target dataframe to perform additional actions. Also I'm not stuck on keeping a list of tuples as the filters I need to iterate through, I can change it to another form if easier/more performant.
Example:
AB_col = [(0,230),(20,192)]
Iteration 1: filter DF by (0,230) -> do stuff on returned results
Iteration 2: filter DF by (20,192) -> do stuff on returned results
Iteration 3: filter DF by ... and so on until I have iterated through my list of tuple filters.
Thanks!
CodePudding user response:
TL;DR: use df[df[["A","B"]].apply(tuple, 1) == AB_col[0]]
.
I think you might be overthinking the matter. Let's dissect the code a bit:
df[["A","B"]].apply(tuple, 1) # or: df[["A","B"]].apply(tuple, axis=1)
# meaning: create tuples for each row
0 (0, 230)
1 (20, 192)
2 (50, 90)
dtype: object
So this just gets us A
and B
as tuples. Next, applying df.isin
is a way to look whether any of these tuples exist inside your list AB_col
. It's a consecutive evaluation:
(0, 230) in AB_col # True
(20, 192) in AB_col # True
(50, 90) in AB_col # False
# no different than:
1 in [1,2,3] # True
The resulting Series
with booleans
is then used to select from the df
. Hence, the result (first two rows, not the third):
account A B C
0 Jones LLC 0 230 140
1 Alpha Co 20 192 215
All you want to do, is return the rows that match the first element from the list AB_col
:
I want to only get one record back, that matches the first tuple in the list of tuples.
So, that's easy enough. We simply need ==
:
first_elem = AB_col[0]
df[df[["A","B"]].apply(tuple, 1) == first_elem]
account A B C
0 Jones LLC 0 230 140
CodePudding user response:
Use idxmax
to get the index of the first True.
df.loc[[df[['A', 'B']].apply(tuple, 1).isin(AB_col).idxmax()]]
Output:
account A B C
0 Jones LLC 0 230 140