Pandas DF - Filter DF by Value in Column A, Highest Value of in Column B (Group By?)-CodePudding

I'm trying to do something I feel shouldn't be too difficult but I'm having trouble with it.

The best way would be to illustrate it. I currently have a DF with many rows/columns. I want to grab the highest value of Column B, in the unique value of Column A, and drop the rest of the rows that don't matter. I am having a hard to articulating what is is, maybe grouping etc

For example:

Initial Table

Col A  - Col B - Col C
2012        1       2
2012        1       7
2012        2       45
2012        2       34
2012        3       4
2012        3       32
2013        1       54
2013        1       3
2013        2       5
2013        2       23

Table I want to end up with

Col A  - Col B - Col C
2012        3       4
2012        3       32
2013        2       5
2013        2       23

Thanks for any help everyone!!

Will keep searching around but it's a tricky one

CodePudding user response：

Try:

df_out = df.groupby("Col A").apply(lambda x: x[x["Col B"] == x["Col B"].max()])
print(df_out.reset_index(drop=True))

Prints:

   Col A  Col B  Col C
0   2012      3      4
1   2012      3     32
2   2013      2      5
3   2013      2     23

Or: using .transform:

df_out = df[df["Col B"] == df.groupby("Col A")["Col B"].transform("max")]
print(df_out)