I'm trying to do something I feel shouldn't be too difficult but I'm having trouble with it.
The best way would be to illustrate it. I currently have a DF with many rows/columns. I want to grab the highest value of Column B, in the unique value of Column A, and drop the rest of the rows that don't matter. I am having a hard to articulating what is is, maybe grouping etc
For example:
Initial Table
Col A - Col B - Col C
2012 1 2
2012 1 7
2012 2 45
2012 2 34
2012 3 4
2012 3 32
2013 1 54
2013 1 3
2013 2 5
2013 2 23
Table I want to end up with
Col A - Col B - Col C
2012 3 4
2012 3 32
2013 2 5
2013 2 23
Thanks for any help everyone!!
Will keep searching around but it's a tricky one
CodePudding user response:
Try:
df_out = df.groupby("Col A").apply(lambda x: x[x["Col B"] == x["Col B"].max()])
print(df_out.reset_index(drop=True))
Prints:
Col A Col B Col C
0 2012 3 4
1 2012 3 32
2 2013 2 5
3 2013 2 23
Or: using .transform
:
df_out = df[df["Col B"] == df.groupby("Col A")["Col B"].transform("max")]
print(df_out)