I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'a':[1,2,2,3,3,3,4,4,4], 'b':[-0.1,-0.2,-0.1,-0.1,-0.005,-0.3,0,-0.9,-0.6],'name':['fast','slow1','slow2','slow1','fast1','fast1','slow','fast','slow1']})
Output
   a      b   name
0  1 -0.100   fast
1  2 -0.200  slow1
2  2 -0.100  slow2
3  3 -0.100  slow1
4  3 -0.005  fast1
5  3 -0.300  fast1
6  4  0.000   slow
7  4 -0.900   fast
8  4 -0.600  slow1
I am grouping it by column "a" and taking the minimum of "b":
df.groupby(by=["a"]).agg({"b":"min"})
     b
a
1 -0.1
2 -0.2
3 -0.3
4 -0.9
How do I select the corresponding "name" column for the min index of column "b"? What I am trying to get at is this:
     b   name
a
1 -0.1   fast
2 -0.2  slow1
3 -0.3  fast1
4 -0.9   fast
I tried using the "apply" method but for large dataframes it was getting really slow. Is there a way to use the "agg" function here?
CodePudding user response:
One approach, using idxmin:
# idxmin() returns an index label, so look the name up with .loc rather than .iloc
res = df.groupby(by=["a"]).agg({"b": ["min", pd.NamedAgg(column="name", aggfunc=lambda x: df["name"].loc[x.idxmin()])]})
print(res)
Output
     b
   min   name
a
1 -0.1   fast
2 -0.2  slow1
3 -0.3  fast1
4 -0.9   fast
CodePudding user response:
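For this particular data, groupby.min() happens to give the same result. Be aware that it takes the minimum of each column independently (for "name", the alphabetically smallest string), so in general it is not guaranteed to return the name from the row where "b" is smallest.
df.groupby('a').min()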
Out[250]:
     b   name
a
1 -0.1   fast
2 -0.2  slow1
3 -0.3  fast1
4 -0.9   fast
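A quick check of how groupby.min() treats each column, as a minimal sketch. The frame `demo` and its values are made up for illustration and are not from the question; they are chosen so that the row with the smallest "b" does not carry the alphabetically smallest name.

```python
import pandas as pd

# Hypothetical data: in group 1 the smallest "b" belongs to the row named "zeta",
# but the alphabetically smallest name in that group is "alpha".
demo = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [-0.9, -0.1, -0.5],
    "name": ["zeta", "alpha", "mid"],
})

# min() is computed per column, so "b" and "name" can come from different rows.
per_column = demo.groupby("a").min()

# idxmin() gives the row label of the smallest "b" in each group;
# .loc then selects those whole rows, keeping "b" and "name" together.
per_row = demo.loc[demo.groupby("a")["b"].idxmin()].set_index("a")

print(per_column)
print(per_row)
```

For group 1, `per_column` pairs -0.9 with "alpha", while `per_row` pairs it with "zeta", which is the name that actually sits on the minimum row.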
CodePudding user response:
You can do:
df.loc[df.groupby('a')['b'].idxmin()].set_index('a')
output:
     b   name
a
1 -0.1   fast
2 -0.2  slow1
3 -0.3  fast1
4 -0.9   fast
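Another option that stays vectorized is to sort by "b" and keep the first row of each group. This is only a sketch of an alternative, not something benchmarked here; the variable names `res` and `expected` are illustrative. A stable sort is requested so that ties on "b" resolve to the first occurrence, which is the same row idxmin would pick.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 2, 3, 3, 3, 4, 4, 4],
    "b": [-0.1, -0.2, -0.1, -0.1, -0.005, -0.3, 0, -0.9, -0.6],
    "name": ["fast", "slow1", "slow2", "slow1", "fast1", "fast1", "slow", "fast", "slow1"],
})

# Sort so the smallest "b" comes first (stable, so ties keep their original order),
# keep the first row seen for each value of "a", then restore the group order.
res = (
    df.sort_values("b", kind="stable")
      .drop_duplicates("a")
      .sort_values("a")
      .set_index("a")
)

# Same result via idxmin + loc, for comparison.
expected = df.loc[df.groupby("a")["b"].idxmin()].set_index("a")

print(res)
print(res.equals(expected))
```

Which of the two is faster likely depends on the number of groups and rows, so it is worth timing both on the real data.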