Home > front end >  I'd like to group by two columns and then print out rows corresponding to minimum value on the
I'd like to group by two columns and then print out rows corresponding to minimum value on the

Time:04-19

Here is an example of two dataframes in pandas:

 df = pd.DataFrame({0: [1.5, 2, 2.3, 4], 1: [3, 6, 1, 0]})
 df2 = pd.DataFrame({0: [1.7, 4.05, 2.1, 2.99], 1: [1, 3, 1, 7]})

df.columns = ["x1", "y1"]
df2.columns = ["x2", "y2"]

Then I merge them and make another dataframe which consists only of x1 and x2 columns that are closest to each other.

merged1 = df.merge(df2, how='cross')
merged1['diff'] = (merged1['x1'].sub(merged1['x2'])).abs()

out1 = (merged1.loc[merged1.groupby(df.columns.tolist())['diff'].idxmin().to_numpy()])

And lastly, I'd like to group by column y2 and print out the row of each group that has the smallest value of the column diff

out1.groupby(["y2"])['diff'].min()

But it prints out all the rows:

    x1  y1    x2  y2  diff
0   1.5   3  1.70   1  0.20
6   2.0   6  2.10   1  0.10
10  2.3   1  2.10   1  0.20
13  4.0   0  4.05   3  0.05

What's the problem?

CodePudding user response:

You need to use:

out1.loc[out1.groupby(["y2"])['diff'].idxmin()]

output:

     x1  y1    x2  y2  diff
6   2.0   6  2.10   1  0.10
13  4.0   0  4.05   3  0.05

Note that you could do the whole thing with just 2 commands (using pandas.merge_asof):

(pd
 .merge_asof(df, df2.sort_values(by='x2'),
             left_on='x1', right_on='x2', direction='nearest')
 .loc[lambda d: d['x1'].sub(d['x2']).abs().groupby(d['y2']).idxmin()]
)

or, to also have the "diff" column:

(pd
 .merge_asof(df, df2.sort_values(by='x2'),
             left_on='x1', right_on='x2', direction='nearest')
 .assign(diff=lambda d: d['x1'].sub(d['x2']).abs())
 .loc[lambda d: d.groupby('y2')['diff'].idxmin()]
)
  • Related