Home > Net >  Keep the corresponding value of another column after using groupby in a column
Keep the corresponding value of another column after using groupby in a column

Time:12-06

After using groupby and agg to find the max and the min, I would like to keep the value of another column corresponding to said Max and Min.

What I have:

Order Missing Date
Order 1 Missing 1 2002
Order 1 Missing 2 2003
Order 2 Missing 3 2004
Order 2 Missing 4 2005
Order 2 Missing 5 2006
Order 3 Missing 6 2007

What I do

calculation = df.groupby(by=('Order')).agg(Max=('Date','max'),Min=('Date','min')).reset_index()
calculation['difference'] = calculation['Max']-calculation['Min']

What I get

Order Min Max difference
Order 1 2002 2003 1 year
Order 2 2004 2006 2 years
Order 3 2007 2007 0s

What I want,

Order Min Max difference Missing Min Missing Max
Order 1 2002 2003 1 year Missing 1 Missing 2
Order 2 2004 2006 2 years Missing 3 Missing 5
Order 3 2007 2007 0s Missing 6 Missing 6

CodePudding user response:

I would use separate groupby and a merge:

g = df.groupby(by='Order')

(pd.merge(df.loc[g['Date'].idxmin()]
            .rename({'Date': 'Min', 'Missing': 'Missing Min'}, axis=1),
          df.loc[g['Date'].idxmax()]
            .rename({'Date': 'Max', 'Missing': 'Missing Max'}, axis=1),
          on='Order')
   .assign(difference=lambda d: d['Max']-d['Min'])
)

Output:

     Order Missing Min   Min Missing Max   Max  difference
0  Order 1   Missing 1  2002   Missing 2  2003           1
1  Order 2   Missing 3  2004   Missing 5  2006           2
2  Order 3   Missing 6  2007   Missing 6  2007           0

CodePudding user response:

maybe try to add additional group by calculations/columns, similiar to how you created "Max" and "Min" columns from "Date". (I apologize, I am not able to debug code right now).:

calculation = df.groupby(by=('Order')).agg(Max=('Date','max'),Min=('Date','min'), Missing_Min=('Missing','min'), Missing_Max=('Missing','max')).reset_index()
calculation['difference'] = calculation['Max']-calculation['Min']

CodePudding user response:

You need a merge. Two more lines in your code would get you what you want.

calculation["Missing Min"] = calculation.merge(
    df, left_on=["Order", "Min"], right_on=["Order", "Date"]
)["Missing"]
calculation["Missing Max"] = calculation.merge(
    df, left_on=["Order", "Max"], right_on=["Order", "Date"]
)["Missing"]

print(calculation):

             Order   Max   Min  difference Missing Min Missing Max
0          Order 1   2003  2002           1  Missing 1   Missing 2 
1          Order 2   2006  2004           2  Missing 3   Missing 5 
2          Order 3   2007  2007           0  Missing 6   Missing 6 
  • Related