After using groupby and agg to find the max and the min, I would like to keep the value of another column corresponding to said Max and Min.
What I have:
Order | Missing | Date |
---|---|---|
Order 1 | Missing 1 | 2002 |
Order 1 | Missing 2 | 2003 |
Order 2 | Missing 3 | 2004 |
Order 2 | Missing 4 | 2005 |
Order 2 | Missing 5 | 2006 |
Order 3 | Missing 6 | 2007 |
What I do
calculation = df.groupby(by=('Order')).agg(Max=('Date','max'),Min=('Date','min')).reset_index()
calculation['difference'] = calculation['Max']-calculation['Min']
What I get
Order | Min | Max | difference |
---|---|---|---|
Order 1 | 2002 | 2003 | 1 year |
Order 2 | 2004 | 2006 | 2 years |
Order 3 | 2007 | 2007 | 0s |
What I want,
Order | Min | Max | difference | Missing Min | Missing Max |
---|---|---|---|---|---|
Order 1 | 2002 | 2003 | 1 year | Missing 1 | Missing 2 |
Order 2 | 2004 | 2006 | 2 years | Missing 3 | Missing 5 |
Order 3 | 2007 | 2007 | 0s | Missing 6 | Missing 6 |
CodePudding user response:
I would use separate groupby
and a merge
:
g = df.groupby(by='Order')
(pd.merge(df.loc[g['Date'].idxmin()]
.rename({'Date': 'Min', 'Missing': 'Missing Min'}, axis=1),
df.loc[g['Date'].idxmax()]
.rename({'Date': 'Max', 'Missing': 'Missing Max'}, axis=1),
on='Order')
.assign(difference=lambda d: d['Max']-d['Min'])
)
Output:
Order Missing Min Min Missing Max Max difference
0 Order 1 Missing 1 2002 Missing 2 2003 1
1 Order 2 Missing 3 2004 Missing 5 2006 2
2 Order 3 Missing 6 2007 Missing 6 2007 0
CodePudding user response:
maybe try to add additional group by calculations/columns, similiar to how you created "Max" and "Min" columns from "Date". (I apologize, I am not able to debug code right now).:
calculation = df.groupby(by=('Order')).agg(Max=('Date','max'),Min=('Date','min'), Missing_Min=('Missing','min'), Missing_Max=('Missing','max')).reset_index()
calculation['difference'] = calculation['Max']-calculation['Min']
CodePudding user response:
You need a merge
. Two more lines in your code would get you what you want.
calculation["Missing Min"] = calculation.merge(
df, left_on=["Order", "Min"], right_on=["Order", "Date"]
)["Missing"]
calculation["Missing Max"] = calculation.merge(
df, left_on=["Order", "Max"], right_on=["Order", "Date"]
)["Missing"]
print(calculation):
Order Max Min difference Missing Min Missing Max
0 Order 1 2003 2002 1 Missing 1 Missing 2
1 Order 2 2006 2004 2 Missing 3 Missing 5
2 Order 3 2007 2007 0 Missing 6 Missing 6