My dataframe is similar to:
transaction, date, cash, (1, 1/1/2020, 72,) (2, 1/3/2020, 100,) (2, 1/5/2020, -75,) (3, 1/5/2020, 82,)
I want the output to group by transaction and to sum the cash for each transaction (if there is two amounts) BUT ALSO to return the later date. So for transaction 2 the end result would show transaction, date, cash as: 2,1/5/2020, 25...
Not sure how to make tables to help the visuals in my question yet so sorry, please let me know if there's any questions.
CodePudding user response:
Use groupby
agg
. Check the docs examples.
output = df.groupby('transaction').agg({'date': 'max', 'cash': 'sum'})
CodePudding user response:
This solution assumes that the date
column is encoded as proper datetime instances. If this is not currently the case, try df['date'] = pd.to_datetime(df['date'])
before doing the following.
>>> df
transaction date cash
0 1 2020-01-01 72
1 2 2020-01-03 100
2 2 2020-01-05 -75
3 3 2020-01-05 82
>>> transactions = df.groupby('transaction')
>>> pd.concat((transactions['cash'].sum(), transactions['date'].max()), axis=1)
cash date
transaction
1 72 2020-01-01
2 25 2020-01-05
3 82 2020-01-05
transactions['date'].max()
picks the date furthest into the future of those with the same transaction ID.