Home > Mobile >  Is there a way to group columns with multiple conditions using pandas?
Is there a way to group columns with multiple conditions using pandas?

Time:10-31

My dataframe is similar to:

transaction, date, cash, (1, 1/1/2020, 72,) (2, 1/3/2020, 100,) (2, 1/5/2020, -75,) (3, 1/5/2020, 82,)

I want the output to group by transaction and to sum the cash for each transaction (if there is two amounts) BUT ALSO to return the later date. So for transaction 2 the end result would show transaction, date, cash as: 2,1/5/2020, 25...

Not sure how to make tables to help the visuals in my question yet so sorry, please let me know if there's any questions.

CodePudding user response:

Use groupby agg. Check the docs examples.

output = df.groupby('transaction').agg({'date': 'max', 'cash': 'sum'})

CodePudding user response:

This solution assumes that the date column is encoded as proper datetime instances. If this is not currently the case, try df['date'] = pd.to_datetime(df['date']) before doing the following.

>>> df
   transaction       date  cash
0            1 2020-01-01    72
1            2 2020-01-03   100
2            2 2020-01-05   -75
3            3 2020-01-05    82
>>> transactions = df.groupby('transaction')
>>> pd.concat((transactions['cash'].sum(), transactions['date'].max()), axis=1)
             cash       date
transaction
1              72 2020-01-01
2              25 2020-01-05
3              82 2020-01-05

transactions['date'].max() picks the date furthest into the future of those with the same transaction ID.

  • Related