i have a table named 'balance' like this
user_id date transaction_amount
123 2021-08-30 1000
123 2021-08-31 500
456 2021-09-30 -750
789 2021-09-30 50
i need to know total buy/sell/net transaction on monthly level (end-of-period/eop) which :
buy = transaction > 0
sell = transaction < 0
net = total buy = total sell
how to write the code?
so far i've tried
eop_trans = df_balance.groupby([df_balance['user_id'], df_balance['date'].dt.to_period('M')])['transaction_amount'].sum()
eop_trans
that code only return total amount of transaction per user_id per month
anyone can help? thank you very much
CodePudding user response:
Assuming that the dataframe is called df
, first, make sure that the colum date
is of datetime
df['date'] = pd.to_datetime(df['date'])
Then, if I understood correctly, the following will do the work
df_new = df.groupby(['user_id', pd.Grouper(key='date', freq='M')]).sum().reset_index()
df_new['buy'] = np.where(df_new['transaction_amount'] > 0, df_new['transaction_amount'], 0)
df_new['sell'] = np.where(df_new['transaction_amount'] < 0, df_new['transaction_amount'], 0)
df_new['net'] = df_new['buy'] df_new['sell']
df_new = df_new[['user_id', 'date', 'buy', 'sell', 'net']]
[Out]:
user_id date buy sell net
0 123 2021-08-31 1500 0 1500
1 456 2021-09-30 0 -750 -750
2 789 2021-09-30 50 0 50
Let us breakdown the steps to make it clearer.
Start by grouping by user_id
with pandas.DataFrame.groupby
df_new = df.groupby(['user_id', pd.Grouper(key='date', freq='M')]).sum().reset_index()
[Out]:
user_id date transaction_amount
0 123 2021-08-31 1500
1 456 2021-09-30 -750
2 789 2021-09-30 50
Then, let us first create the column buy
. For that one will be using numpy.where
df_new['buy'] = np.where(df_new['transaction_amount'] > 0, df_new['transaction_amount'], 0)
[Out]:
user_id date transaction_amount buy
0 123 2021-08-31 1500 1500
1 456 2021-09-30 -750 0
2 789 2021-09-30 50 50
For the column sell
one will be using numpy.where as well
df_new['sell'] = np.where(df_new['transaction_amount'] < 0, df_new['transaction_amount'], 0)
[Out]:
user_id date transaction_amount buy sell
0 123 2021-08-31 1500 1500 0
1 456 2021-09-30 -750 0 -750
2 789 2021-09-30 50 50 0
For the net
column, one will do the following
df_new['net'] = df_new['buy'] df_new['sell']
[Out]:
user_id date transaction_amount buy sell net
0 123 2021-08-31 1500 1500 0 1500
1 456 2021-09-30 -750 0 -750 -750
2 789 2021-09-30 50 50 0 50
Finally, as one doesn't want transaction_amount
to show up, one can select the desired columns as follows
df_new = df_new[['user_id', 'date', 'buy', 'sell', 'net']]
[Out]:
user_id date buy sell net
0 123 2021-08-31 1500 0 1500
1 456 2021-09-30 0 -750 -750
2 789 2021-09-30 50 0 50
CodePudding user response:
Use numpy.where
for distinguish buy/sell
, aggregate sum
and reshape by Series.unstack
, last if necessary sum both columns use DataFrame.assign
:
eop_trans = (df_balance.groupby([df_balance['user_id'],
df_balance['date'].dt.to_period('M'),
np.where(df_balance['transaction_amount'].gt(0), 'buy', 'sell')])
['transaction_amount']
.sum()
.unstack(fill_value=0)
.assign(net = lambda x: x.sum(axis=1)))
print (eop_trans)
buy sell net
user_id date
123 2021-08 1500 0 1500
456 2021-09 0 -750 -750
789 2021-09 50 0 50