I have a df that looks like this
user_id customer_id transaction_amount transaction_type transaction_date
0 0 10000 debit 2020-09-08
1 1 20000 debit 2020-09-08
Now I want to calculate the average amount of transactions of each user in the weeks. So far I have achieved
df.groupby(['user_id', 'transaction_date']).mean()
Which gives me the output in the following format
transaction_amount
user_id transaction_date
1 2021-03-08 36114.913043
2021-03-09 13425.000000
2021-03-23 14915.200000
2021-03-24 9762.000000
2 2021-03-08 1234114.913043
2021-02-08 1354325.000000
2021-12-23 141241915.200000
2021-5-21 9735162.000000
How can I get it in the format of
user_id | customer_id | average_transactions_per_week
_______________________________________________________
1 1 5 # 5 transactions in 1 week
2 2 8 # 8 transactions in 1 week
CodePudding user response:
IIUC, you shouldn't use sum
but count
. Also, to group by week, you need to compute a grouper, here we can use a period:
df.groupby(['user_id', 'customer_id', pd.to_datetime(df['transaction_date']).dt.to_period('W')]).agg(transactions_per_week=('user_id', 'count'))
Output:
transactions_per_week
user_id customer_id transaction_date
0 0 2020-09-07/2020-09-13 1
1 1 2020-09-07/2020-09-13 1
Now to get the average number of transactions, you need a second groupby
:
(df
.groupby(['user_id', 'customer_id', pd.to_datetime(df['transaction_date']).dt.to_period('W')])
.agg(average_transactions_per_week=('user_id', 'count'))
.groupby(['user_id', 'customer_id']).mean()
)
Output:
average_transactions_per_week
user_id customer_id
0 0 1.0
1 1 1.0
CodePudding user response:
df['week'] = pd.to_datetime(df['transaction_date']).dt.week
df.groupby(['user_id', 'customer_id', 'transaction_date', 'week']).mean().reset_index()
CodePudding user response:
import pandas as pd
d = {'user_id': [0, 1], 'customer_id': [0, 1], 'transaction_amount': [10000, 20000], 'transaction_type': ['debit', 'debit'], 'transaction_date': ['2020-09-08', '2020-09-08']}
df = pd.DataFrame(data=d)
df['transaction_date'] = pd.to_datetime(df['transaction_date'])
avgdf = df.groupby(['transaction_date','user_id'], as_index= False).size().rename(columns={'size': 'avg_trans_amount'})
df = df.merge(avgdf, on = ['transaction_date','user_id'])
print(df)