Home > database >  I have data frame and I want to calculate the last three month transaction count and sum for each Gr
I have data frame and I want to calculate the last three month transaction count and sum for each Gr

Time:12-14

Input data set

Id Date TransAmt
A 2022-01-02 10
A 2022-01-02 20
A 2022-02-04 30
A 2022-02-05 20
A 2022-04-08 300
A 2022-04-11 100
A 2022-05-13 200
A 2022-06-12 20
A 2022-06-15 300
A 2022-08-16 100

Desired output

Id Date TransAmt CountThreeMonth AmountThreeMonths
A 2022-01-02 10 2 30
A 2022-01-02 20 2 30
A 2022-02-04 30 4 80
A 2022-02-05 20 4 80
A 2022-04-08 300 4 450
A 2022-04-11 100 4 450
A 2022-05-13 200 3 600
A 2022-06-12 20 5 920
A 2022-06-15 300 5 920
A 2022-08-16 100 3 420

Note: 1. There can be multiple transaction for same date i.e. on 2022-01-02 there are two transaction. 2. I want calculate last 3 months transaction like- Present Month total Transaction count Previous two month total Transaction count. Similar logic for amount. like for Jan month only 2 transaction and previous month does not have any transaction so 2 0 0 =2. 3. I want all calculation for Each group of Id.

Please help me achieve my desired output Thanking you in Advanced.

CodePudding user response:

Example

data = [['A', '2022-01-02', 10], ['A', '2022-01-02', 20], ['A', '2022-02-04', 30], 
        ['A', '2022-02-05', 20], ['A', '2022-04-08', 300], ['A', '2022-04-11', 100],
        ['A', '2022-05-13', 200], ['A', '2022-06-12', 20], ['A', '2022-06-15', 300],
        ['A', '2022-08-16', 100], ['B', '2022-01-02', 10], ['B', '2022-01-02', 20],
        ['B', '2022-02-04', 30], ['B', '2022-02-05', 20], ['B', '2022-04-08', 300],
        ['B', '2022-04-11', 100], ['B', '2022-05-13', 200], ['B', '2022-06-12', 20],
        ['B', '2022-06-15', 300], ['B', '2022-08-16', 100]]
df1 = pd.DataFrame(data, columns=['Id', 'Date', 'TransAmt'])

df1

    Id  Date    TransAmt
0   A   2022-01-02  10
1   A   2022-01-02  20
2   A   2022-02-04  30
3   A   2022-02-05  20
4   A   2022-04-08  300
5   A   2022-04-11  100
6   A   2022-05-13  200
7   A   2022-06-12  20
8   A   2022-06-15  300
9   A   2022-08-16  100
10  B   2022-01-02  10
11  B   2022-01-02  20
12  B   2022-02-04  30
13  B   2022-02-05  20
14  B   2022-04-08  300
15  B   2022-04-11  100
16  B   2022-05-13  200
17  B   2022-06-12  20
18  B   2022-06-15  300
19  B   2022-08-16  100

Code

s = df1['Date']
df1['Date'] = df1['Date'].astype('Period[M]') 
df2 = df1.groupby(['Id', 'Date'])['TransAmt'].agg(['count', sum])
idx1 = pd.period_range(df1['Date'].min(), df1['Date'].max(), freq='M')
idx2 = pd.MultiIndex.from_product([df1['Id'].unique(), idx1])
cols = ['Id', 'Date', 'CountThreeMonth', 'AmountofThreeMonth']
n = 3
df3 = df2.reindex(idx2, fill_value=0).groupby(level=0).rolling(n, min_periods=1).sum().droplevel(0).reset_index().set_axis(cols, axis=1)
df1.merge(df3, how='left').assign(Date=s)

result(df1.merge(df3, how='left').assign(Date=s))

    Id  Date    TransAmt    CountThreeMonth AmountofThreeMonth
0   A   2022-01-02  10      2.0             30.0
1   A   2022-01-02  20      2.0             30.0
2   A   2022-02-04  30      4.0             80.0
3   A   2022-02-05  20      4.0             80.0
4   A   2022-04-08  300     4.0             450.0
5   A   2022-04-11  100     4.0             450.0
6   A   2022-05-13  200     3.0             600.0
7   A   2022-06-12  20      5.0             920.0
8   A   2022-06-15  300     5.0             920.0
9   A   2022-08-16  100     3.0             420.0
10  B   2022-01-02  10      2.0             30.0
11  B   2022-01-02  20      2.0             30.0
12  B   2022-02-04  30      4.0             80.0
13  B   2022-02-05  20      4.0             80.0
14  B   2022-04-08  300     4.0             450.0
15  B   2022-04-11  100     4.0             450.0
16  B   2022-05-13  200     3.0             600.0
17  B   2022-06-12  20      5.0             920.0
18  B   2022-06-15  300     5.0             920.0
19  B   2022-08-16  100     3.0             420.0

I'm sorry it's hard to explain

  • Related