Input data set
Id | Date | TransAmt |
---|---|---|
A | 2022-01-02 | 10 |
A | 2022-01-02 | 20 |
A | 2022-02-04 | 30 |
A | 2022-02-05 | 20 |
A | 2022-04-08 | 300 |
A | 2022-04-11 | 100 |
A | 2022-05-13 | 200 |
A | 2022-06-12 | 20 |
A | 2022-06-15 | 300 |
A | 2022-08-16 | 100 |
Desired output
Id | Date | TransAmt | CountThreeMonth | AmountThreeMonths |
---|---|---|---|---|
A | 2022-01-02 | 10 | 2 | 30 |
A | 2022-01-02 | 20 | 2 | 30 |
A | 2022-02-04 | 30 | 4 | 80 |
A | 2022-02-05 | 20 | 4 | 80 |
A | 2022-04-08 | 300 | 4 | 450 |
A | 2022-04-11 | 100 | 4 | 450 |
A | 2022-05-13 | 200 | 3 | 600 |
A | 2022-06-12 | 20 | 5 | 920 |
A | 2022-06-15 | 300 | 5 | 920 |
A | 2022-08-16 | 100 | 3 | 420 |
Note: 1. There can be multiple transaction for same date i.e. on 2022-01-02 there are two transaction. 2. I want calculate last 3 months transaction like- Present Month total Transaction count Previous two month total Transaction count. Similar logic for amount. like for Jan month only 2 transaction and previous month does not have any transaction so 2 0 0 =2. 3. I want all calculation for Each group of Id.
Please help me achieve my desired output Thanking you in Advanced.
CodePudding user response:
Example
data = [['A', '2022-01-02', 10], ['A', '2022-01-02', 20], ['A', '2022-02-04', 30],
['A', '2022-02-05', 20], ['A', '2022-04-08', 300], ['A', '2022-04-11', 100],
['A', '2022-05-13', 200], ['A', '2022-06-12', 20], ['A', '2022-06-15', 300],
['A', '2022-08-16', 100], ['B', '2022-01-02', 10], ['B', '2022-01-02', 20],
['B', '2022-02-04', 30], ['B', '2022-02-05', 20], ['B', '2022-04-08', 300],
['B', '2022-04-11', 100], ['B', '2022-05-13', 200], ['B', '2022-06-12', 20],
['B', '2022-06-15', 300], ['B', '2022-08-16', 100]]
df1 = pd.DataFrame(data, columns=['Id', 'Date', 'TransAmt'])
df1
Id Date TransAmt
0 A 2022-01-02 10
1 A 2022-01-02 20
2 A 2022-02-04 30
3 A 2022-02-05 20
4 A 2022-04-08 300
5 A 2022-04-11 100
6 A 2022-05-13 200
7 A 2022-06-12 20
8 A 2022-06-15 300
9 A 2022-08-16 100
10 B 2022-01-02 10
11 B 2022-01-02 20
12 B 2022-02-04 30
13 B 2022-02-05 20
14 B 2022-04-08 300
15 B 2022-04-11 100
16 B 2022-05-13 200
17 B 2022-06-12 20
18 B 2022-06-15 300
19 B 2022-08-16 100
Code
s = df1['Date']
df1['Date'] = df1['Date'].astype('Period[M]')
df2 = df1.groupby(['Id', 'Date'])['TransAmt'].agg(['count', sum])
idx1 = pd.period_range(df1['Date'].min(), df1['Date'].max(), freq='M')
idx2 = pd.MultiIndex.from_product([df1['Id'].unique(), idx1])
cols = ['Id', 'Date', 'CountThreeMonth', 'AmountofThreeMonth']
n = 3
df3 = df2.reindex(idx2, fill_value=0).groupby(level=0).rolling(n, min_periods=1).sum().droplevel(0).reset_index().set_axis(cols, axis=1)
df1.merge(df3, how='left').assign(Date=s)
result(df1.merge(df3, how='left').assign(Date=s)
)
Id Date TransAmt CountThreeMonth AmountofThreeMonth
0 A 2022-01-02 10 2.0 30.0
1 A 2022-01-02 20 2.0 30.0
2 A 2022-02-04 30 4.0 80.0
3 A 2022-02-05 20 4.0 80.0
4 A 2022-04-08 300 4.0 450.0
5 A 2022-04-11 100 4.0 450.0
6 A 2022-05-13 200 3.0 600.0
7 A 2022-06-12 20 5.0 920.0
8 A 2022-06-15 300 5.0 920.0
9 A 2022-08-16 100 3.0 420.0
10 B 2022-01-02 10 2.0 30.0
11 B 2022-01-02 20 2.0 30.0
12 B 2022-02-04 30 4.0 80.0
13 B 2022-02-05 20 4.0 80.0
14 B 2022-04-08 300 4.0 450.0
15 B 2022-04-11 100 4.0 450.0
16 B 2022-05-13 200 3.0 600.0
17 B 2022-06-12 20 5.0 920.0
18 B 2022-06-15 300 5.0 920.0
19 B 2022-08-16 100 3.0 420.0
I'm sorry it's hard to explain