I was looking for similar topics, but I found only change by month. My problem is that I would like to have a month change in value e.g. UPL but per user like in the below example.
user_id | month | UPL |
---|---|---|
1 | 2022-01-01 00:00:00 | 100 |
1 | 2022-02-01 00:00:00 | 200 |
2 | 2022-01-01 00:00:00 | 100 |
2 | 2022-02-01 00:00:00 | 50 |
1 | 2022-03-01 00:00:00 | 150 |
And to have additional column named "UPL change month by month":
user_id | month | UPL | UPL_change_by_month |
---|---|---|---|
1 | 2022-01-01 00:00:00 | 100 | 0 |
1 | 2022-02-01 00:00:00 | 200 | 100 |
2 | 2022-01-01 00:00:00 | 100 | 0 |
2 | 2022-02-01 00:00:00 | 50 | -50 |
1 | 2022-03-01 00:00:00 | 150 | -50 |
Is it possible using aggfunc or shift function using Pandas?
CodePudding user response:
IIUC, you can use groupby_diff
:
df['UPL_change_by_month'] = df.sort_values('month').groupby('user_id')['UPL'].diff().fillna(0)
print(df)
# Output
user_id month UPL UPL_change_by_month
0 1 2022-01-01 100 0.0
1 1 2022-02-01 200 100.0
2 2 2022-01-01 100 0.0
3 2 2022-02-01 50 -50.0
4 1 2022-03-01 150 -50.0