I have a data frame like this...
Customer | Date | Balance |
---|---|---|
Adam | 01/01/2022 | 10 |
Adam | 01/01/2022 | 20 |
Adam | 03/01/2022 | 30 |
Molly | 01/01/2022 | 40 |
Molly | 01/01/2022 | 50 |
Molly | 03/01/2022 | 60 |
I want to find daily day-end balance for each customer. I tried groupby with resample method but couldn't do any. For both customer 2nd of January is missing, so my function should create a row for 2nd January and populat it with previous day end balance. My output should be like this-
Customer | Date | Balance |
---|---|---|
Adam | 01/01/2022 | 20 |
Adam | 02/01/2022 | 20 |
Adam | 03/01/2022 | 30 |
Molly | 01/01/2022 | 50 |
Molly | 02/01/2022 | 50 |
Molly | 03/01/2022 | 60 |
CodePudding user response:
IIUC, you could do:
df2 = (
df.assign(Date=pd.to_datetime(df['Date'], dayfirst=True))
.groupby(['Customer', 'Date'], sort=False).last()
)
dt = df2.index.get_level_values('Date')
idx = pd.MultiIndex.from_product([df2.index.get_level_values('Customer').unique(),
pd.date_range(dt.min(), dt.max(), freq='D')
], names=['Customer', 'Date'])
df2 = df2.reindex(idx).groupby(level='Customer', sort=False).ffill().reset_index()
output:
Customer Date Balance
0 Adam 2022-01-01 20.0
1 Adam 2022-01-02 20.0
2 Adam 2022-01-03 30.0
3 Molly 2022-01-01 50.0
4 Molly 2022-01-02 50.0
5 Molly 2022-01-03 60.0
6 John 2022-01-01 NaN
7 John 2022-01-02 15.0
8 John 2022-01-03 15.0