Summarize the Problem:
Basically trying to replicate the Maxifs
function from Excel to Pandas based on unique ID. The raw data is a running total on payments based on unique ID and payment dates. The desired output would be the reverse of running total to see single payment on each payment date by unique ID
Below is the raw data from Excel -
ID | Months_after_origination | Principal_payment | Origination_date |
---|---|---|---|
123 | 0 | 48.32 | 10/1/2021 |
123 | 1 | 75.65 | 10/1/2021 |
123 | 2 | 234.67 | 10/1/2021 |
123 | 3 | 380.37 | 10/1/2021 |
Below is the desired output -
ID | Months_after_origination | Principal_payment | Origination_date | Principal_single |
---|---|---|---|---|
123 | 0 | 48.32 | 10/1/2021 | 48.32 |
123 | 1 | 75.65 | 10/1/2021 | 27.33 |
123 | 2 | 234.67 | 10/1/2021 | 159.02 |
123 | 3 | 380.37 | 10/1/2021 | 145.70 |
What I've tried so far:
Been following the links below and trying to replicate but seems to need more work on where
chain a bit more and hopefully could receive some feedback here.
Link1 Link2
Some codes:
a_df["Principal_single"] = a_df["Principal_payment"].where(a_df["Months_after_origination"] < a_df["Months_after_origination"].values).groupby(a_df["ID"]).transform("max")
It turns out all NaN
in the Principal_single
field and suspect that where
chain doesn't capture the condition. Thanks for the help here.
CodePudding user response:
Here is one way to do it
# group by the ID, then subtract previous value from current value, using shift
df['principal_single'] = (df.groupby(['ID'])[['Principal_payment']]
.transform(lambda x: (x - x.shift(1)) ))
# fill the NaN value with the principal payment using ffill
df[['Principal_payment','principal_single']]= df[['Principal_payment','principal_single']].ffill(axis=1 )
df
ID Months_after_origination Principal_payment Origination_date principal_single
0 123 0 48.32 10/1/2021 48.32
1 123 1 75.65 10/1/2021 27.33
2 123 2 234.67 10/1/2021 159.02
3 123 3 380.37 10/1/2021 145.70