A question was posted on the link below where one wanted to use a previous row to populate the current row:
In this case, there was only one index, the date.
Now I want to add a second index, employee ID; on the first occurrence of the first index, Index_EmpID, then I would like B to be populated with a value from A. On any subsequent occurrence, I would like the value from the previous row multiplied by the value from the current row.
I have the following data frame:
|Index_EmpID |Index_Date | A | B |
|============|===========|======|=====|
|A123 |2022-01-31 | 1 | NaN |
|A123 |2022-02-28 | 1 | NaN |
|A123 |2022-03-31 | 1.05 | NaN |
|A123 |2022-04-30 | 1 | NaN |
|A567 |2022-01-31 | 1 | NaN |
|A567 |2022-02-28 | 1.05 | NaN |
|A567 |2022-03-31 | 1 | NaN |
|A567 |2022-04-30 | 1.05 | NaN |
I require:
|Index_EmpID |Index_Date | A | B |
|============|===========|======|======|
|A123 |2022-01-31 | 1 | 1 |
|A123 |2022-02-28 | 1 | 1 |
|A123 |2022-03-31 | 1.05 | 1.05 |
|A123 |2022-04-30 | 1 | 1.05 |
|A567 |2022-01-31 | 1 | 1 |
|A567 |2022-02-28 | 1.05 | 1.05 |
|A567 |2022-03-31 | 1 | 1.05 |
|A567 |2022-04-30 | 1.05 |1.1025|
CodePudding user response:
Something like
import numpy as np
df.groupby("Index_EmpID")["A"].agg(np.cumprod).reset_index()
should work.
CodePudding user response:
I think this can be solves with a simple shift().
df["B"] = df.A * df.A.shift(1)
With the note that the first row will remain NaN. You can patch that with df["b"].iloc[0] = df.A.iloc[0]
CodePudding user response:
A solution that uses iterrows
is not as nice a solution as the one that uses groupby
but it follows directly from the description and uses only the most elementary Pandas facilities.
empdf = pd.DataFrame({'Index_EmpID': (['A123']*4 ['A567']*4),
'Index_Date': (['2022-01-31', '2022-02-28',
'2022-03-31', '2022-04-30'] * 2),
'A': [1, 1, 1.05, 1, 1, 1.05, 1, 1.05],
'B': ([np.nan]*8)}
past_id, past_b, bs = None, 1, []
for label, row in empdf.iterrows():
if row['Index_EmpID'] == past_id:
bs.append(past_b * row['A'])
else:
bs.append(row['A'])
past_b = bs[-1]
past_id = row['Index_EmpID']
empdf['B'] = bs