Home > database >  Populate the current row based on the prev
Populate the current row based on the prev

Time:07-19

A question was posted on the link below where one wanted to use a previous row to populate the current row:

Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?

In this case, there was only one index, the date.

Now I want to add a second index, employee ID; on the first occurrence of the first index, Index_EmpID, then I would like B to be populated with a value from A. On any subsequent occurrence, I would like the value from the previous row multiplied by the value from the current row.

I have the following data frame:

|Index_EmpID |Index_Date |   A  | B   |
|============|===========|======|=====|
|A123        |2022-01-31 |   1  | NaN |
|A123        |2022-02-28 |   1  | NaN |
|A123        |2022-03-31 | 1.05 | NaN |
|A123        |2022-04-30 |   1  | NaN |
|A567        |2022-01-31 |   1  | NaN |
|A567        |2022-02-28 | 1.05 | NaN |
|A567        |2022-03-31 |   1  | NaN |
|A567        |2022-04-30 | 1.05 | NaN |     

I require:

|Index_EmpID |Index_Date |   A  |  B   |
|============|===========|======|======|
|A123        |2022-01-31 |   1  |  1   |
|A123        |2022-02-28 |   1  |  1   |
|A123        |2022-03-31 | 1.05 | 1.05 |
|A123        |2022-04-30 |   1  | 1.05 |
|A567        |2022-01-31 |   1  |  1   |
|A567        |2022-02-28 | 1.05 | 1.05 |
|A567        |2022-03-31 |   1  | 1.05 |
|A567        |2022-04-30 | 1.05 |1.1025|     

CodePudding user response:

Something like

import numpy as np
df.groupby("Index_EmpID")["A"].agg(np.cumprod).reset_index()

should work.

CodePudding user response:

I think this can be solves with a simple shift().

df["B"] = df.A * df.A.shift(1)

With the note that the first row will remain NaN. You can patch that with df["b"].iloc[0] = df.A.iloc[0]

CodePudding user response:

A solution that uses iterrows is not as nice a solution as the one that uses groupby but it follows directly from the description and uses only the most elementary Pandas facilities.

empdf = pd.DataFrame({'Index_EmpID': (['A123']*4   ['A567']*4),
                      'Index_Date': (['2022-01-31', '2022-02-28',
                                      '2022-03-31', '2022-04-30'] * 2),
                      'A': [1, 1, 1.05, 1, 1, 1.05, 1, 1.05], 
                      'B': ([np.nan]*8)}

past_id, past_b, bs = None, 1, []
for label, row in empdf.iterrows():
     if row['Index_EmpID'] == past_id:
        bs.append(past_b * row['A'])
     else:
        bs.append(row['A'])
     past_b = bs[-1]
     past_id = row['Index_EmpID']
empdf['B'] = bs
  • Related