I have a data frame df that looks like this:
A B C R1 R2
2 2 5 Nan Nan
1 1 8 7 Nan
5 5 10 Nan Nan
1 1 14 Nan Nan
I am trying to update the DataFrame as follows. First, add B and C and write the result to R1, offset by one row. For example, 2 + 5 = 7 is stored in R1 of the next row. Then compare that result with the value of C in the same row (8). If C > R1, add C and A and store the sum in R1, again offset by one row. If C < R1, add C and B and store the sum in R2, offset by one row. The result should look like this:
A B C R1 R2
2 2 5 Nan Nan
1 1 8 7 Nan
5 5 10 9 Nan
1 1 14 15 Nan
Nan Nan Nan Nan 15
Is there a way in pandas to do so?
CodePudding user response:
It's better to show your effort to SO users by putting your code in the question. Anyway, you can try this:
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 1, 5, 1], "B": [2, 1, 5, 1], "C": [5, 8, 10, 14],
                   "R1": [np.nan, np.nan, np.nan, np.nan],
                   "R2": [np.nan, np.nan, np.nan, np.nan]})

for i in range(len(df)):
    if i == 0:
        # First row: B + C goes into R1 of the next row.
        df.iloc[i + 1, 3] = df.B[i] + df.C[i]
    elif i != len(df) - 1:
        if df.C[i] > df.R1[i]:
            df.iloc[i + 1, 3] = df.C[i] + df.A[i]  # C > R1: C + A into R1
        else:
            df.iloc[i + 1, 4] = df.C[i] + df.B[i]  # otherwise: C + B into R2
    else:
        # Last row: the offset result needs a new row.
        # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
        new_row = {"A": np.nan, "B": np.nan, "C": np.nan, "R1": np.nan, "R2": np.nan}
        if df.C[i] > df.R1[i]:
            new_row["R1"] = df.C[i] + df.A[i]
        else:
            new_row["R2"] = df.C[i] + df.B[i]  # C + B, as in the question
        df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
It's not clean or straightforward code, but it should give you an idea of how to solve the problem, and you can then rewrite it in better shape. If I print df, the result is:
A B C R1 R2
0 2.0 2.0 5.0 NaN NaN
1 1.0 1.0 8.0 7.0 NaN
2 5.0 5.0 10.0 9.0 NaN
3 1.0 1.0 14.0 15.0 NaN
4 NaN NaN NaN NaN 15.0
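As one possible "better shape" for the loop above, here is a minimal sketch that collects the R1 and R2 values in plain lists and builds the result at the end. The function name `offset_sums` is hypothetical (not from the question or the answer), and the sketch assumes the input has the columns A, B and C and at least one row.

```python
import numpy as np
import pandas as pd


def offset_sums(df):
    """Hypothetical helper: apply the row-by-row rule and return a new frame."""
    n = len(df)
    # One extra slot, because every result is written one row further down.
    r1 = [np.nan] * (n + 1)
    r2 = [np.nan] * (n + 1)
    for i in range(n):
        a, b, c = df["A"].iloc[i], df["B"].iloc[i], df["C"].iloc[i]
        if i == 0:
            r1[i + 1] = b + c      # first row: B + C into R1
        elif c > r1[i]:
            r1[i + 1] = c + a      # C > R1: C + A into R1
        else:
            r2[i + 1] = c + b      # otherwise: C + B into R2
    # Pad A, B, C with one all-NaN row so the lengths match.
    out = df[["A", "B", "C"]].reset_index(drop=True).reindex(range(n + 1))
    out["R1"] = r1
    out["R2"] = r2
    return out


df = pd.DataFrame({"A": [2, 1, 5, 1], "B": [2, 1, 5, 1], "C": [5, 8, 10, 14]})
result = offset_sums(df)
print(result)
```

Because the lists are filled in order, each comparison sees the R1 value produced by the previous row, which is the same behaviour as the loop above.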
CodePudding user response:
To present a more instructive example, I took a DataFrame with different values in A and B, and initially without R1 and R2 columns:
A B C
0 4.0 2.0 5.0
1 3.0 1.0 8.0
2 7.0 5.0 10.0
3 4.0 1.0 14.0
To add R1 column, you can run:
df['R1'] = (df.B + df.C).shift()
The result so far is:
A B C R1
0 4.0 2.0 5.0 NaN
1 3.0 1.0 8.0 7.0
2 7.0 5.0 10.0 9.0
3 4.0 1.0 14.0 15.0
Note that, so far, there is no additional row holding B + C from the last row.
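The reason no extra row appears is that shift() keeps the original index, so the value pushed past the end is discarded. A small sketch with the B + C sums from the table above (the variable names are only for illustration):

```python
import pandas as pd

sums = pd.Series([7.0, 9.0, 15.0, 15.0])  # B + C for rows 0..3
shifted = sums.shift()                     # same length, values moved down by one

print(len(sums), len(shifted))
print(shifted)
```

The first position becomes NaN and the last sum (15.0, from row 3) no longer appears in the shifted Series.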
The next step is to add a row filled with NaN:
df = pd.concat([df, pd.DataFrame([{'A': np.nan}])], ignore_index=True)
This is needed to provide the space to save the (shifted down) result from the last original row.
The result is:
A B C R1
0 4.0 2.0 5.0 NaN
1 3.0 1.0 8.0 7.0
2 7.0 5.0 10.0 9.0
3 4.0 1.0 14.0 15.0
4 NaN NaN NaN NaN
The next step (if C > R1) can be done as:
df.R1.update(df.R1.mask(df.C > df.R1, df.A + df.C).dropna().shift())
The result is:
A B C R1
0 4.0 2.0 5.0 NaN
1 3.0 1.0 8.0 7.0
2 7.0 5.0 10.0 11.0
3 4.0 1.0 14.0 17.0
4 NaN NaN NaN NaN
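The R1 update above chains several operations, so it may help to see them one at a time. The sketch below starts from the intermediate table shown earlier and uses hypothetical names (`candidate`, `shifted`). It also swaps the in-place update for `combine_first` plus a plain assignment, which is my substitution: as far as I know, an in-place call through `df.R1` may not propagate back to `df` when pandas Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

# State after adding the all-NaN row (copied from the table above).
df = pd.DataFrame({
    "A": [4.0, 3.0, 7.0, 4.0, np.nan],
    "B": [2.0, 1.0, 5.0, 1.0, np.nan],
    "C": [5.0, 8.0, 10.0, 14.0, np.nan],
    "R1": [np.nan, 7.0, 9.0, 15.0, np.nan],
})

# Where C > R1, replace R1 with A + C; elsewhere keep R1.
candidate = df["R1"].mask(df["C"] > df["R1"], df["A"] + df["C"])
# Drop the NaN rows, then move the remaining values down by one position.
shifted = candidate.dropna().shift()
# Prefer the shifted values; fall back to the old R1 where they are missing.
df["R1"] = shifted.combine_first(df["R1"])

print(df)
```

Comparisons against NaN evaluate to False, which is why rows 0 and 4 are left untouched by the mask.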
To generate the R2 column, we can start by creating an empty column:
df = df.assign(R2=np.nan)
Then we store the actual values there:
df.R2 = df.R2.mask(df.C < df.R1, df.B + df.C).shift()
The result is:
A B C R1 R2
0 4.0 2.0 5.0 NaN NaN
1 3.0 1.0 8.0 7.0 NaN
2 7.0 5.0 10.0 11.0 NaN
3 4.0 1.0 14.0 17.0 15.0
4 NaN NaN NaN NaN 15.0
Note that R2 contains two non-NaN values, since my source data is slightly different from yours.
The last step is to drop the last row if it still contains only NaNs:
df.dropna(how='all', inplace=True)
For the current source data nothing changes.
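For reference, the steps above can be collected into a single function and run on the data from the question. This is only a sketch: the name `add_r1_r2` is hypothetical, and the R1 update is written with `combine_first` and a plain assignment instead of the in-place update, which is my own substitution.

```python
import numpy as np
import pandas as pd


def add_r1_r2(frame):
    """Hypothetical wrapper around the steps described above."""
    out = frame[["A", "B", "C"]].copy()
    out["R1"] = (out["B"] + out["C"]).shift()
    # Extra all-NaN row to hold the value shifted down from the last row.
    out = pd.concat([out, pd.DataFrame([{"A": np.nan}])], ignore_index=True)
    shifted = out["R1"].mask(out["C"] > out["R1"], out["A"] + out["C"]).dropna().shift()
    out["R1"] = shifted.combine_first(out["R1"])
    out = out.assign(R2=np.nan)
    out["R2"] = out["R2"].mask(out["C"] < out["R1"], out["B"] + out["C"]).shift()
    # Drop rows that are entirely NaN (the padding row, if it was never filled).
    return out.dropna(how="all")


question_df = pd.DataFrame({"A": [2, 1, 5, 1], "B": [2, 1, 5, 1], "C": [5, 8, 10, 14]})
result = add_r1_r2(question_df)
print(result)
```

For this input the R1 and R2 columns agree with the expected table in the question. The comparisons here use R1 values computed in one pass rather than row by row, so I would not assume the result matches a sequential loop for every input.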