Is there a way in pandas to calculate the sum of two values in a row and write the result to the next row?

I have a data frame df that looks like this:

A   B   C   R1   R2
2   2   5   NaN  NaN
1   1   8   7    NaN
5   5   10  NaN  NaN
1   1   14  NaN  NaN

I am trying to update the DataFrame so that B and C are added and the result is written to R1, offset by one row. For example, 2 + 5 = 7 is stored in R1 one row down. I then compare this result with the corresponding value of column C (8). If C > R1, add C and A and store the sum in R1, offset by one row. If C < R1, add C and B and store the sum in R2, offset by one row. The result should look like this:

A   B   C   R1    R2
2   2   5   NaN   NaN
1   1   8   7     NaN
5   5   10  9     NaN
1   1   14  15    NaN
NaN NaN NaN NaN   15

Is there a way in pandas to do so?

CodePudding user response：

It's better to show your effort to SO users by putting your code in the question. Anyway, you can try this:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[2,1,5,1] , "B":[2,1,5,1] ,  "C":[5,8,10,14] , "R1":[np.nan , np.nan , np.nan, np.nan] , "R2":[np.nan , np.nan , np.nan, np.nan]})


for i in range(len(df)):
    if i == 0:
        # First row: B + C goes into R1 (column 3) of the next row.
        df.iloc[i + 1, 3] = df.B[i] + df.C[i]
    elif i != len(df) - 1:
        if df.C[i] > df.R1[i]:
            df.iloc[i + 1, 3] = df.C[i] + df.A[i]
        else:
            # Column 4 is R2.
            df.iloc[i + 1, 4] = df.C[i] + df.B[i]
    else:
        # Last row: add a new row to hold the shifted result.
        # DataFrame.append was removed in pandas 2.0, so enlarge with .loc instead.
        if df.C[i] > df.R1[i]:
            df.loc[len(df)] = [np.nan, np.nan, np.nan, df.C[i] + df.A[i], np.nan]
        else:
            df.loc[len(df)] = [np.nan, np.nan, np.nan, np.nan, df.C[i] + df.B[i]]

It's not clean code and not straightforward enough, but it should give you an idea of how to solve the problem, and you can then tidy it up. Printing df gives:

     A    B     C    R1    R2
0  2.0  2.0   5.0   NaN   NaN
1  1.0  1.0   8.0   7.0   NaN
2  5.0  5.0  10.0   9.0   NaN
3  1.0  1.0  14.0  15.0   NaN
4  NaN  NaN   NaN   NaN  15.0
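
As one possible way to tidy up the loop above, here is a minimal sketch that builds R1 and R2 as plain lists and attaches them at the end. The function name fill_r1_r2 is made up for illustration, and it assumes that a NaN in R1 (other than for the first row) sends the result to R2, which is how the loop above behaves.

```python
import numpy as np
import pandas as pd


def fill_r1_r2(df):
    """Return a new frame with R1/R2 filled row by row, plus one extra row."""
    a, b, c = df["A"].tolist(), df["B"].tolist(), df["C"].tolist()
    n = len(df)
    # One extra slot so the result of the last row has somewhere to go.
    r1 = [np.nan] * (n + 1)
    r2 = [np.nan] * (n + 1)
    for i in range(n):
        if i == 0:
            r1[1] = b[0] + c[0]          # first row always seeds R1
        elif c[i] > r1[i]:
            r1[i + 1] = c[i] + a[i]      # C > R1: store C + A in R1
        else:
            r2[i + 1] = c[i] + b[i]      # otherwise: store C + B in R2
    # Pad A, B, C with one all-NaN row so every column has n + 1 entries.
    out = df[["A", "B", "C"]].reset_index(drop=True).reindex(range(n + 1))
    out["R1"] = r1
    out["R2"] = r2
    return out


source = pd.DataFrame({"A": [2, 1, 5, 1], "B": [2, 1, 5, 1], "C": [5, 8, 10, 14]})
result = fill_r1_r2(source)
print(result)
```

Working on lists avoids repeated iloc writes and growing the DataFrame inside the loop. The loop itself stays, because each comparison depends on the R1 value produced by the previous row.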

CodePudding user response：

To present a more instructive example, I took a DataFrame with different values in A and B, and initially without R1 and R2 columns:

     A    B     C
0  4.0  2.0   5.0
1  3.0  1.0   8.0
2  7.0  5.0  10.0
3  4.0  1.0  14.0

To add R1 column, you can run:

df['R1'] = (df.B + df.C).shift()

The result so far is:

     A    B     C    R1
0  4.0  2.0   5.0   NaN
1  3.0  1.0   8.0   7.0
2  7.0  5.0  10.0   9.0
3  4.0  1.0  14.0  15.0
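
To make the effect of shift() easier to see, here is a small sketch that rebuilds the sample data and keeps the intermediate sums in separate variables. The names row_sum, shifted and dropped are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 3, 7, 4], "B": [2, 1, 5, 1], "C": [5, 8, 10, 14]})

row_sum = df["B"] + df["C"]   # per-row sums: 7, 9, 15, 15
shifted = row_sum.shift()     # each sum moves down one row; row 0 becomes NaN
dropped = row_sum.iloc[-1]    # the last sum has no row to land in

print(shifted.tolist())       # [nan, 7.0, 9.0, 15.0]
print(dropped)                # 15
```

The sum from the last row falls off the end, which is why an extra row is needed if that value should be kept.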

I noticed that, at this stage, you don't want an additional row for the B + C result of the last row.

The next step is to add a row filled with NaN:

df = pd.concat([df, pd.DataFrame([{'A': np.nan}])], ignore_index=True)

This is needed to provide the space to save the (shifted down) result from the last original row.

The result is:

     A    B     C    R1
0  4.0  2.0   5.0   NaN
1  3.0  1.0   8.0   7.0
2  7.0  5.0  10.0   9.0
3  4.0  1.0  14.0  15.0
4  NaN  NaN   NaN   NaN

The next step (if C > R1) can be done as:

df.R1.update(df.R1.mask(df.C > df.R1, df.A + df.C).dropna().shift())

The result is:

     A    B     C    R1
0  4.0  2.0   5.0   NaN
1  3.0  1.0   8.0   7.0
2  7.0  5.0  10.0  11.0
3  4.0  1.0  14.0  17.0
4  NaN  NaN   NaN   NaN
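
Since that one-liner packs several operations together, here is a sketch that splits it into named steps on the same data. It assigns the result back with combine_first instead of calling update on df.R1; that is my substitution, made because an in-place update through attribute access may not modify df when pandas Copy-on-Write is enabled. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [4, 3, 7, 4, np.nan],
    "B": [2, 1, 5, 1, np.nan],
    "C": [5, 8, 10, 14, np.nan],
    "R1": [np.nan, 7, 9, 15, np.nan],
})

cond = df["C"] > df["R1"]                            # comparisons with NaN are False
candidate = df["R1"].mask(cond, df["A"] + df["C"])   # A + C where C > R1, else old R1
moved = candidate.dropna().shift()                   # shift within the non-NaN rows only
new_r1 = moved.combine_first(df["R1"])               # NaN in moved keeps the old R1 value
df["R1"] = new_r1

print(df["R1"].tolist())                             # [nan, 7.0, 11.0, 17.0, nan]
```

Like update, combine_first ignores NaN on the left-hand side, so row 1 keeps its original 7.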

To generate the R2 column, we can start by creating an empty column:

df = df.assign(R2=np.nan)

Then we fill it with the actual values:

df.R2 = df.R2.mask(df.C < df.R1, df.B + df.C).shift()

The result is:

     A    B     C    R1    R2
0  4.0  2.0   5.0   NaN   NaN
1  3.0  1.0   8.0   7.0   NaN
2  7.0  5.0  10.0  11.0   NaN
3  4.0  1.0  14.0  17.0  15.0
4  NaN  NaN   NaN   NaN  15.0

Note that R2 contains two non-NaN values, since my source data is slightly different from yours.

The last step is to drop the last row if it still contains only NaNs:

df.dropna(how='all', inplace=True)

For the current source data nothing changes.
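
To illustrate why how='all' makes the drop conditional, here is a small sketch with two made-up frames (padded and empty_tail are hypothetical, cut down to two columns for brevity).

```python
import numpy as np
import pandas as pd

# The padding row received a value in R2, so it is not entirely NaN.
padded = pd.DataFrame({"C": [5.0, 14.0, np.nan], "R2": [np.nan, np.nan, 15.0]})
# The padding row stayed empty.
empty_tail = pd.DataFrame({"C": [5.0, 14.0, np.nan], "R2": [np.nan, np.nan, np.nan]})

kept = padded.dropna(how="all")         # last row survives because R2 is set
trimmed = empty_tail.dropna(how="all")  # last row is removed

print(len(kept), len(trimmed))          # 3 2
```

Rows that have a NaN in only some columns are left alone, so the original data rows are never affected.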