calculate sum of squares with rows above

Time:02-11

I have a dataset that looks like this:

Value         Type       X_sq
-1.975767     Weather   
-0.540979     Fruits
-2.359127     Fruits
-2.815604     Corona
-0.929755     Weather

I want to iterate through the rows and, for each row, calculate a running sum of squares over that row and all rows above it, counting only rows whose Type matches. I want to put this value in the X_sq column.

For example, in the first row there is nothing above, so X_sq is just -1.975767 * -1.975767. In the second row there is no Fruits row above it, so X_sq is just -0.540979 * -0.540979. In the third row, however, scanning all previous rows shows that Fruits already appears, so we take the last Fruits row's X_sq value and add the new square to it.

Value         Type       X_sq
-1.975767     Weather   -1.975767 * -1.975767    = x
-0.540979     Fruits    -0.540979 * -0.540979    = y
-2.359127     Fruits    y + (-2.359127 * -2.359127)
-2.815604     Corona    -2.815604 * -2.815604
-0.929755     Weather   x + (-0.929755 * -0.929755)
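In symbols (v_i, t_i and S_i are notation introduced here for the Value, Type and X_sq of row i, with rows numbered from 1), the rule above is a per-Type running sum, which can be computed by adding the current square to the previous same-Type X_sq:

```latex
S_i \;=\; \sum_{\substack{j \le i \\ t_j = t_i}} v_j^2
    \;=\; \begin{cases}
      S_{p(i)} + v_i^2 & \text{if some row } j < i \text{ has } t_j = t_i,\\[2pt]
      v_i^2 & \text{otherwise,}
    \end{cases}
```

where p(i) is the last row above i with the same Type. For the third row, p(3) = 2, so S_3 = S_2 + v_3^2 ≈ 0.292658 + 5.565480 = 5.858138.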

What would be an efficient way to do this?

def updateSS(X_sq, X_new):
    # Add the square of the new value to the running sum of squares
    return X_sq + X_new**2
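For comparison with the vectorized answers, here is a sketch of the row-by-row loop the question describes. It assumes a dict (hypothetically named `last`) that remembers the most recent X_sq for each Type, so each row needs one lookup instead of a rescan of every row above:

```python
import pandas as pd

def updateSS(X_sq, X_new):
    # Add the square of the new value to the running sum of squares
    return X_sq + X_new**2

# Sample data copied from the question
df = pd.DataFrame({
    "Value": [-1.975767, -0.540979, -2.359127, -2.815604, -0.929755],
    "Type": ["Weather", "Fruits", "Fruits", "Corona", "Weather"],
})

# last: hypothetical dict mapping each Type to the latest X_sq seen for it
last = {}
x_sq = []
for value, typ in zip(df["Value"], df["Type"]):
    # Start from 0.0 when this Type has not appeared in any row above
    last[typ] = updateSS(last.get(typ, 0.0), value)
    x_sq.append(last[typ])

df["X_sq"] = x_sq
print(df)
```

This is a single pass over the rows, but it runs in Python, so on large frames the groupby-based answers will usually be faster.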

Answer:

Use:

df['X_sq'] = df['Value'].pow(2).groupby(df['Type']).cumsum()
print(df)

# Output
      Value     Type      X_sq
0 -1.975767  Weather  3.903655
1 -0.540979   Fruits  0.292658
2 -2.359127   Fruits  5.858138
3 -2.815604   Corona  7.927626
4 -0.929755  Weather  4.768100
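A self-contained version of this answer, rebuilding the sample frame from the question so it can be run as-is:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Value": [-1.975767, -0.540979, -2.359127, -2.815604, -0.929755],
    "Type": ["Weather", "Fruits", "Fruits", "Corona", "Weather"],
})

# Square every value, then take a cumulative sum within each Type.
# The grouped cumsum keeps the original row order and index.
df["X_sq"] = df["Value"].pow(2).groupby(df["Type"]).cumsum()
print(df.round(6))
```

Both the squaring and the grouped cumulative sum are vectorized, so no Python-level loop over rows is needed.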

Answer:

First, build an id that numbers each run of consecutive rows with the same Type:

groupid = (df['Type'] != df['Type'].shift()).cumsum()
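On the question's sample Type column, the id works out as follows (a standalone sketch; `types` is a hypothetical name for that column):

```python
import pandas as pd

types = pd.Series(["Weather", "Fruits", "Fruits", "Corona", "Weather"], name="Type")

# True wherever Type differs from the row directly above; the first row is
# compared against the missing value produced by shift(), so it is always True
starts_run = types != types.shift()

# Cumulative sum of the booleans numbers the runs 1, 2, 3, ...
groupid = starts_run.cumsum()
print(groupid.tolist())  # [1, 2, 2, 3, 4]
```

The two Fruits rows share id 2, while the two Weather rows get different ids (1 and 4) because they are not adjacent.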

Then group the DataFrame by that id and take the cumulative sum of squares within each group:

df['X_sq'] = df.groupby(groupid)['Value'].transform(lambda x: (x*x).cumsum())

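This gives the following. Because the sum restarts whenever Type changes, the last Weather row gets 0.864444 instead of also including the first Weather row, as the question's expected output does: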

      Value     Type      X_sq
0 -1.975767  Weather  3.903655
1 -0.540979   Fruits  0.292658
2 -2.359127   Fruits  5.858138
3 -2.815604   Corona  7.927626
4 -0.929755  Weather  0.864444
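Putting this answer's two steps together on the question's data, with the first answer's result alongside, shows where the two approaches diverge: only in the last row. The column names X_sq_runs and X_sq_type are hypothetical, chosen just for this comparison:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Value": [-1.975767, -0.540979, -2.359127, -2.815604, -0.929755],
    "Type": ["Weather", "Fruits", "Fruits", "Corona", "Weather"],
})

# Number each run of consecutive rows that share the same Type
groupid = (df["Type"] != df["Type"].shift()).cumsum()

# Running sum of squares that restarts at the start of every run
df["X_sq_runs"] = df.groupby(groupid)["Value"].transform(lambda x: (x * x).cumsum())

# Running sum of squares over all earlier rows of the same Type
df["X_sq_type"] = df["Value"].pow(2).groupby(df["Type"]).cumsum()

print(df.round(6))
```

Grouping by the run id is what you want if only adjacent rows should accumulate; grouping by Type directly matches the question's "scan all previous rows" rule.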