Home > Blockchain >  Calculate sum of multiples rows with a value in another column, and store the new value in a new col
Calculate sum of multiples rows with a value in another column, and store the new value in a new col

Time:08-25

I have a time series data where I would like to calculate sum of 4 quarters and divide the result with a value in another column. Here is my data example. In reality I will have 10K rows.

df = pd.DataFrame({'c1':[20,15,10,12,15,17,19,20,9,10],'c2':[22,14,19,13,15,17,19,10,20,12]})
date_array=[[2019,12,31],[2019,9,30],[2019,6,30],[2019,3,31],[2018,12,31],[2018,9,30],[2018,6,30],[2018,3,31],[2017,12,31],[2017,9,30]]
date_df = pd.DataFrame(date_array, columns=['year', 'month', 'day'])
df['date'] = pd.to_datetime(date_df[['year', 'month', 'day']], format='%Y-%m-%d')
df = df[['date','c1','c2']]

enter image description here

Now, I would like to create c3 column to store calculated values between c1 and c2. The expecting result will be

enter image description here

In first row, c3 = 2.59 comes from sum of first 4 rows in c1 (20 15 10 12) divided by first row in c2 (22)

In second row, c3 = 3.71 comes from sum of next 4 rows in c1 (15 10 12 15) divided by second row in c2 (14)

and so on..

Toward the end of dataframe, if we have less than 4 rows to calculate, just return 0.

Can anyone please guide me. I try to pick only 4 rows out but it doesn't work already.

for i in range(len(df)):
    print(df.loc[i:i 3, "c1"])

CodePudding user response:

In your case just do rolling

df['new'] = df['c1'].iloc[::-1].rolling(4).sum()/df['c2']
Out[993]: 
0    2.59090909
1    3.71428571
2    2.84210526
3    4.84615385
4    4.73333333
5    3.82352941
6    3.05263158
7           NaN
8           NaN
9           NaN
dtype: float64
  • Related