Here is my dataframe view:
index date B N S y_B y_N y_S f_price y_price p_change price_day_t-2
1 2021-01-12 1 29 0 2.0 57.0 0.0 10250.0 10760.0 -1.0 11060.0
2 2021-01-13 0 67 0 1.0 29.0 0.0 9810.0 10250.0 -1.0 10760.0
3 2021-01-14 2 19 0 0.0 67.0 0.0 NaN NaN NaN NaN
4 2021-01-15 1 6 0 2.0 19.0 0.0 NaN NaN NaN NaN
5 2021-01-16 2 46 0 1.0 6.0 0.0 9340.0 9810.0 -1.0 10250.0
6 2021-01-17 3 22 0 2.0 46.0 0.0 NaN NaN NaN NaN
7 2021-01-18 1 34 0 3.0 22.0 0.0 8890.0 9340.0 -1.0 9810.0
I want to delete rows with null values but at the same time, I want to add the other columns' values to the very next row without null values. which the result would be like this:
index date B N S y_B y_N y_S f_price y_price p_change price_day_t-2
1 2021-01-12 1 29 0 2.0 57.0 0.0 10250.0 10760.0 -1.0 11060.0
2 2021-01-13 0 67 0 1.0 29.0 0.0 9810.0 10250.0 -1.0 10760.0
5 2021-01-16 5 71 0 3.0 92.0 0.0 9340.0 9810.0 -1.0 10250.0
7 2021-01-18 4 56 0 5.0 68.0 0.0 8890.0 9340.0 -1.0 9810.0
for example in row n.5 column "B", the result is 5 because of the aggregation 2 1 2. (the two other values [1] and [2] are from deleted rows)
How can I do this?
CodePudding user response:
You can create groups by comapre last column from back for non missing values with cumulative sum, then aggregate sum
for all columns without date
, it aggregate by last
values per groups:
g = df.iloc[::-1, -1].notna().cumsum()
d = dict.fromkeys(df.columns, 'sum')
d['date'] = 'last'
df = df.groupby(g, sort=False).agg(d).reset_index(drop=True)
print (df)
date B N S y_B y_N y_S f_price y_price p_change \
0 2021-01-12 1 29 0 2.0 57.0 0.0 10250.0 10760.0 -1.0
1 2021-01-13 0 67 0 1.0 29.0 0.0 9810.0 10250.0 -1.0
2 2021-01-16 5 71 0 3.0 92.0 0.0 9340.0 9810.0 -1.0
3 2021-01-18 4 56 0 5.0 68.0 0.0 8890.0 9340.0 -1.0
price_day_t-2
0 11060.0
1 10760.0
2 10250.0
3 9810.0