Home > OS >  how to delete NaN values in pandas while adding other columns' value to the very next row which
how to delete NaN values in pandas while adding other columns' value to the very next row which

Time:08-20

Here is my dataframe view:

index   date    B   N   S   y_B   y_N   y_S  f_price    y_price  p_change   price_day_t-2
1   2021-01-12  1   29  0   2.0   57.0  0.0  10250.0    10760.0  -1.0        11060.0
2   2021-01-13  0   67  0   1.0   29.0  0.0  9810.0     10250.0  -1.0        10760.0
3   2021-01-14  2   19  0   0.0   67.0  0.0  NaN         NaN     NaN           NaN
4   2021-01-15  1   6   0   2.0   19.0  0.0  NaN         NaN     NaN           NaN
5   2021-01-16  2   46  0   1.0   6.0   0.0  9340.0     9810.0   -1.0        10250.0
6   2021-01-17  3   22  0   2.0   46.0  0.0  NaN         NaN     NaN           NaN
7   2021-01-18  1   34  0   3.0   22.0  0.0  8890.0     9340.0   -1.0        9810.0

I want to delete rows with null values but at the same time, I want to add the other columns' values to the very next row without null values. which the result would be like this:

index   date    B   N   S   y_B   y_N   y_S  f_price    y_price  p_change   price_day_t-2
1   2021-01-12  1   29  0   2.0   57.0  0.0  10250.0    10760.0  -1.0        11060.0
2   2021-01-13  0   67  0   1.0   29.0  0.0  9810.0     10250.0  -1.0        10760.0
5   2021-01-16  5   71  0   3.0   92.0  0.0  9340.0     9810.0   -1.0        10250.0
7   2021-01-18  4   56  0   5.0   68.0  0.0  8890.0     9340.0   -1.0        9810.0

for example in row n.5 column "B", the result is 5 because of the aggregation 2 1 2. (the two other values [1] and [2] are from deleted rows)

How can I do this?

CodePudding user response:

You can create groups by comapre last column from back for non missing values with cumulative sum, then aggregate sum for all columns without date, it aggregate by last values per groups:

g = df.iloc[::-1, -1].notna().cumsum()

d = dict.fromkeys(df.columns, 'sum')
d['date'] = 'last'

df = df.groupby(g, sort=False).agg(d).reset_index(drop=True)
print (df)
         date  B   N  S  y_B   y_N  y_S  f_price  y_price  p_change  \
0  2021-01-12  1  29  0  2.0  57.0  0.0  10250.0  10760.0      -1.0   
1  2021-01-13  0  67  0  1.0  29.0  0.0   9810.0  10250.0      -1.0   
2  2021-01-16  5  71  0  3.0  92.0  0.0   9340.0   9810.0      -1.0   
3  2021-01-18  4  56  0  5.0  68.0  0.0   8890.0   9340.0      -1.0   

   price_day_t-2  
0        11060.0  
1        10760.0  
2        10250.0  
3         9810.0  
  • Related