How to substitute calculated values for NaNs-CodePudding

Here is my question:

df has over 200 columns named Sup1, ..., Supn, and occasionally one or a few NaNs appear at the tail of a column. Truncating the whole dataset at the row where the NaNs start would lose information, so for these few gaps I would rather estimate the missing values with a rolling mean.

import numpy as np
import pandas as pd

Date = ['2022-02-23','2022-02-22','2022-02-21','2022-02-18','2022-02-17','2022-02-16','2022-02-15','2022-02-14','2022-02-11','2022-02-10']
# values are kept as strings, as in the original post; df.astype(float) converts them for numeric work
df = pd.DataFrame({'Sup1':['0.5083333253860474','0.49666666984558105','0.5024999976158142', '0.49666666984558105','0.5','0.5133333206176758','0.5174999634424846','0.5416666865348816','0.5333333611488342',np.nan],
      'Sup2':['0.0130000002682209','0.0130000002682209','0.0130000002682209' ,'0.0133333336561918','0.0140000004321336','0.0140000004321336','0.0140000004321336',np.nan,np.nan,np.nan]}
     , index = Date)

I want only the NaNs to be replaced by the value obtained from the rolling mean. Each mean has to be calculated column-wise, for every column that contains any NaN.

I thought of adopting something like the following, but my trials did not work:

for i in range(len(main_df)):  # note: this walks over rows, not columns
  # col = main_df.columns
  # m_ema = main_df[col[i]].ewm(span=1).mean()
  # main_df[col[i]] = main_df[col[i]].fillna(value=main_df[col[i]].ewm(span=1).mean(), inplace=True)
  # fillna(..., inplace=True) returns None, so the assignment below stores None instead of the filled row
  main_df.iloc[i] = main_df.iloc[i].fillna(value=main_df.iloc[i].ewm(span=1).mean(), inplace=True)

CodePudding user response：

You can do

df.fillna(df.ewm(span=1).mean(),inplace=True)
df
Out[388]: 
                           Sup1                Sup2
2022-02-23   0.5083333253860474  0.0130000002682209
2022-02-22  0.49666666984558105  0.0130000002682209
2022-02-21   0.5024999976158142  0.0130000002682209
2022-02-18  0.49666666984558105  0.0133333336561918
2022-02-17                  0.5  0.0140000004321336
2022-02-16   0.5133333206176758  0.0140000004321336
2022-02-15   0.5174999634424846  0.0140000004321336
2022-02-14   0.5416666865348816               0.014
2022-02-11   0.5333333611488342               0.014
2022-02-10             0.533333               0.014
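
As a minimal sketch of why this works, assuming the columns have been converted to float first (the question stores them as strings): with span=1 the smoothing factor alpha = 2/(span+1) equals 1, so the exponentially weighted mean at each row is simply the most recent non-NaN value, and the fill behaves like a forward fill. The shortened frame below is made up for illustration and is not the asker's full data.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the question's data, already as floats
df = pd.DataFrame(
    {"Sup1": [0.5175, 0.5417, 0.5333, np.nan],
     "Sup2": [0.0140, np.nan, np.nan, np.nan]},
    index=["2022-02-15", "2022-02-14", "2022-02-11", "2022-02-10"],
)

# fillna only touches the NaN cells; existing values are left unchanged
filled = df.fillna(df.ewm(span=1).mean())

# Compare against a plain forward fill
forward_filled = df.ffill()

print(filled)
print(forward_filled)
```

Assigning the result to a new name instead of using inplace=True keeps the original frame available for comparison.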

CodePudding user response：

Have you tried just ewm, without the fillna? For me the NaNs were filled and nothing was truncated.

df
Out[26]: 
                           Sup1                Sup2
2022-02-23   0.5083333253860474  0.0130000002682209
2022-02-22  0.49666666984558105  0.0130000002682209
2022-02-21   0.5024999976158142  0.0130000002682209
2022-02-18  0.49666666984558105  0.0133333336561918
2022-02-17                  0.5  0.0140000004321336
2022-02-16   0.5133333206176758  0.0140000004321336
2022-02-15   0.5174999634424846  0.0140000004321336
2022-02-14   0.5416666865348816                 NaN
2022-02-11   0.5333333611488342                 NaN
2022-02-10                  NaN                 NaN

df["Sup1"].ewm(span=1).mean()
Out[28]: 
2022-02-23    0.508333
2022-02-22    0.496667
2022-02-21    0.502500
2022-02-18    0.496667
2022-02-17    0.500000
2022-02-16    0.513333
2022-02-15    0.517500
2022-02-14    0.541667
2022-02-11    0.533333
2022-02-10    0.533333
Name: Sup1, dtype: float64

df["Sup2"].ewm(span=1).mean()
Out[29]: 
2022-02-23    0.013000
2022-02-22    0.013000
2022-02-21    0.013000
2022-02-18    0.013333
2022-02-17    0.014000
2022-02-16    0.014000
2022-02-15    0.014000
2022-02-14    0.014000
2022-02-11    0.014000
2022-02-10    0.014000
Name: Sup2, dtype: float64

To iterate over all columns, store the DataFrame's column labels in a variable and loop over them.

header=df.columns
for i in header:
    print(df[i].ewm(span=1).mean())
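
The loop above only prints the smoothed series. A hedged sketch of writing the result back, restricted to the columns that actually contain NaNs, might look like the following. The sample frame, the gap-free Sup3 column and the span of 3 are assumptions for illustration, not values from the question.

```python
import numpy as np
import pandas as pd

# Illustrative data: Sup3 is a made-up column without gaps
df = pd.DataFrame(
    {"Sup1": [0.50, 0.51, 0.52, 0.54, np.nan],
     "Sup2": [0.013, 0.014, 0.014, np.nan, np.nan],
     "Sup3": [1.0, 2.0, 3.0, 4.0, 5.0]}
)
original = df.copy()

# Columns that contain at least one NaN
nan_cols = df.columns[df.isna().any()]

for col in nan_cols:
    # span=3 is an arbitrary choice; a larger span averages over more past rows
    smoothed = df[col].ewm(span=3).mean()
    # Replace only the missing cells, keep every observed value as it is
    df[col] = df[col].fillna(smoothed)

print(df)
```

Note that with a span larger than 1 the ewm output differs from the observed values on every row, so passing it through fillna (rather than using the ewm result directly) is what keeps the existing data intact.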