Here is my question:
My DataFrame df has over 200 columns, Sup1, ..., Supn, and occasionally one or a few NaNs appear at the tail of a column. Truncating the whole dataset at the row where all NaNs disappear would lose information just to get rid of these few gaps, so I would rather "emulate" the missing values with a rolling mean.
import numpy as np
import pandas as pd

Date = ['2022-02-23','2022-02-22','2022-02-21','2022-02-18','2022-02-17','2022-02-16','2022-02-15','2022-02-14','2022-02-11','2022-02-10']
df = pd.DataFrame({'Sup1': [0.5083333253860474, 0.49666666984558105, 0.5024999976158142, 0.49666666984558105, 0.5, 0.5133333206176758, 0.5174999634424846, 0.5416666865348816, 0.5333333611488342, np.nan],
                   'Sup2': [0.0130000002682209, 0.0130000002682209, 0.0130000002682209, 0.0133333336561918, 0.0140000004321336, 0.0140000004321336, 0.0140000004321336, np.nan, np.nan, np.nan]},
                  index=Date)
I want only the NaNs to be replaced, with values obtained from a rolling mean. Each mean has to be calculated column-wise, for every column that contains any NaN.
I thought of something like the following, but my attempts did not work:
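# Attempts that did not work. Issues: range(len(main_df)) counts rows, not columns;
# iloc[i] selects a row, so the EWM runs across columns instead of down one column;
# fillna(..., inplace=True) returns None, so each assignment writes None back.
for i in range(len(main_df)):
    # col = main_df.columns
    # m_ema = main_df[col[i]].ewm(span=1).mean()
    # main_df[col[i]] = main_df[col[i]].fillna(value=main_df[col[i]].ewm(span=1).mean(), inplace=True)
    main_df.iloc[i] = main_df.iloc[i].fillna(value=main_df.iloc[i].ewm(span=1).mean(), inplace=True)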
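For comparison, here is a minimal sketch of the column-wise loop the attempt above seems to be aiming for, assuming all columns are numeric floats: iterate over column labels, skip columns without NaNs, and assign the result of fillna back instead of using inplace=True. The helper name fill_nans_columnwise is hypothetical, and span=1 is simply kept from the attempt.

```python
import numpy as np
import pandas as pd


def fill_nans_columnwise(frame: pd.DataFrame, span: int = 1) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` in which only the NaNs
    are replaced by the EWM mean of their own column."""
    out = frame.copy()
    for col in out.columns:              # iterate over column labels, not rows
        if not out[col].isna().any():
            continue                     # nothing to fill in this column
        ewm_mean = out[col].ewm(span=span).mean()
        # fillna without inplace=True returns the filled Series; assign it back
        out[col] = out[col].fillna(ewm_mean)
    return out


Date = ['2022-02-23', '2022-02-22', '2022-02-21', '2022-02-18', '2022-02-17',
        '2022-02-16', '2022-02-15', '2022-02-14', '2022-02-11', '2022-02-10']
main_df = pd.DataFrame(
    {'Sup1': [0.5083333253860474, 0.49666666984558105, 0.5024999976158142,
              0.49666666984558105, 0.5, 0.5133333206176758, 0.5174999634424846,
              0.5416666865348816, 0.5333333611488342, np.nan],
     'Sup2': [0.0130000002682209, 0.0130000002682209, 0.0130000002682209,
              0.0133333336561918, 0.0140000004321336, 0.0140000004321336,
              0.0140000004321336, np.nan, np.nan, np.nan]},
    index=Date)

filled = fill_nans_columnwise(main_df)
print(filled)
```

Working on a copy keeps main_df intact, which makes it easy to compare the filled frame with the original.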
Answer:
You can pass the EWM result to fillna, which replaces only the NaNs:
df.fillna(df.ewm(span=1).mean(),inplace=True)
df
Out[388]:
Sup1 Sup2
2022-02-23 0.5083333253860474 0.0130000002682209
2022-02-22 0.49666666984558105 0.0130000002682209
2022-02-21 0.5024999976158142 0.0130000002682209
2022-02-18 0.49666666984558105 0.0133333336561918
2022-02-17 0.5 0.0140000004321336
2022-02-16 0.5133333206176758 0.0140000004321336
2022-02-15 0.5174999634424846 0.0140000004321336
2022-02-14 0.5416666865348816 0.014
2022-02-11 0.5333333611488342 0.014
2022-02-10 0.533333 0.014
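Note that span=1 is a degenerate setting: pandas converts span to a smoothing factor alpha = 2 / (span + 1), so span=1 gives alpha = 1 and all the weight goes to the latest observation. On a NaN row the EWM keeps its previous value, which is why the output above repeats the last available number (0.533333 for Sup1, 0.014 for Sup2). In other words, this fill behaves like a forward fill rather than an average over several values. A small check, using a column built from the Sup2 values in the question:

```python
import numpy as np
import pandas as pd

# Column with a trailing run of NaNs, values taken from Sup2 in the question.
s = pd.Series([0.0130000002682209, 0.0130000002682209, 0.0133333336561918,
               0.0140000004321336, 0.0140000004321336, np.nan, np.nan, np.nan])

via_ewm = s.fillna(s.ewm(span=1).mean())
via_ffill = s.ffill()

# alpha = 2 / (span + 1) = 1, so every NaN receives the last observed value.
print(via_ewm.equals(via_ffill))  # True
```

If a plain forward fill is what you actually want, df.ffill() says so more directly.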
Answer:
Have you tried just ewm, without the fillna? That gave me the values filled in without truncating anything.
df
Out[26]:
Sup1 Sup2
2022-02-23 0.5083333253860474 0.0130000002682209
2022-02-22 0.49666666984558105 0.0130000002682209
2022-02-21 0.5024999976158142 0.0130000002682209
2022-02-18 0.49666666984558105 0.0133333336561918
2022-02-17 0.5 0.0140000004321336
2022-02-16 0.5133333206176758 0.0140000004321336
2022-02-15 0.5174999634424846 0.0140000004321336
2022-02-14 0.5416666865348816 NaN
2022-02-11 0.5333333611488342 NaN
2022-02-10 NaN NaN
df["Sup1"].ewm(span=1).mean()
Out[28]:
2022-02-23 0.508333
2022-02-22 0.496667
2022-02-21 0.502500
2022-02-18 0.496667
2022-02-17 0.500000
2022-02-16 0.513333
2022-02-15 0.517500
2022-02-14 0.541667
2022-02-11 0.533333
2022-02-10 0.533333
Name: Sup1, dtype: float64
df["Sup2"].ewm(span=1).mean()
Out[29]:
2022-02-23 0.013000
2022-02-22 0.013000
2022-02-21 0.013000
2022-02-18 0.013333
2022-02-17 0.014000
2022-02-16 0.014000
2022-02-15 0.014000
2022-02-14 0.014000
2022-02-11 0.014000
2022-02-10 0.014000
Name: Sup2, dtype: float64
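One caveat with using the ewm output directly instead of passing it to fillna: the existing values survive only because span=1 reproduces every observation exactly. With any larger span, ewm(...).mean() smooths every row, so fillna is what restricts the change to the NaNs. A short comparison with span=3 (an arbitrary value, and made-up sample numbers, chosen only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([0.50, 0.52, 0.49, 0.54, np.nan])

smoothed = s.ewm(span=3).mean()   # every row becomes a weighted average
only_gaps = s.fillna(smoothed)    # observed values kept, only the NaN filled

print(smoothed.iloc[:4].equals(s.iloc[:4]))   # False: observed values changed
print(only_gaps.iloc[:4].equals(s.iloc[:4]))  # True: observed values preserved
print(only_gaps.iloc[4] == smoothed.iloc[3])  # True: the gap gets the last EWM value
```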
To apply this to every column, store the DataFrame's column labels in a variable and loop over them:
header = df.columns
for col in header:
    print(df[col].ewm(span=1).mean())
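If an actual rolling mean is wanted, as the question describes, rather than an EWM, one option is to replace each NaN with the mean of the previous `window` values in its own column and feed already-filled values into the next window, so a run of several trailing NaNs can still be filled. This is a minimal sketch under stated assumptions: the helper name fill_with_rolling_mean and window=3 are hypothetical choices, and "previous" follows the row order of df, which in the question lists the newest date first.

```python
import numpy as np
import pandas as pd


def fill_with_rolling_mean(frame: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Hypothetical helper: replace each NaN with the mean of the previous
    `window` values in the same column, reusing earlier fills as inputs."""
    out = frame.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # leave complete columns untouched
        values = out[col].to_numpy(dtype=float, copy=True)
        # Walk the NaN positions top to bottom so each fill can feed the next.
        for i in np.flatnonzero(np.isnan(values)):
            previous = values[max(0, i - window):i]
            previous = previous[~np.isnan(previous)]
            if previous.size:  # a NaN with no earlier values stays NaN
                values[i] = previous.mean()
        out[col] = values
    return out


Date = ['2022-02-23', '2022-02-22', '2022-02-21', '2022-02-18', '2022-02-17',
        '2022-02-16', '2022-02-15', '2022-02-14', '2022-02-11', '2022-02-10']
df = pd.DataFrame(
    {'Sup1': [0.5083333253860474, 0.49666666984558105, 0.5024999976158142,
              0.49666666984558105, 0.5, 0.5133333206176758, 0.5174999634424846,
              0.5416666865348816, 0.5333333611488342, np.nan],
     'Sup2': [0.0130000002682209, 0.0130000002682209, 0.0130000002682209,
              0.0133333336561918, 0.0140000004321336, 0.0140000004321336,
              0.0140000004321336, np.nan, np.nan, np.nan]},
    index=Date)

print(fill_with_rolling_mean(df, window=3))
```

Unlike df.rolling(window, min_periods=1).mean(), which leaves a NaN wherever the whole window is NaN (for example the last Sup2 row with window=3), reusing the filled values lets the sketch extend through a trailing run of any length. The trade-off is that later fills are built partly on earlier estimates.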