Home > Mobile >  Apply fillna(method='bfill') only if the values in same year and month with Python
Apply fillna(method='bfill') only if the values in same year and month with Python

Time:11-10

Let's say I have a panel dataframe with lots of NaNs inside as follow:

import pandas as pd
import numpy as np

np.random.seed(2021)
dates = pd.date_range('20130226', periods=720)
df = pd.DataFrame(np.random.randint(0, 100, size=(720, 3)), index=dates, columns=list('ABC'))
for col in df.columns:
    df.loc[df.sample(frac=0.4).index, col] = pd.np.nan
df

Out:

               A     B     C
2013-02-26   NaN   NaN   NaN
2013-02-27   NaN   NaN  44.0
2013-02-28  62.0   NaN  29.0
2013-03-01  21.0   NaN  24.0
2013-03-02  12.0  70.0  70.0
          ...   ...   ...
2015-02-11  38.0  42.0   NaN
2015-02-12  67.0   NaN   NaN
2015-02-13  27.0  10.0  74.0
2015-02-14  18.0   NaN   NaN
2015-02-15   NaN   NaN   NaN

I need to apply df.fillna(method='bfill') or df.fillna(method='ffill') to the dataframe only if they are in same year and month:

For example, if I apply df.fillna(method='bfill'), the expected result will like this:

               A     B     C
2013-02-26  62.0   NaN  44.0
2013-02-27  62.0   NaN  44.0
2013-02-28  62.0   NaN  29.0
2013-03-01  21.0  70.0  24.0
2013-03-02  12.0  70.0  70.0
          ...   ...   ...
2015-02-11  38.0  42.0  74.0
2015-02-12  67.0  10.0  74.0
2015-02-13  27.0  10.0  74.0
2015-02-14  18.0   NaN   NaN
2015-02-15   NaN   NaN   NaN

How could I do that in Pandas? Thanks.

CodePudding user response:

You could resample by M (month) and transform bfill:

>>> df.resample("M").transform('bfill')
               A     B     C
2013-02-26  62.0   NaN  44.0
2013-02-27  62.0   NaN  44.0
2013-02-28  62.0   NaN  29.0
2013-03-01  21.0  70.0  24.0
2013-03-02  12.0  70.0  70.0
...          ...   ...   ...
2015-02-11  38.0  42.0  74.0
2015-02-12  67.0  10.0  74.0
2015-02-13  27.0  10.0  74.0
2015-02-14  18.0   NaN   NaN
2015-02-15   NaN   NaN   NaN

[720 rows x 3 columns]
>>> 

For specific columns:

>>> df[['A', 'B']] = df.resample("M")[['A', 'B']].transform('bfill')
>>> df
               A     B     C
2013-02-26  62.0   NaN   NaN
2013-02-27  62.0   NaN  44.0
2013-02-28  62.0   NaN  29.0
2013-03-01  21.0  70.0  24.0
2013-03-02  12.0  70.0  70.0
...          ...   ...   ...
2015-02-11  38.0  42.0   NaN
2015-02-12  67.0  10.0   NaN
2015-02-13  27.0  10.0  74.0
2015-02-14  18.0   NaN   NaN
2015-02-15   NaN   NaN   NaN

[720 rows x 3 columns]
>>> 
  • Related