Let's say I have a panel dataframe with lots of NaN
s inside as follow:
import pandas as pd
import numpy as np
np.random.seed(2021)
dates = pd.date_range('20130226', periods=720)
df = pd.DataFrame(np.random.randint(0, 100, size=(720, 3)), index=dates, columns=list('ABC'))
for col in df.columns:
df.loc[df.sample(frac=0.4).index, col] = pd.np.nan
df
Out:
A B C
2013-02-26 NaN NaN NaN
2013-02-27 NaN NaN 44.0
2013-02-28 62.0 NaN 29.0
2013-03-01 21.0 NaN 24.0
2013-03-02 12.0 70.0 70.0
... ... ...
2015-02-11 38.0 42.0 NaN
2015-02-12 67.0 NaN NaN
2015-02-13 27.0 10.0 74.0
2015-02-14 18.0 NaN NaN
2015-02-15 NaN NaN NaN
I need to apply df.fillna(method='bfill')
or df.fillna(method='ffill')
to the dataframe only if they are in same year
and month
:
For example, if I apply df.fillna(method='bfill')
, the expected result will like this:
A B C
2013-02-26 62.0 NaN 44.0
2013-02-27 62.0 NaN 44.0
2013-02-28 62.0 NaN 29.0
2013-03-01 21.0 70.0 24.0
2013-03-02 12.0 70.0 70.0
... ... ...
2015-02-11 38.0 42.0 74.0
2015-02-12 67.0 10.0 74.0
2015-02-13 27.0 10.0 74.0
2015-02-14 18.0 NaN NaN
2015-02-15 NaN NaN NaN
How could I do that in Pandas? Thanks.
CodePudding user response:
You could resample
by M
(month) and transform
bfill
:
>>> df.resample("M").transform('bfill')
A B C
2013-02-26 62.0 NaN 44.0
2013-02-27 62.0 NaN 44.0
2013-02-28 62.0 NaN 29.0
2013-03-01 21.0 70.0 24.0
2013-03-02 12.0 70.0 70.0
... ... ... ...
2015-02-11 38.0 42.0 74.0
2015-02-12 67.0 10.0 74.0
2015-02-13 27.0 10.0 74.0
2015-02-14 18.0 NaN NaN
2015-02-15 NaN NaN NaN
[720 rows x 3 columns]
>>>
For specific columns:
>>> df[['A', 'B']] = df.resample("M")[['A', 'B']].transform('bfill')
>>> df
A B C
2013-02-26 62.0 NaN NaN
2013-02-27 62.0 NaN 44.0
2013-02-28 62.0 NaN 29.0
2013-03-01 21.0 70.0 24.0
2013-03-02 12.0 70.0 70.0
... ... ... ...
2015-02-11 38.0 42.0 NaN
2015-02-12 67.0 10.0 NaN
2015-02-13 27.0 10.0 74.0
2015-02-14 18.0 NaN NaN
2015-02-15 NaN NaN NaN
[720 rows x 3 columns]
>>>