I have the following DataFrame, which contains NaN values. I want to replace each NaN with the most recent non-NaN value for the same day in the previous month(s):
date (y-d-m) | value
2022-01-01 | 1
2022-02-01 | 2
2022-03-01 | 3
2022-04-01 | 4
...
2022-01-02 | nan
2022-02-02 | nan
2022-03-02 | nan
2022-04-02 | nan
...
2022-01-03 | nan
2022-02-03 | nan
2022-03-03 | nan
2022-04-03 | nan
Desired outcome:
date (y-d-m) | value
2022-01-01 | 1
2022-02-01 | 2
2022-03-01 | 3
2022-04-01 | 4
...
2022-01-02 | 1
2022-02-02 | 2
2022-03-02 | 3
2022-04-02 | 4
...
2022-01-03 | 1
2022-02-03 | 2
2022-03-03 | 3
2022-04-03 | 4
Data:
{'date (y-d-m)': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01',
'2022-01-02', '2022-02-02', '2022-03-02', '2022-04-02',
'2022-01-03', '2022-02-03', '2022-03-03', '2022-04-03'],
'value': [1.0, 2.0, 3.0, 4.0, nan, nan, nan, nan, nan, nan, nan, nan]}
Answer:
You could convert the "date (y-d-m)" column to datetime, then group by day and forward fill with ffill, so that each NaN takes the value from the same day in a previous month:
import pandas as pd
# Parse as year-day-month, then forward fill within each day-of-month group
df['date (y-d-m)'] = pd.to_datetime(df['date (y-d-m)'], format='%Y-%d-%m')
df['value'] = df.groupby(df['date (y-d-m)'].dt.day)['value'].ffill()
Output (pandas displays the parsed dates as year-month-day):
    date (y-d-m)  value
0     2022-01-01    1.0
1     2022-01-02    2.0
2     2022-01-03    3.0
3     2022-01-04    4.0
4     2022-02-01    1.0
5     2022-02-02    2.0
6     2022-02-03    3.0
7     2022-02-04    4.0
8     2022-03-01    1.0
9     2022-03-02    2.0
10    2022-03-03    3.0
11    2022-03-04    4.0
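For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built from the data in the question. It assumes that numpy supplies the nan values (the dict in the question uses a bare nan) and that the rows are already in chronological order within each day group, which ffill relies on. The variable names data and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; dates are strings in year-day-month order.
data = {
    "date (y-d-m)": [
        "2022-01-01", "2022-02-01", "2022-03-01", "2022-04-01",
        "2022-01-02", "2022-02-02", "2022-03-02", "2022-04-02",
        "2022-01-03", "2022-02-03", "2022-03-03", "2022-04-03",
    ],
    "value": [1.0, 2.0, 3.0, 4.0] + [np.nan] * 8,
}
df = pd.DataFrame(data)

# Parse as year-day-month so the middle field is treated as the day.
df["date (y-d-m)"] = pd.to_datetime(df["date (y-d-m)"], format="%Y-%d-%m")

# Forward fill within each day-of-month group. The result keeps the
# original index, so it can be assigned straight back to the column.
df["value"] = df.groupby(df["date (y-d-m)"].dt.day)["value"].ffill()

filled = df["value"].tolist()
print(df)
```

If the original string layout is needed afterwards, df["date (y-d-m)"].dt.strftime("%Y-%d-%m") should convert the column back. If the rows might not be in date order, sorting with sort_values on the date column before the groupby step would be the safer choice.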