Add missing dates to datetime column in Pandas using last value

Time:06-14

I've already checked out "Add missing dates to pandas dataframe", but I don't want to fill the new dates with a generic value.

My dataframe looks more or less like this:

date (dd/mm/yyyy) value
01/01/2000 a
02/01/2000 b
03/01/2000 c
06/01/2000 d

So in this example, days 04/01/2000 and 05/01/2000 are missing. What I want to do is to insert them before the 6th, with a value of c, the last value before the missing days. So the "correct" df should look like:

date (dd/mm/yyyy) value
01/01/2000 a
02/01/2000 b
03/01/2000 c
04/01/2000 c
05/01/2000 c
06/01/2000 d

There are multiple instances of missing dates, and it's a large df (~9000 rows).

Thanks for your time! :)
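To see which days are missing before filling anything, `DatetimeIndex.difference` can compare the full daily range with the dates that are present. This is a minimal sketch that rebuilds the sample rows from the question and assumes the date column is named `date` (the question's header is `date (dd/mm/yyyy)`):

```python
import pandas as pd

# Sample data from the question; the column names here are assumptions
df = pd.DataFrame({
    "date": ["01/01/2000", "02/01/2000", "03/01/2000", "06/01/2000"],
    "value": ["a", "b", "c", "d"],
})

# dayfirst=True makes "02/01/2000" mean 2 January, not 1 February
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Every calendar day between the first and last date
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# Days in the full range that never appear in the data
missing = full_range.difference(df["date"])
print(missing.strftime("%d/%m/%Y").tolist())
```

On the sample this lists 04/01/2000 and 05/01/2000; on a larger frame with several gaps it lists every missing day at once.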

Answer:

Assuming that your dates occur at a regular frequency (daily in your example), you can generate a pd.DatetimeIndex with pd.date_range, keep the dates that are not in your date column, create a dataframe for them with NaN in the value column, concatenate it with the original, sort by date and fill the gaps with a forward fill (or a backward fill, if you want the next value instead).


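import numpy as np
import pandas as pd

# assuming your dataframe is df, with a 'date' column (dd/mm/yyyy strings) and a 'value' column:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

all_dates = pd.date_range(start=df.date.min(), end=df.date.max(), freq='D')  # daily dates
known_dates = set(df.date.to_list())  # a set is much faster than a list for `in` checks
unknown_dates = all_dates[~all_dates.isin(known_dates)]
df2 = pd.DataFrame({'date': unknown_dates})
df2['value'] = np.nan
df = pd.concat([df, df2], ignore_index=True)
df = df.sort_values('date').ffill().reset_index(drop=True)  # sort by date, then forward-fill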
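A shorter variant of the same idea, sketched under the same assumption that the column is called `date`: instead of building the missing rows by hand and concatenating them, `DataFrame.reindex` can insert every day of the range and forward-fill the new rows in one step. The index has to be sorted for `method="ffill"` to work, and that option only fills rows created by the reindex; NaNs already present in the data are left alone.

```python
import pandas as pd

# Sample data from the question (column names assumed)
df = pd.DataFrame({
    "date": ["01/01/2000", "02/01/2000", "03/01/2000", "06/01/2000"],
    "value": ["a", "b", "c", "d"],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# One label per calendar day between the first and last observation
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# reindex inserts the missing days; method="ffill" copies the last known value forward
out = (
    df.set_index("date")
      .sort_index()
      .reindex(full_range, method="ffill")
      .rename_axis("date")
      .reset_index()
)
print(out)
```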

Answer:

Try this:

import pandas as pd

# Parse the dd/mm/yyyy strings; dayfirst=True reads 02/01/2000 as 2 January, not 1 February
df['date (dd/mm/yyyy)'] = pd.to_datetime(df['date (dd/mm/yyyy)'], dayfirst=True)
# 'D' inserts every missing day; method='ffill' gives each new day the last known value
out = (df.set_index('date (dd/mm/yyyy)')
         .sort_index()
         .asfreq('D', method='ffill')
         .reset_index())
print(out)
>>>

date (dd/mm/yyyy)   value
0   2000-01-01      a
1   2000-01-02      b
2   2000-01-03      c
3   2000-01-04      c
4   2000-01-05      c
5   2000-01-06      d
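Whether `dayfirst=True` is passed changes what these strings mean. By default `pd.to_datetime` reads an ambiguous string such as `02/01/2000` month-first, so the question's sample dates become the first day of January, February, March and June 2000, and a monthly frequency such as `'MS'` then appears to produce the expected table. A small check on the sample strings:

```python
import pandas as pd

raw = pd.Series(["01/01/2000", "02/01/2000", "03/01/2000", "06/01/2000"])

# Default parsing treats the first field as the month (mm/dd/yyyy)
month_first = pd.to_datetime(raw)
# dayfirst=True treats the first field as the day, matching dd/mm/yyyy
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first.dt.strftime("%Y-%m-%d").tolist())
print(day_first.dt.strftime("%Y-%m-%d").tolist())
```

Only the day-first version has the two-day gap (4 and 5 January) that the question describes.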