Add missing dates into TimeSeries DataFrame with 'NaN' values for existing columns

Time:06-25

I have a DataFrame object representing a time series indexed in minute resolution. There are segments in which there are no entries, which have been identified as follows:

missing = pd.date_range(df.index.min(), df.index.max(), freq='1min').difference(df.index)

Doing a quick print of this results in the following output:

DatetimeIndex(['2021-02-11 03:41:00', '2021-02-11 03:42:00',
           '2021-02-11 03:43:00', '2021-02-11 03:44:00',
           '2021-02-11 03:45:00', '2021-02-11 03:46:00',
           '2021-02-11 03:47:00', '2021-02-11 03:48:00',
           '2021-02-11 03:49:00', '2021-02-11 03:50:00',
           ...
           '2021-09-29 08:50:00', '2021-09-29 08:51:00',
           '2021-09-29 08:52:00', '2021-09-29 08:53:00',
           '2021-09-29 08:54:00', '2021-09-29 08:55:00',
           '2021-09-29 08:56:00', '2021-09-29 08:57:00',
           '2021-09-29 08:58:00', '2021-09-29 08:59:00'],
          dtype='datetime64[ns]', length=5057, freq=None)

My question is how to insert each of these timestamps as a new row into the initial DataFrame, with NaN values in all of its existing columns, like this:

date a b c d
2021-12-31 23:59:00 3675.10 3678.63 3675.09 3676.23
2021-12-31 23:58:00 NaN NaN NaN NaN
2021-12-31 23:57:00 NaN NaN NaN NaN
2021-12-31 23:56:00 3676.79 3678.54 3676.32 3676.94
2021-12-31 23:55:00 3679.36 3679.36 3676.28 3676.65

In this contrived example, the timestamps 2021-12-31 23:58:00 and 2021-12-31 23:57:00 would have been identified by the date_range call above and then inserted into the initial DataFrame with NaN values. I am not interested in interpolating between surrounding values.

CodePudding user response:

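Use resample!

df.resample('1min').asfreq()

Note: '1min' stands for 1 minute (older pandas versions also accept the alias '1T'). asfreq() inserts the missing timestamps without filling in any values, so the new rows contain NaN.

Here is some working code that recreates the data frame from your sample data. Next time, please provide this yourself; you will get better and faster answers!

import pandas
from io import StringIO


# Sample data from the question; columns are separated by two spaces
s = """date  a  b  c  d
2021-12-31 23:59:00  3675.10  3678.63  3675.09  3676.23
2021-12-31 23:56:00  3676.79  3678.54  3676.32  3676.94
2021-12-31 23:55:00  3679.36  3679.36  3676.28  3676.65"""


csvStringIO = StringIO(s)

# The separator is a regex (two or more whitespace characters) because the
# timestamps themselves contain a single space; regex separators use the python engine
df = pandas.read_csv(csvStringIO, sep=r'\s{2,}', engine='python', index_col='date')
df.index = pandas.to_datetime(df.index)

# Sort the index chronologically before resampling
df = df.sort_index()

# Insert the missing minutes as rows of NaN
df.resample('1min').asfreq()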

Output:

                           a        b        c        d
date                                                   
2021-12-31 23:55:00  3679.36  3679.36  3676.28  3676.65
2021-12-31 23:56:00  3676.79  3678.54  3676.32  3676.94
2021-12-31 23:57:00      NaN      NaN      NaN      NaN
2021-12-31 23:58:00      NaN      NaN      NaN      NaN
2021-12-31 23:59:00  3675.10  3678.63  3675.09  3676.23
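
If you already build the full minute range with date_range, as in the question, reindex is another way to get the same result. This is only a minimal sketch: the frame below is rebuilt by hand from the sample rows, and the names full_index and filled are made up for illustration.

```python
import pandas as pd

# Sample rows from the question, deliberately left in descending order.
df = pd.DataFrame(
    {
        "a": [3675.10, 3676.79, 3679.36],
        "b": [3678.63, 3678.54, 3679.36],
        "c": [3675.09, 3676.32, 3676.28],
        "d": [3676.23, 3676.94, 3676.65],
    },
    index=pd.to_datetime(
        ["2021-12-31 23:59:00", "2021-12-31 23:56:00", "2021-12-31 23:55:00"]
    ),
)
df.index.name = "date"

# Every minute between the first and last timestamp (hypothetical name).
full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min", name="date")

# reindex keeps the existing rows and creates NaN rows for labels not in df.
filled = df.reindex(full_index)

print(filled)
```

The rows come back in the order of full_index, so the result is ascending even though the input was not sorted.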

Edit: Your index is unsorted, so I had to add the line with sort_index(). Keep that in mind.
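
Since the desired output in the question is newest-first, the sorted result can be flipped back afterwards. A minimal sketch, assuming a toy single-column frame (the column a and its values are illustrative only) and using DataFrame.asfreq as a shorthand for the resampling step:

```python
import pandas as pd

# Illustrative frame with a newest-first index, as in the question.
df = pd.DataFrame(
    {"a": [3.0, 2.0, 1.0]},
    index=pd.to_datetime(
        ["2021-12-31 23:59:00", "2021-12-31 23:56:00", "2021-12-31 23:55:00"]
    ),
)

# Sort ascending, insert the missing minutes as NaN rows,
# then restore the newest-first order.
result = df.sort_index().asfreq("1min").sort_index(ascending=False)

print(result)
```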
