I have a DataFrame object representing a time series indexed in minute resolution. There are segments in which there are no entries, which have been identified as follows:
missing = pd.date_range(df.index.min(), df.index.max(), freq='1min').difference(df.index)
Doing a quick print of this results in the following output:
DatetimeIndex(['2021-02-11 03:41:00', '2021-02-11 03:42:00',
'2021-02-11 03:43:00', '2021-02-11 03:44:00',
'2021-02-11 03:45:00', '2021-02-11 03:46:00',
'2021-02-11 03:47:00', '2021-02-11 03:48:00',
'2021-02-11 03:49:00', '2021-02-11 03:50:00',
...
'2021-09-29 08:50:00', '2021-09-29 08:51:00',
'2021-09-29 08:52:00', '2021-09-29 08:53:00',
'2021-09-29 08:54:00', '2021-09-29 08:55:00',
'2021-09-29 08:56:00', '2021-09-29 08:57:00',
'2021-09-29 08:58:00', '2021-09-29 08:59:00'],
dtype='datetime64[ns]', length=5057, freq=None)
My question is how to insert each of these values as a new row into the initial DataFrame object, with NaN values for all columns existing in that DataFrame, as such:
date | a | b | c | d |
---|---|---|---|---|
2021-12-31 23:59:00 | 3675.10 | 3678.63 | 3675.09 | 3676.23 |
2021-12-31 23:58:00 | NaN | NaN | NaN | NaN |
2021-12-31 23:57:00 | NaN | NaN | NaN | NaN |
2021-12-31 23:56:00 | 3676.79 | 3678.54 | 3676.32 | 3676.94 |
2021-12-31 23:55:00 | 3679.36 | 3679.36 | 3676.28 | 3676.65 |
Where, in this contrived example, the dates 2021-12-31 23:58:00 and 2021-12-31 23:57:00 would have been identified via the date_range function before and then inserted with NaN values into the initial DataFrame. In this case, I am not interested in interpolating between surrounding values.
CodePudding user response:
Use resample!
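df.resample('1min').asfreq()
Note: '1min' stands for 1 minute (older pandas versions also accept the alias '1T'). asfreq() inserts the missing timestamps and leaves their values as NaN, without interpolating.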
And here is some code that works, recreating the data frame from your sample data. Next time, please provide this yourself; you will get better and faster answers!
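import pandas
from io import StringIO

# Sample data from the question, comma-separated so that the space
# inside each timestamp is not treated as a column delimiter
s = """date,a,b,c,d
2021-12-31 23:59:00,3675.10,3678.63,3675.09,3676.23
2021-12-31 23:56:00,3676.79,3678.54,3676.32,3676.94
2021-12-31 23:55:00,3679.36,3679.36,3676.28,3676.65"""
csvStringIO = StringIO(s)
df = pandas.read_csv(csvStringIO, index_col='date')
df.index = pandas.to_datetime(df.index)
# Sort the index chronologically before resampling
df = df.sort_index()
# Insert the missing minutes; their values stay NaN
df.resample('1min').asfreq()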
Output:
a b c d
date
2021-12-31 23:55:00 3679.36 3679.36 3676.28 3676.65
2021-12-31 23:56:00 3676.79 3678.54 3676.32 3676.94
2021-12-31 23:57:00 NaN NaN NaN NaN
2021-12-31 23:58:00 NaN NaN NaN NaN
2021-12-31 23:59:00 3675.10 3678.63 3675.09 3676.23
Edit: Your index is unsorted. I had to add the line with sort_index(). Keep that in mind.
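As an alternative to resample, and closer to the date_range approach in the question, the frame can be reindexed against the complete minute range. This is only a minimal sketch, not part of the original answer: the two-column sample frame and the names full_index and filled are made up for illustration, and it assumes the index has no duplicate timestamps.

```python
import pandas as pd

# Hypothetical sample frame: 23:57 and 23:58 are missing,
# and the index is deliberately left unsorted.
df = pd.DataFrame(
    {"a": [3675.10, 3676.79, 3679.36], "b": [3678.63, 3678.54, 3679.36]},
    index=pd.to_datetime(
        ["2021-12-31 23:59:00", "2021-12-31 23:56:00", "2021-12-31 23:55:00"]
    ),
)

# Complete one-minute range between the first and last timestamp
full_index = pd.date_range(df.index.min(), df.index.max(), freq="1min")

# Same computation as in the question: timestamps absent from df
missing = full_index.difference(df.index)

# Reindexing adds one row per missing timestamp, filled with NaN,
# and returns the rows in the order of full_index (ascending)
filled = df.reindex(full_index)
print(filled)
```

If the newest-first ordering from the question is wanted, filled.sort_index(ascending=False) should restore it.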