I have the following problem. I have this df with a 10-minute interval:
import pandas as pd

df_dict = {"value" : [1, 1, 2, 3], "datetime" : ["2022-09-05 07:20:00", "2022-09-05 07:30:00", "2022-09-05 07:20:00", "2022-09-05 07:20:00"],
"expedice" : ["A", "A", "B", "C"] }
df = pd.DataFrame(df_dict)
I would like to fill in the missing datetimes for each expedice (with value 0) so that I get:
df_dict = {"value" : [1, 1, 2, 0, 3, 0], "datetime" : ["2022-09-05 07:20:00", "2022-09-05 07:30:00", "2022-09-05 07:20:00", "2022-09-05 07:30:00", "2022-09-05 07:20:00", "2022-09-05 07:30:00"],
"expedice" : ["A", "A", "B", "B", "C", "C"] }
df = pd.DataFrame(df_dict)
I tried:
df.datetime = pd.to_datetime(df.datetime)
df.set_index(
['datetime', 'expedice']
).unstack(
fill_value=0
).asfreq(
"10Min", fill_value=0
).stack().sort_index(level=1).reset_index()
But I got the error TypeError: Cannot change data-type for object array. How can I fix it, please?
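For comparison, here is a minimal sketch that keeps the wide-format idea from the attempt above, but swaps in DataFrame.pivot, asfreq without fill_value, fillna and melt in place of set_index/unstack/asfreq(fill_value=...)/stack. It assumes each (datetime, expedice) pair occurs at most once (pivot raises a ValueError on duplicates) and that all timestamps sit on one regular 10-minute grid.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [1, 1, 2, 3],
    "datetime": ["2022-09-05 07:20:00", "2022-09-05 07:30:00",
                 "2022-09-05 07:20:00", "2022-09-05 07:20:00"],
    "expedice": ["A", "A", "B", "C"],
})
df["datetime"] = pd.to_datetime(df["datetime"])

# Wide format: one row per timestamp, one column per expedice (NaN where missing).
# Assumption: each (datetime, expedice) pair appears at most once.
wide = df.pivot(index="datetime", columns="expedice", values="value")

# asfreq inserts any missing 10-minute rows, fillna(0) replaces every gap,
# and astype restores the integer dtype that NaN had turned into float.
wide = wide.asfreq("10min").fillna(0).astype("int64")

# Back to long format; melt emits all rows for A, then B, then C.
out = (wide.reset_index()
           .melt(id_vars="datetime", var_name="expedice", value_name="value"))
print(out)
```

The columns come out as datetime, expedice, value; reorder them with out[["value", "datetime", "expedice"]] if the original layout matters.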
Answer:
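Use DataFrame.reindex with a MultiIndex built from every expedice value combined with a DatetimeIndex that runs from the minimal to the maximal datetime in 10-minute steps (created by pd.date_range); rows missing from the original get fill_value=0: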
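import pandas as pd

# convert first (as in the question), otherwise the string level never matches the Timestamps from date_range
df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.set_index(['expedice', 'datetime'])
# every expedice combined with every 10-minute step between the min and max datetime
df1 = (df1.reindex(pd.MultiIndex.from_product([df1.index.levels[0],
                                               pd.date_range(df1.index.levels[1].min(),
                                                             df1.index.levels[1].max(),
                                                             freq='10Min')],
                                              names=df1.index.names), fill_value=0)
          .reset_index())
print(df1)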
expedice datetime value
0 A 2022-09-05 07:20:00 1
1 A 2022-09-05 07:30:00 1
2 B 2022-09-05 07:20:00 2
3 B 2022-09-05 07:30:00 0
4 C 2022-09-05 07:20:00 3
5 C 2022-09-05 07:30:00 0
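If this is needed for several frames, the same reindex approach can be wrapped in a small function. The sketch below is an illustration only: fill_missing_intervals and its parameter names are hypothetical, and it assumes each (group, timestamp) pair is unique and that fill_value is sensible for every non-index column (0 would also be written into any text columns).

```python
import pandas as pd


def fill_missing_intervals(df, group_col="expedice", time_col="datetime",
                           freq="10min", fill_value=0):
    """Hypothetical helper wrapping the reindex approach.

    Returns one row per (group, timestamp) pair, where the timestamps run
    from the overall minimum to the overall maximum of time_col at `freq`.
    Rows that did not exist get `fill_value` in every remaining column.
    """
    # Parse strings to Timestamps so they match the date_range values.
    df = df.assign(**{time_col: pd.to_datetime(df[time_col])})
    indexed = df.set_index([group_col, time_col])
    full_range = pd.date_range(df[time_col].min(), df[time_col].max(), freq=freq)
    full_index = pd.MultiIndex.from_product(
        [sorted(df[group_col].unique()), full_range],
        names=[group_col, time_col],
    )
    return indexed.reindex(full_index, fill_value=fill_value).reset_index()


# The question's data plus an extra 07:40 row for A, so B and C gain two filled rows.
df = pd.DataFrame({
    "value": [1, 1, 2, 3, 5],
    "datetime": ["2022-09-05 07:20:00", "2022-09-05 07:30:00",
                 "2022-09-05 07:20:00", "2022-09-05 07:20:00",
                 "2022-09-05 07:40:00"],
    "expedice": ["A", "A", "B", "C", "A"],
})
result = fill_missing_intervals(df)
print(result)
```

Because the range uses the overall minimum and maximum, every group is extended to the latest timestamp seen in any group, which is what produces the B and C rows at 07:30 in the expected output.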