I have the following dataframe (sample):
import pandas as pd
data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 14:00:00', False, 3],
['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])
group date indicator value
0 A 2022-09-01 10:00:00 False 2
1 A 2022-09-01 14:00:00 False 3
2 B 2022-09-01 13:00:00 False 1
3 B 2022-09-01 16:00:00 True 4
I would like to fill in the missing dates between dates hourly. So each hour that is missing between dates should be filled and the values should be the same as the previous data. Here is the desired output:
data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 11:00:00', False, 2],
['A', '2022-09-01 12:00:00', False, 2], ['A', '2022-09-01 13:00:00', False, 2],
['A', '2022-09-01 14:00:00', False, 3],
['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 14:00:00', False, 1],
['B', '2022-09-01 15:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])
group date indicator value
0 A 2022-09-01 10:00:00 False 2
1 A 2022-09-01 11:00:00 False 2
2 A 2022-09-01 12:00:00 False 2
3 A 2022-09-01 13:00:00 False 2
4 A 2022-09-01 14:00:00 False 3
5 B 2022-09-01 13:00:00 False 1
6 B 2022-09-01 14:00:00 False 1
7 B 2022-09-01 15:00:00 False 1
8 B 2022-09-01 16:00:00 True 4
So I was wondering if it is possible to fill the missing dates hourly per group with the previous value in column value using Pandas
?
CodePudding user response:
You can use:
df['date'] = pd.to_datetime(df['date'])
out = (df
.groupby('group', as_index=False, group_keys=False)
.apply(lambda g: g.set_index('date')
.reindex(pd.date_range(g['date'].min(),
g['date'].max(),
freq='H'))
.ffill(downcast='infer').reset_index()
)
.reset_index(drop=True)
)
output:
index group indicator value
0 2022-09-01 10:00:00 A False 2
1 2022-09-01 11:00:00 A False 2
2 2022-09-01 12:00:00 A False 2
3 2022-09-01 13:00:00 A False 2
4 2022-09-01 14:00:00 A False 3
5 2022-09-01 13:00:00 B False 1
6 2022-09-01 14:00:00 B False 1
7 2022-09-01 15:00:00 B False 1
8 2022-09-01 16:00:00 B True 4
CodePudding user response:
here is one other way about it
df['date']=pd.to_datetime(df['date'])
df2=(df.set_index('date' )
.groupby('group', group_keys=False)
.apply(lambda x: x.resample('1H').ffill())
.reset_index() )
df2
date group indicator value
0 2022-09-01 10:00:00 A False 2
1 2022-09-01 11:00:00 A False 2
2 2022-09-01 12:00:00 A False 2
3 2022-09-01 13:00:00 A False 2
4 2022-09-01 14:00:00 A False 3
5 2022-09-01 13:00:00 B False 1
6 2022-09-01 14:00:00 B False 1
7 2022-09-01 15:00:00 B False 1
8 2022-09-01 16:00:00 B True 4