I'm collecting time series data, but sometimes for some time points there is no data to be collected. Just say for example I am collecting data across four time points, I might get a dataframe like this:
df_ = pd.DataFrame({'group': ['A']*3 ['B']*3,
'time': [1,2,4,1,3,4],
'value': [100,105,111,200,234,222]})
sometimes there is a datapoint missing and so there is no row for that point, I would like groupby and to forward fill with the previous value to create a new row form which would look like this:
df_missing_completed = pd.DataFrame({'group': ['A']*4 ['B']*4,
'time': [1,2,3,4,1,2,3,4],
'value': [100, 101, 105,111,200, 202, 234,222]})
I had the idea that I could create an new dataframe as a template with all the dates and time points, without any values, join it with the real data which would induce NA's, and do a ffill
on the value column to fill in the missing data, like below:
df_template = pd.DataFrame({'group': ['A']*4 ['B']*4,
'time': [1,2,3,4,1,2,3,4]})
df_final = pd.merge(df_template, df_, on = ['group', 'time'], how='left')
df_final['filled_values'] = df_final['value'].fillna(method='ffill')
but this seems like a messy solution, and with the real data the df_templete
will be more complex to create. Does anyone know a better one? Thanks!
CodePudding user response:
I would use:
(df_.pivot(index='time', columns='group', values='value')
# reindex only of you want to add missing times for all groups
.reindex(range(df_['time'].min(), df_['time'].max() 1))
.ffill().unstack().reset_index(name='value')
)
Output:
group time value
0 A 1 100.0
1 A 2 105.0
2 A 3 105.0
3 A 4 111.0
4 B 1 200.0
5 B 2 200.0
6 B 3 234.0
7 B 4 222.0
CodePudding user response:
Instead of a template dataframe you could create a new index and then reindex
with ffill
:
new_idx = pd.MultiIndex.from_product([list('AB'), range(1,5)], names=['group', 'time'])
df_.set_index(['group', 'time']).reindex(new_idx, method='ffill').reset_index()
The result keeps the datatype of the value column:
group time value
0 A 1 100
1 A 2 105
2 A 3 105
3 A 4 111
4 B 1 200
5 B 2 200
6 B 3 234
7 B 4 222