Home > OS >  Fill missing dates hourly per group with previous value for only certain conditioned values in certa
Fill missing dates hourly per group with previous value for only certain conditioned values in certa

Time:11-02

I have the following dataframe (sample):

import pandas as pd

data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 12:00:00', True, 3], ['A', '2022-09-01 14:00:00', False, 1], 
        ['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4], ['B', '2022-09-01 18:00:00', False, 3]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])

  group                 date  indicator  value
0     A  2022-09-01 10:00:00      False      2
1     A  2022-09-01 12:00:00       True      3
2     A  2022-09-01 14:00:00      False      1
3     B  2022-09-01 13:00:00      False      1
4     B  2022-09-01 16:00:00       True      4
5     B  2022-09-01 18:00:00      False      3

I would like to fill in the missing dates hourly per group where the value is the same as the previous existing row. When the previous existing row has an indicator of True, I would like to fill these missing rows with an indicator of False instead of True but the value is although the same. Here is the desired output:

    data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 11:00:00', False, 2], ['A', '2022-09-01 12:00:00', True, 3], ['A', '2022-09-01 13:00:00', False, 3], ['A', '2022-09-01 14:00:00', False, 1], 
            ['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 14:00:00', False, 1], ['B', '2022-09-01 15:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4], ['B', '2022-09-01 17:00:00', False, 4], ['B', '2022-09-01 18:00:00', False, 3]]
    df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])

   group                 date  indicator  value
0      A  2022-09-01 10:00:00      False      2
1      A  2022-09-01 11:00:00      False      2
2      A  2022-09-01 12:00:00       True      3
3      A  2022-09-01 13:00:00      False      3
4      A  2022-09-01 14:00:00      False      1
5      B  2022-09-01 13:00:00      False      1
6      B  2022-09-01 14:00:00      False      1
7      B  2022-09-01 15:00:00      False      1
8      B  2022-09-01 16:00:00       True      4
9      B  2022-09-01 17:00:00      False      4
10     B  2022-09-01 18:00:00      False      3

As you can see the dates are filled hourly per group and the indicators became False when the previous indicator was True.

So I was wondering if anyone knows how to fill these missing dates hourly per group with keeping care of when the indicator is True using pandas?

CodePudding user response:

First create DatetimeIndex with DataFrame.set_index and then in lambda function add missing hours by DataFrame.asfreq, last replace missing values by Series.fillna and forward filling missing values:

df['date'] = pd.to_datetime(df['date'])

df = (df.set_index('date')
        .groupby('group')[['indicator', 'value']]
        .apply(lambda x: x.asfreq('H'))
        .assign(indicator = lambda x: x['indicator'].fillna(False), 
                value = lambda x: x['value'].ffill())
        .reset_index())
print (df)
   group                date  indicator  value
0      A 2022-09-01 10:00:00      False    2.0
1      A 2022-09-01 11:00:00      False    2.0
2      A 2022-09-01 12:00:00       True    3.0
3      A 2022-09-01 13:00:00      False    3.0
4      A 2022-09-01 14:00:00      False    1.0
5      B 2022-09-01 13:00:00      False    1.0
6      B 2022-09-01 14:00:00      False    1.0
7      B 2022-09-01 15:00:00      False    1.0
8      B 2022-09-01 16:00:00       True    4.0
9      B 2022-09-01 17:00:00      False    4.0
10     B 2022-09-01 18:00:00      False    3.0
  • Related