Home > front end >  Resampling data only if value is present
Resampling data only if value is present

Time:09-21

import pandas as pd

data = {'time': ['09:30:00',
                 '09:31:00',
                 '09:37:00',
                 '09:38:00',
                 '09:39:00',
                 '09:40:00',
                 '09:46:00',
                 '09:47:00',
                 '09:48:00'],
         'sum': [5, 8, 5, 10, 15, 2, 0, 0, 0]}
my_index = pd.MultiIndex.from_arrays([["A"]*6   ["B"]*3, [1, 1, 1, 2, 2, 2, 1, 1, 1]], names=["ID-A", "ID-B"])
df = pd.DataFrame(data, index=my_index)

If I do resampling for every 3 minutes with sum(),i.e.->
data=data.set_index('time').groupby(['ID-A','ID-B']).resample('3min').sum()
There is a window({9:33 - 9:36},{9:42 - 9:45}) for which the sum() comes out to be 0. There are some values in my dataframe which actually evaluate to 0 even after the time windows are available for it(9:45-9:48). I do not want resampling for time windows where there is no data available. I want to find out windows where my sum is 0 but due to resampling I am getting fake 0s in my data as there are no data available for those time.

CodePudding user response:

Use min_count=1 parameter in sum and then remove missing values:

df['time'] = pd.to_timedelta(df['time'])

df = df.resample('3Min', on='time').sum(min_count=1).dropna()
print (df)
                  sum
time                 
0 days 09:30:00  13.0
0 days 09:36:00  15.0
0 days 09:39:00  17.0
0 days 09:45:00   0.0
0 days 09:48:00   0.0

Details:

print (df.resample('3Min', on='time').sum(min_count=1))
                  sum
time                 
0 days 09:30:00  13.0
0 days 09:33:00   NaN
0 days 09:36:00  15.0
0 days 09:39:00  17.0
0 days 09:42:00   NaN
0 days 09:45:00   0.0
0 days 09:48:00   0.0

EDIT: Solution per groups:

df['time'] = pd.to_timedelta(df['time'])

data = (df.reset_index()
          .set_index('time')
          .groupby(['ID-A','ID-B'])['sum']
          .resample('3min')
          .sum(min_count=1)
          .dropna())
print (data)
ID-A  ID-B  time           
A     1     0 days 09:30:00    13.0
            0 days 09:36:00     5.0
      2     0 days 09:38:00    27.0
B     1     0 days 09:46:00     0.0
Name: sum, dtype: float64
  • Related