Home > database >  How to select data series with special steps by Dataframe.rolling in pandas?
How to select data series with special steps by Dataframe.rolling in pandas?

Time:10-30

I have a dataframe which looks like this, and index is datetime64 of numpy:

(index)               data
2017-01-01 00:00:00   1
2017-01-01 01:00:00   2
2017-01-01 02:00:00   3
……                    ……
2017-01-04 00:00:00   73
2017-01-04 01:00:00   nan
2017-01-04 02:00:00   75
……                    ……

Now I want to get datas in rolling windows which width are all 72(72 hours) and there is no intersection between two windows such as this:

windows1:
(index)               data
2017-01-01 00:00:00   1
2017-01-01 01:00:00   2
2017-01-01 02:00:00   3
……                    ……
2017-01-03 23:00:00   72

windows2:
(index)               data
2017-01-04 00:00:00   73
# data of 2017-01-04 01:00:00 is nan, removed
2017-01-01 02:00:00   75
……                    ……
2017-01-03 23:00:00   144

So how can realize this by DataFrame.rolling or DataSeries.rolling? If there is no easy answer, I will use index itself to solve the problem.

CodePudding user response:

A 72H rolling can be achieved with df.rolling('72H').sum() (or any other function than sum)

But it looks like you don't want a rolling but rather a groupby with floor:

for k,g in df.groupby(df.index.floor('72H')):
    print(f'New group: {k}\n', g.head(), '\n')

output:

New group: 2016-12-31 00:00:00
                      data
index                    
2017-01-01 00:00:00     1
2017-01-01 01:00:00     2
2017-01-01 02:00:00     3
2017-01-01 03:00:00     4
2017-01-01 04:00:00     5 

New group: 2017-01-03 00:00:00
                      data
index                    
2017-01-03 00:00:00    49
2017-01-03 01:00:00    50
2017-01-03 02:00:00    51
2017-01-03 03:00:00    52
2017-01-03 04:00:00    53 

To compute, for example, the mean:

df.groupby(df.index.floor('72H')).mean()

            data
index           
2016-12-31  24.5
2017-01-03  73.0
alternative
group = (df.index-df.index[0])//pd.Timedelta('72H')

df.groupby(group).mean()

Used input:

df = pd.DataFrame({'index': pd.date_range('2017-01-01', '2017-01-05', freq='1H'),
                   'data': np.arange(1, 98)}).set_index('index')
  • Related