I have a dataframe which looks like this, and index is datetime64
of numpy:
(index) data
2017-01-01 00:00:00 1
2017-01-01 01:00:00 2
2017-01-01 02:00:00 3
…… ……
2017-01-04 00:00:00 73
2017-01-04 01:00:00 nan
2017-01-04 02:00:00 75
…… ……
Now I want to get datas in rolling windows which width are all 72(72 hours) and there is no intersection between two windows such as this:
windows1:
(index) data
2017-01-01 00:00:00 1
2017-01-01 01:00:00 2
2017-01-01 02:00:00 3
…… ……
2017-01-03 23:00:00 72
windows2:
(index) data
2017-01-04 00:00:00 73
# data of 2017-01-04 01:00:00 is nan, removed
2017-01-01 02:00:00 75
…… ……
2017-01-03 23:00:00 144
So how can realize this by DataFrame.rolling
or DataSeries.rolling
? If there is no easy answer, I will use index itself to solve the problem.
CodePudding user response:
A 72H rolling can be achieved with df.rolling('72H').sum()
(or any other function than sum
)
But it looks like you don't want a rolling
but rather a groupby
with floor
:
for k,g in df.groupby(df.index.floor('72H')):
print(f'New group: {k}\n', g.head(), '\n')
output:
New group: 2016-12-31 00:00:00
data
index
2017-01-01 00:00:00 1
2017-01-01 01:00:00 2
2017-01-01 02:00:00 3
2017-01-01 03:00:00 4
2017-01-01 04:00:00 5
New group: 2017-01-03 00:00:00
data
index
2017-01-03 00:00:00 49
2017-01-03 01:00:00 50
2017-01-03 02:00:00 51
2017-01-03 03:00:00 52
2017-01-03 04:00:00 53
To compute, for example, the mean:
df.groupby(df.index.floor('72H')).mean()
data
index
2016-12-31 24.5
2017-01-03 73.0
alternative
group = (df.index-df.index[0])//pd.Timedelta('72H')
df.groupby(group).mean()
Used input:
df = pd.DataFrame({'index': pd.date_range('2017-01-01', '2017-01-05', freq='1H'),
'data': np.arange(1, 98)}).set_index('index')