I have a pandas DataFrame with date and time columns. I would like to assign an integer to each row according to a predefined time interval, for instance:
Name Date Time
F 01/01/22 23:50:00
F1 01/01/22 22:00:00
F2 01/01/22 20:00:00
F3 01/01/22 19:00:00
F4 01/01/22 18:00:00
F5 01/01/22 17:00:00
F6 01/01/22 16:00:00
I would like to obtain:
Name Date Time Interval
F 01/01/22 23:50:00 1
F1 01/01/22 22:00:00 1
F2 01/01/22 20:00:00 2
F3 01/01/22 19:00:00 2
F4 01/01/22 18:00:00 3
F5 01/01/22 17:00:00 3
F6 01/01/22 16:00:00 4
The Interval column should hold one integer for every 2 consecutive hours. The logic would be to assign the integer by hour range: [00:00:00 to 02:00:00] = 1, [02:00:01 to 04:00:00] = 2, [04:00:01 to 06:00:00] = 3, [06:00:01 to 08:00:00] = 4, and so on.
Is this possible in pandas?
Thanks
CodePudding user response:
Use pd.cut after converting the Time column to hours:
h = pd.to_datetime(df['Time']).dt.hour
# the bin edges must reach 24, otherwise hour 23 falls outside every bin and becomes NaN
df['Interval'] = pd.cut(h, bins=range(0, 25, 2), include_lowest=True, labels=False) + 1
print(df)
Name Date Time Interval
0 F 01/01/22 00:50:00 1
1 F1 01/01/22 01:00:00 1
2 F2 01/01/22 02:00:00 1
3 F3 01/01/22 03:00:00 2
4 F4 01/01/22 04:00:00 2
5 F5 01/01/22 05:00:00 3
6 F6 01/01/22 06:00:00 3
7 F3 01/01/22 07:00:00 4
8 F4 01/01/22 08:00:00 4
9 F5 01/01/22 09:00:00 5
10 F6 01/01/22 10:00:00 5
11 F3 01/01/22 11:00:00 6
12 F4 01/01/22 12:00:00 6
13 F5 01/01/22 17:00:00 9
14 F6 01/01/22 16:00:00 8
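For reference, here is a minimal, self-contained sketch of the same pd.cut idea applied to the sample rows from the question. The DataFrame construction, the explicit format string and the bin edges running up to 24 are assumptions made here so the snippet runs on its own and so that hour 23 lands in the last bin. Note that a clock-based binning gives 12 for 23:50:00, not 1 as in the question's expected output.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Name": ["F", "F1", "F2", "F3", "F4", "F5", "F6"],
    "Date": ["01/01/22"] * 7,
    "Time": ["23:50:00", "22:00:00", "20:00:00", "19:00:00",
             "18:00:00", "17:00:00", "16:00:00"],
})

# Hour of day as an integer 0..23
h = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.hour

# Edges 0, 2, ..., 24 give right-closed bins (0, 2], (2, 4], ..., (22, 24];
# include_lowest=True puts hour 0 into the first bin.
edges = list(range(0, 25, 2))
df["Interval"] = pd.cut(h, bins=edges, include_lowest=True, labels=False) + 1

print(df)
```

With labels=False, pd.cut returns the zero-based bin position, which is why 1 is added to start the numbering at 1.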
Or:
h = pd.to_datetime(df['Time']).dt.hour
df['Interval'] = h.sub(1).clip(lower=0) // 2 + 1
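Both variants above look only at the hour, so a time such as 02:30:00 ends up in interval 1, whereas the ranges in the question ([02:00:01 to 04:00:00] = 2) would put it in interval 2. If that distinction matters, one possible sketch is to apply the same subtract, clip and floor-divide trick to seconds since midnight. The sample times and the variable names below are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample times chosen around the interval boundaries
times = pd.Series(["00:00:00", "02:00:00", "02:00:01",
                   "02:30:00", "04:00:00", "23:50:00"])

# Seconds since midnight; HH:MM:SS strings parse directly as timedeltas
secs = pd.to_timedelta(times).dt.total_seconds().astype(int)

# 7200 seconds = 2 hours. Subtracting 1 second makes each upper bound
# (02:00:00, 04:00:00, ...) belong to the lower interval.
by_seconds = secs.sub(1).clip(lower=0) // 7200 + 1

# Hour-only version, for comparison
hours = pd.to_datetime(times, format="%H:%M:%S").dt.hour
by_hour = hours.sub(1).clip(lower=0) // 2 + 1

print(pd.DataFrame({"Time": times, "by_seconds": by_seconds, "by_hour": by_hour}))
```

The two columns differ only for times that fall strictly between an even hour and the next full hour, such as 02:00:01 and 02:30:00 here.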
CodePudding user response:
IIUC, you can use a double groupby:
# group by consecutive hours
g1 = pd.to_datetime(df['Time']).dt.hour.diff().abs().ne(1).cumsum()
# split the consecutive in groups of 2
g2 = g1.groupby(g1).cumcount().floordiv(2)
df['Interval'] = df.groupby([g1, g2]).ngroup().add(1)
output:
Name Date Time Interval
0 F 01/01/22 23:50:00 1
1 F1 01/01/22 22:00:00 1
2 F2 01/01/22 20:00:00 2
3 F3 01/01/22 19:00:00 2
4 F4 01/01/22 18:00:00 3
5 F5 01/01/22 17:00:00 3
6 F6 01/01/22 16:00:00 4
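To try this end to end, here is a self-contained sketch that rebuilds the sample data from the question and prints the two intermediate groupers. The DataFrame construction and the explicit format string are assumptions added so the snippet runs on its own.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Name": ["F", "F1", "F2", "F3", "F4", "F5", "F6"],
    "Date": ["01/01/22"] * 7,
    "Time": ["23:50:00", "22:00:00", "20:00:00", "19:00:00",
             "18:00:00", "17:00:00", "16:00:00"],
})

hours = pd.to_datetime(df["Time"], format="%H:%M:%S").dt.hour

# A new run starts whenever the hour does not differ by exactly 1
# from the previous row (the first row always starts a run).
g1 = hours.diff().abs().ne(1).cumsum()        # 1, 1, 2, 2, 2, 2, 2

# Within each run, pair the rows up two by two.
g2 = g1.groupby(g1).cumcount().floordiv(2)    # 0, 0, 0, 0, 1, 1, 2

# Number each (run, pair) combination, starting at 1.
df["Interval"] = df.groupby([g1, g2]).ngroup().add(1)

print(df)
```

This numbering depends on the order of the rows and on runs of consecutive hours, not on the clock time itself, so it reproduces the expected output in the question but would give different labels if the rows were sorted differently.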