I want to get the time difference in the between below two columns. then compare the time difference hours to certain ranges. i.e if hours > 24 then x, if hours > 8 then y, if hours > 1 the z, and finally if hours <= 1 then done.
I have both columuns as below
df = pd.DataFrame({'Start': {0: '2022-05-01 14:32:07',
1: '2022-05-01 13:09:20',
2: '2022-05-01 17:19:12',
3: '2022-05-01 16:12:03',
4: '2022-05-01 13:24:42',
5: '2022-05-01 02:24:17',
6: '2022-05-01 12:53:35',
7: '2022-05-01 09:57:34',
8: '2022-05-01 17:24:29',
9: '2022-05-01 09:58:15'},
'End': {0: '2022-05-01 15:39:51',
1: '2022-05-02 13:10:56',
2: '2022-05-01 21:29:04',
3: '2022-05-01 23:56:27',
4: '2022-05-03 13:28:05',
5: '2022-05-01 16:24:18',
6: '2022-05-01 12:55:42',
7: '2022-05-01 19:57:35',
8: '2022-05-01 17:30:06',
9: '2022-05-01 19:58:16'}})
df = df.apply(pd.to_datetime) # Just as imported data already in datetime format
Output:
Start End
0 2022-05-01 14:32:07 2022-05-01 15:39:51
1 2022-05-01 13:09:20 2022-05-02 13:10:56
2 2022-05-01 17:19:12 2022-05-01 21:29:04
3 2022-05-01 16:12:03 2022-05-01 23:56:27
4 2022-05-01 13:24:42 2022-05-03 13:28:05
5 2022-05-01 02:24:17 2022-05-01 16:24:18
6 2022-05-01 12:53:35 2022-05-01 12:55:42
7 2022-05-01 09:57:34 2022-05-01 19:57:35
8 2022-05-01 17:24:29 2022-05-01 17:30:06
9 2022-05-01 09:58:15 2022-05-01 19:58:16
What I did so far:
df['Duration'] = df['End'] - df['Start']
df
output:
Start End Duration
0 2022-05-01 14:32:07 2022-05-01 15:39:51 0 days 01:07:44
1 2022-05-01 13:09:20 2022-05-02 13:10:56 1 days 00:01:36
2 2022-05-01 17:19:12 2022-05-01 21:29:04 0 days 04:09:52
3 2022-05-01 16:12:03 2022-05-01 23:56:27 0 days 07:44:24
4 2022-05-01 13:24:42 2022-05-03 13:28:05 2 days 00:03:23
5 2022-05-01 02:24:17 2022-05-01 16:24:18 0 days 14:00:01
6 2022-05-01 12:53:35 2022-05-01 12:55:42 0 days 00:02:07
7 2022-05-01 09:57:34 2022-05-01 19:57:35 0 days 10:00:01
8 2022-05-01 17:24:29 2022-05-01 17:30:06 0 days 00:05:37
9 2022-05-01 09:58:15 2022-05-01 19:58:16 0 days 10:00:01
I also tried to convert the Duration column to datetime (using: df['Duration'] = pd.to_datetime(df['Duration'])
then try to compare but I still get errors.
CodePudding user response:
Here is one way to do it:
df = df.apply(pd.to_datetime)
# As suggested in the comments
delta = (df["End"] - df["Start"]).dt.total_seconds() // 3600
# You do not specify what x, y, z are supposed to be, so this is an example
df["Delta"] = (
delta
.mask(delta > 24, "> 24h")
.mask((delta <= 24) & (delta > 8), "> 8h")
.mask((delta <= 8) & (delta > 1), "> 1h")
.mask((delta <= 1), "<= 1h")
)
print(df)
# Output
Start End Delta
0 2022-05-01 14:32:07 2022-05-01 15:39:51 <= 1h
1 2022-05-01 13:09:20 2022-05-02 13:10:56 > 8h
2 2022-05-01 17:19:12 2022-05-01 21:29:04 > 1h
3 2022-05-01 16:12:03 2022-05-01 23:56:27 > 1h
4 2022-05-01 13:24:42 2022-05-03 13:28:05 > 24h
5 2022-05-01 02:24:17 2022-05-01 16:24:18 > 8h
6 2022-05-01 12:53:35 2022-05-01 12:55:42 <= 1h
7 2022-05-01 09:57:34 2022-05-01 19:57:35 > 8h
8 2022-05-01 17:24:29 2022-05-01 17:30:06 <= 1h
9 2022-05-01 09:58:15 2022-05-01 19:58:16 > 8h