Pandas change time values based on condition

Time:06-08

I have a dataframe:

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)

I would like to convert the time based on conditions: if the hour is less than 9, I want to set it to 9 and if the hour is more than 17, I need to set it to 17.

I tried this approach:

df['time'] = np.where(((df['time'].dt.hour < 9) & (df['time'].dt.hour != 0)), dt.time(9, 00))

I am getting an error: Can only use .dt accessor with datetimelike values.

Can anyone please help me with this? Thanks.
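The error arises because the 'time' column holds plain strings (dtype object), and the .dt accessor only works on datetime-like columns. The np.where call also needs a third argument, which supplies the value for rows where the condition is false. Below is a minimal sketch of the asker's np.where idea made to work. It assumes the goal is a hard clamp to exactly 09:00 and 17:00, treats only hours above 17 as too late, and leaves out the extra `hour != 0` condition. The `format='%H:%M:%S'` argument is an assumption that matches the sample data.

```python
import datetime as dt

import numpy as np
import pandas as pd

data = {'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)

# The raw column holds strings (dtype object), which is why .dt fails.
try:
    df['time'].dt.hour
except AttributeError as exc:
    print('before conversion:', exc)

# Parse once; the fixed format is an assumption matching the sample data.
parsed = pd.to_datetime(df['time'], format='%H:%M:%S')
hour = parsed.dt.hour

# np.where needs a value for both branches; nesting handles the two cut-offs.
df['time'] = np.where(hour < 9, dt.time(9, 0),
                      np.where(hour > 17, dt.time(17, 0), parsed.dt.time))
print(df)
```

The column ends up holding datetime.time objects, so later comparisons can be made against other dt.time values.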

Answer:

Here's a way to do what your question asks:

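df.time = pd.to_datetime(df.time)
# int64 hours: newer pandas may return int32, which overflows when scaled to nanoseconds
hour = df.time.dt.hour.astype('int64')
# shift by whole hours (minutes kept): 08:45 -> 09:45, 18:00 -> 17:00
df.loc[hour < 9, 'time'] = (df.time.astype('int64') + (9 - hour)*3600*1000000000).astype('datetime64[ns]')
df.loc[hour > 17, 'time'] = (df.time.astype('int64') + (17 - hour)*3600*1000000000).astype('datetime64[ns]')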

Input:

                 time
0 2022-06-06 08:45:00
1 2022-06-06 09:30:00
2 2022-06-06 18:00:00
3 2022-06-06 15:00:00

Output:

                 time
0 2022-06-06 09:45:00
1 2022-06-06 09:30:00
2 2022-06-06 17:00:00
3 2022-06-06 15:00:00
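The int64 nanosecond arithmetic above can also be written with pd.to_timedelta, which keeps the column as datetime64 throughout and avoids hand-written nanosecond constants. This sketch follows the same interpretation as that answer: it shifts by whole hours, so the minutes are kept and 08:45 becomes 09:45. The `format='%H:%M:%S'` argument is an assumption that pins the date part to 1900-01-01.

```python
import pandas as pd

data = {'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
# format= is an assumption; it fixes the date part to 1900-01-01.
df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S')

hour = df['time'].dt.hour
# Whole-hour shift per row: +(9 - hour) for early rows, -(hour - 17) for late rows, 0 otherwise.
shift_hours = (9 - hour).where(hour < 9, 0) + (17 - hour).where(hour > 17, 0)
df['time'] = df['time'] + pd.to_timedelta(shift_hours, unit='h')

print(df['time'].dt.strftime('%H:%M:%S').tolist())
# ['09:45:00', '09:30:00', '17:00:00', '15:00:00']
```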

UPDATE:

Here's alternative code that tries to address the OP's error, as described in the comments:

import pandas as pd
import datetime

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print('', 'df loaded as strings:', df, sep='\n')

df.time = pd.to_datetime(df.time, format='%H:%M:%S')
print('', 'df converted to datetime by pd.to_datetime():', df, sep='\n')
hour = df.time.dt.hour.astype('int64')  # int64 avoids int32 overflow in newer pandas
df.loc[hour < 9, 'time'] = (df.time.astype('int64') + (9 - hour)*3600*1000000000).astype('datetime64[ns]')
df.loc[hour > 17, 'time'] = (df.time.astype('int64') + (17 - hour)*3600*1000000000).astype('datetime64[ns]')
df.time = df.time.dt.time  # keep only the time-of-day part as datetime.time objects
print('', 'df with time column adjusted to have hour between 9 and 17, converted to type "time":', df, sep='\n')

Output:


df loaded as strings:
       time
0  08:45:00
1  09:30:00
2  18:00:00
3  15:00:00

df converted to datetime by pd.to_datetime():
                 time
0 1900-01-01 08:45:00
1 1900-01-01 09:30:00
2 1900-01-01 18:00:00
3 1900-01-01 15:00:00

df with time column adjusted to have hour between 9 and 17, converted to type "time":
       time
0  09:45:00
1  09:30:00
2  17:00:00
3  15:00:00
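If the goal is instead a hard clamp, so that 08:45 becomes 09:00 rather than 09:45, the datetime.time values produced at the end of the update can be compared directly, because datetime.time objects are ordered. This small sketch assumes the column already holds datetime.time objects, as in that output. The helper name clamp_time is hypothetical. It also clamps anything after 17:00, such as 17:30, down to 17:00.

```python
import datetime

import pandas as pd

OPEN = datetime.time(9, 0)
CLOSE = datetime.time(17, 0)


def clamp_time(t: datetime.time) -> datetime.time:
    """Hypothetical helper: pull t into the [OPEN, CLOSE] window."""
    return min(max(t, OPEN), CLOSE)


df = pd.DataFrame({'time': [datetime.time(8, 45), datetime.time(9, 30),
                            datetime.time(18, 0), datetime.time(15, 0)]})
# Series.map applies the helper element by element; the column stays dtype object.
df['time'] = df['time'].map(clamp_time)
print(df)
```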

Answer:

Since your 'time' column contains strings, you can keep them as strings. To filter on your criteria, though, it is convenient to convert the column to datetime objects, compare them to create boolean Series, and use those Series to make the assignments.

Your data:

import numpy as np
import pandas as pd

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00']}
df = pd.DataFrame(data)
print(df.to_string())

>>>
       time
0  08:45:00
1  09:30:00
2  18:00:00
3  15:00:00

Convert to datetime, make boolean Series with your criteria

dts = pd.to_datetime(df['time'])
lt_nine = dts.dt.hour < 9
gt_seventeen = (dts.dt.hour >= 17)
print(lt_nine)
print(gt_seventeen)

>>>
0     True
1    False
2    False
3    False
Name: time, dtype: bool
0    False
1    False
2     True
3    False
Name: time, dtype: bool

Use the boolean series to assign a new value:

df.loc[lt_nine,'time'] = '09:00:00'
df.loc[gt_seventeen,'time'] = '17:00:00'
print(df.to_string())

>>>
       time
0  09:00:00
1  09:30:00
2  17:00:00
3  15:00:00

Or just stick with strings altogether. Concoct regex patterns that will match your criteria; use them to make boolean Series; assign new values.

data = {'time':['08:45:00', '09:30:00', '18:00:00', '15:00:00','07:22:00','22:02:06']}
dg = pd.DataFrame(data)
print(dg.to_string())

>>>
       time
0  08:45:00
1  09:30:00
2  18:00:00
3  15:00:00
4  07:22:00
5  22:02:06

Construct the patterns.

# (?:...) groups the alternatives so that ^ anchors every one of them
pattern_lt_nine = '^(?:' + '|'.join(f'{n:02}' for n in range(9)) + ')'
pattern_gt_seventeen = '^(?:' + '|'.join(str(n) for n in range(17, 24)) + ')'

>>> pattern_lt_nine
'^(?:00|01|02|03|04|05|06|07|08)'
>>> pattern_gt_seventeen
'^(?:17|18|19|20|21|22|23)'
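A note on anchoring in patterns like these: in a regex, | has the lowest precedence, so an ungrouped pattern such as '^00|01' anchors only its first alternative. Series.str.match is unaffected, because it always matches from the start of each string. The same pattern passed to a search-style function (re.search, or Series.str.contains) would also hit '01' in the middle of a string. Wrapping the alternatives in a non-capturing group (?:...) avoids that. The check below uses only the standard re module, and the sample string is illustrative.

```python
import re

ungrouped = '^00|01'
grouped = '^(?:00|01)'
s = '10:01:00'

# re.match anchors at position 0, so both patterns correctly reject '10:01:00'.
print(re.match(ungrouped, s), re.match(grouped, s))  # None None
# re.search scans the whole string: the ungrouped version finds '01' at index 3.
print(re.search(ungrouped, s))  # <re.Match object; span=(3, 5), match='01'>
print(re.search(grouped, s))    # None
```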

Make boolean Series and assign new values

gt_seventeen = dg['time'].str.match(pattern_gt_seventeen)
lt_nine = dg['time'].str.match(pattern_lt_nine)
dg.loc[lt_nine,'time'] = '09:00:00'
dg.loc[gt_seventeen,'time'] = '17:00:00'
print(dg.to_string())

>>>
       time
0  09:00:00
1  09:30:00
2  17:00:00
3  15:00:00
4  09:00:00
5  17:00:00
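Because the strings are zero-padded HH:MM:SS, their lexicographic order matches chronological order, so plain string comparisons can stand in for the regex patterns. This sketch uses Series.mask and assumes every value has exactly that format; an unpadded value like '8:45:00' would compare incorrectly. Using > '17:00:00' gives the same result as the 17|...|23 pattern, since a value of exactly 17:00:00 is left as 17:00:00 either way.

```python
import pandas as pd

data = {'time': ['08:45:00', '09:30:00', '18:00:00', '15:00:00', '07:22:00', '22:02:06']}
dg = pd.DataFrame(data)

# Zero-padded 'HH:MM:SS' strings sort in time order, so < and > work directly.
# Series.mask replaces values where the condition is True.
dg['time'] = dg['time'].mask(dg['time'] < '09:00:00', '09:00:00')
dg['time'] = dg['time'].mask(dg['time'] > '17:00:00', '17:00:00')
print(dg['time'].tolist())
# ['09:00:00', '09:30:00', '17:00:00', '15:00:00', '09:00:00', '17:00:00']
```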

Time series / date functionality
Working with text data
