Pandas: Assign value in column based on time range of the respective row (no header)

Time:09-02

I'm converting CSV files with no header into a DataFrame. I use usecols to select the columns I need and pass header=None.

import pandas as pd

path = r"data.csv"
data = pd.read_csv(path, usecols=[0, 1, 3, 4], header=None)   
df = pd.DataFrame(data)

Sample data :

{0: {0: '2022-08-06',
  1: '2022-08-06',
  2: '2022-08-06',
  3: '2022-08-06',
  4: '2022-08-06'},
 1: {0: '07:35:16',
  1: '07:35:22',
  2: '07:35:29',
  3: '07:35:36',
  4: '07:35:42'},
 3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
 4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}

After that, I need to add a new column for the working shift, which depends on the time in column index [1].

Here is the relationship between time and working shift:

1950hrs to 0749hrs = 'PM' shift
0750hrs to 1949hrs = 'AM' shift

The question is, how can I use a range of times to assign values in the working shift column? This is what I have so far:

ShiftID = pd.Series([], dtype=pd.StringDtype())

df[1] = pd.to_datetime(df[1])

start = datetime.strptime('19:50:00', '%H:%M:%S').time()
end = datetime.strptime('07:49:59', '%H:%M:%S').time()

for i in range(len(df)):
    if df[1].dt.time.between(start, end): # <------- I'm confused right here
        ShiftID[i]= 'PM'
    else:
        ShiftID[i]= 'AM'

df.insert(2, "ShiftID", ShiftID) # <--- insert 'working shift' column into index [2]

display(df)

My desired output is something like this:

{0: {0: '2022-08-06',
  1: '2022-08-06',
  2: '2022-08-06',
  3: '2022-08-06',
  4: '2022-08-06'},
 1: {0: '07:35:16',
  1: '07:35:22',
  2: '07:35:29',
  3: '07:35:36',
  4: '07:35:42'},
 2: {0: 'PM', 1: 'PM', 2: 'PM', 3: 'PM', 4: 'PM'},
 3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
 4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}

CodePudding user response:

Try this:

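# start (19:50) is later in the day than end (07:49:59), so the PM shift wraps
# past midnight and between(start, end) would never match; test both halves.
t = df[1].dt.time
is_pm = (t >= start) | (t <= end)
values = np.where(is_pm, 'PM', 'AM')
ShiftID = pd.Series(values, df.index)

or

ShiftID = pd.Series('PM', df.index)
ShiftID = ShiftID.where((df[1].dt.time >= start) | (df[1].dt.time <= end), 'AM')  # where() returns a new Series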
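
Because the PM shift runs past midnight, its start time is later in the day than its end time, so a single between(start, end) check cannot match any time of day. The following is a minimal, self-contained sketch of the wrap-around test with np.where. The sample times other than 07:35:16 are made up here to exercise the shift boundaries, and the names t and is_pm are illustrative.

```python
from datetime import time

import numpy as np
import pandas as pd

# Illustrative times; only 07:35:16 comes from the question's sample data.
df = pd.DataFrame({1: ['07:35:16', '07:50:00', '19:49:59', '19:50:00', '23:10:05']})

start = time(19, 50, 0)   # PM shift begins
end = time(7, 49, 59)     # PM shift ends the next morning

# Parse the strings and keep only the time-of-day part.
t = pd.to_datetime(df[1], format='%H:%M:%S').dt.time

# PM is "at or after start" OR "at or before end", since the range wraps midnight.
is_pm = (t >= start) | (t <= end)
df['ShiftID'] = np.where(is_pm, 'PM', 'AM')

print(df)
```

The comparison is done against the whole column at once, so no Python loop over the rows is needed.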

update

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({0: {0: '2022-08-06',
  1: '2022-08-06',
  2: '2022-08-06',
  3: '2022-08-06',
  4: '2022-08-06'},
 1: {0: '07:35:16',
  1: '07:35:22',
  2: '07:35:29',
  3: '07:35:36',
  4: '07:35:42'},
 3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
 4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}})

am_start = pd.Timedelta(hours=7, minutes=50)
am_stop = pd.Timedelta(hours=19, minutes=49, seconds=59)  # AM runs through 19:49:59
s = pd.to_timedelta(df[1])
df[2] = np.where(s.between(am_start, am_stop), 'AM', 'PM')
df.sort_index(axis=1, inplace=True)
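
To tie this back to the headerless CSV in the question, here is a hedged end-to-end sketch that reads the file with header=None and usecols, then places the new column at position 2 with df.insert. The CSV text is invented for illustration (including the unused third field "x" and the boundary times), and pm_start is a name assumed here for the first instant of the PM shift.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Made-up stand-in for data.csv: date, time, unused field, status, value.
csv_text = """2022-08-06,07:35:16,x,OK,1.524
2022-08-06,07:50:00,x,OK,1.628
2022-08-06,19:49:59,x,OK,1.364
2022-08-06,19:50:00,x,OK,1.164
"""

# header=None keeps integer column labels; usecols skips column 2.
df = pd.read_csv(StringIO(csv_text), usecols=[0, 1, 3, 4], header=None)

am_start = pd.Timedelta(hours=7, minutes=50)    # first instant of the AM shift
pm_start = pd.Timedelta(hours=19, minutes=50)   # first instant of the PM shift

# Time of day as an offset from midnight.
elapsed = pd.to_timedelta(df[1])

# Half-open interval [07:50:00, 19:50:00) is AM; everything else is PM.
is_am = (elapsed >= am_start) & (elapsed < pm_start)

# Insert the shift column at position 2, between the time and status columns.
df.insert(2, 'ShiftID', np.where(is_am, 'AM', 'PM'))

print(df)
```

Using a half-open interval avoids having to decide whether the last AM second is 19:49:00 or 19:49:59.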

CodePudding user response:

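You should take a look at pandas' cut function.

As said above, without the raw data it's hard to propose exact code, but something like this should be close to your solution. Replace the values between <> with increasing bin edges, where <begin> and <end> are the start and end of the AM shift. cut is a top-level pandas function and needs numeric or timedelta values, so convert the time strings first; repeating a label requires ordered=False:

df["Working Shift"] = pd.cut(pd.to_timedelta(df[1]), bins=[<00:00>, <begin>, <end>, <23:59>], labels=["PM", "AM", "PM"], ordered=False)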
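
As one possible way to fill in those placeholders, the sketch below converts each time to seconds since midnight and bins the numbers with pd.cut. The bin edges, the sample times, and the names seconds and shift are assumptions made for illustration; right=False makes each bin include its left edge, so 07:50:00 falls in AM and 19:50:00 in PM.

```python
import pandas as pd

# Illustrative times chosen to hit the shift boundaries.
times = pd.Series(['07:35:16', '07:50:00', '19:49:59', '19:50:00', '00:00:00'])

# Seconds elapsed since midnight for each time of day.
seconds = pd.to_timedelta(times).dt.total_seconds()

# Edges: midnight, 07:50, 19:50, end of day (24:00).
bins = [0, 7 * 3600 + 50 * 60, 19 * 3600 + 50 * 60, 24 * 3600]

# Repeating the 'PM' label is only allowed with ordered=False.
shift = pd.cut(seconds, bins=bins, right=False,
               labels=['PM', 'AM', 'PM'], ordered=False)

# Convert the categorical result to plain strings.
shift = shift.astype(str)

print(shift.tolist())
```

cut is most useful when there are more than two shifts, since adding one only means adding an edge and a label.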

CodePudding user response:

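This one may also work:

import numpy as np

# The time is in column 1. PM wraps past midnight, so combine the two halves with OR,
# and pass the boolean Series directly (not wrapped in a list).
condition1 = (df[1].dt.time >= start) | (df[1].dt.time <= end)
df['ShiftID'] = np.where(condition1, "PM", "AM")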