I'm converting CSV files with no header into a DataFrame. I use usecols to select the columns I need and pass header=None.
import pandas as pd
path = r"data.csv"
data = pd.read_csv(path, usecols=[0, 1, 3, 4], header=None)
df = pd.DataFrame(data)
Sample data:
{0: {0: '2022-08-06',
1: '2022-08-06',
2: '2022-08-06',
3: '2022-08-06',
4: '2022-08-06'},
1: {0: '07:35:16',
1: '07:35:22',
2: '07:35:29',
3: '07:35:36',
4: '07:35:42'},
3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}
After that, I need to add a new blank column for working-shift data, which depends on the time data in column index [1].
Here is the relationship between time and working shift:
1950hrs to 0749hrs = 'PM' shift
0750hrs to 1949hrs = 'AM' shift
The question is: how can I use a time range to assign values in the working-shift column? This is what I have so far:
from datetime import datetime
ShiftID = pd.Series([], dtype=pd.StringDtype())
df[1] = pd.to_datetime(df[1])
start = datetime.strptime('19:50:00', '%H:%M:%S').time()
end = datetime.strptime('07:49:59', '%H:%M:%S').time()
for i in range(len(df)):
    if df[1].dt.time.between(start, end): # <------- I'm confused right here
        ShiftID[i] = 'PM'
    else:
        ShiftID[i] = 'AM'
df.insert(2, "ShiftID", ShiftID) # <--- insert 'working shift' column at index [2]
display(df)
My desired output is something like this:
{0: {0: '2022-08-06',
1: '2022-08-06',
2: '2022-08-06',
3: '2022-08-06',
4: '2022-08-06'},
1: {0: '07:35:16',
1: '07:35:22',
2: '07:35:29',
3: '07:35:36',
4: '07:35:42'},
2: {0: 'PM', 1: 'PM', 2: 'PM', 3: 'PM', 4: 'PM'},
3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}
CodePudding user response:
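Try this. The PM range wraps past midnight (start is later than end), so between(start, end) on its own never matches; combine the two comparisons with | instead:
t = df[1].dt.time
is_pm = (t >= start) | (t <= end)
values = np.where(is_pm, 'PM', 'AM')
ShiftID = pd.Series(values, df.index)
or
ShiftID = pd.Series('PM', df.index)
ShiftID = ShiftID.where(is_pm, 'AM')  # where() returns a new Series, so assign the result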
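To see why the wrap-around matters, here is a minimal self-contained sketch. The sample times are made up to sit on both sides of each shift boundary; start and end follow the values in the question.

```python
from datetime import time

import numpy as np
import pandas as pd

# Made-up sample times covering both sides of each shift boundary.
times = pd.Series(["07:35:16", "07:49:59", "07:50:00",
                   "19:49:59", "19:50:00", "23:10:00"])
t = pd.to_datetime(times, format="%H:%M:%S").dt.time

start = time(19, 50, 0)   # PM shift begins
end = time(7, 49, 59)     # PM shift ends the next morning

# start > end, so no value can satisfy start <= t <= end.
naive = t.between(start, end)

# A range that crosses midnight is "late enough OR early enough".
is_pm = (t >= start) | (t <= end)
shift = pd.Series(np.where(is_pm, "PM", "AM"), index=times.index)
print(shift.tolist())
```

Here naive is False for every row, while shift labels only the rows from 07:50:00 through 19:49:59 as 'AM'.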
Update:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict({0: {0: '2022-08-06',
1: '2022-08-06',
2: '2022-08-06',
3: '2022-08-06',
4: '2022-08-06'},
1: {0: '07:35:16',
1: '07:35:22',
2: '07:35:29',
3: '07:35:36',
4: '07:35:42'},
3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}})
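am_start = pd.Timedelta(hours=7, minutes=50)
am_stop = pd.Timedelta(hours=19, minutes=49, seconds=59)  # AM runs through 19:49:59
s = pd.to_timedelta(df[1])  # time of day as an offset from midnight
df[2] = np.where(s.between(am_start, am_stop), 'AM', 'PM')  # between() is inclusive on both ends
df.sort_index(axis=1, inplace=True)  # put the new column 2 between columns 1 and 3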
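The same timedelta idea can be wrapped in a small helper and combined with df.insert, which is what the question was aiming for. This is only a sketch: assign_shift is a hypothetical name, the frame is a cut-down version of the sample data, and the half-open interval [07:50:00, 19:50:00) is an assumption about how the seconds before 19:50 should be treated.

```python
import numpy as np
import pandas as pd

AM_START = pd.Timedelta(hours=7, minutes=50)
PM_START = pd.Timedelta(hours=19, minutes=50)


def assign_shift(times: pd.Series) -> pd.Series:
    """Label 'HH:MM:SS' strings as 'AM' or 'PM' (hypothetical helper)."""
    since_midnight = pd.to_timedelta(times)
    # Half-open interval [AM_START, PM_START): 19:49:59 is still AM,
    # 19:50:00 is already PM.
    is_am = (since_midnight >= AM_START) & (since_midnight < PM_START)
    return pd.Series(np.where(is_am, "AM", "PM"), index=times.index)


# Cut-down frame with the same integer column labels as the question.
df = pd.DataFrame({
    0: ["2022-08-06"] * 3,
    1: ["07:35:16", "07:50:00", "19:50:00"],
    3: ["OK"] * 3,
})

# Insert the new column at position 2, right after the time column.
df.insert(2, "ShiftID", assign_shift(df[1]))
print(df)
```

Because the time column is never converted with pd.to_datetime here, it keeps its original 'HH:MM:SS' strings in the output.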
CodePudding user response:
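You should take a look at pandas's cut function.
As said above, without the raw data it's hard to propose code, but something like this should be close to your solution (replace the values between <>; column 1 first has to be converted to something numeric, such as seconds since midnight, with the bin edges in the same unit, and ordered=False is needed because a label repeats):
df["Working Shift"] = pd.cut(df[1], bins=[<00:00>, <AM start>, <PM start>, <24:00>], labels=["PM", "AM", "PM"], right=False, ordered=False)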
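A runnable sketch of the cut idea, under a few assumptions: times are converted to seconds since midnight, the bin edges are 00:00, 07:50, 19:50 and 24:00, and the pandas version is 1.1 or later (where pd.cut accepts ordered=False, which repeated labels require). The sample times are made up.

```python
import pandas as pd

times = pd.Series(["07:35:16", "07:50:00", "12:00:00", "19:49:59", "19:50:00"])
seconds = pd.to_timedelta(times).dt.total_seconds()

# Bin edges in seconds since midnight: 00:00, 07:50, 19:50, 24:00.
edges = [0, 7 * 3600 + 50 * 60, 19 * 3600 + 50 * 60, 24 * 3600]

# right=False makes each bin [left, right);
# ordered=False permits the repeated "PM" label.
shift = pd.cut(seconds, bins=edges, labels=["PM", "AM", "PM"],
               right=False, ordered=False)
print(shift.tolist())
```

The result is a categorical Series; call .astype(str) on it if plain strings are preferred.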
CodePudding user response:
This one may also work:
import numpy as np
# The times are in column 1, and the PM range wraps past midnight, so use | rather than &.
condition1 = (df[1].dt.time >= start) | (df[1].dt.time <= end)
df['ShiftID'] = np.where(condition1, "PM", "AM")