I'm using pandas, I have one column attendance timestamp data (Date). I want to split the same day values into (InTime) and (OutTime) columns.
df = df[['Date']]
Date
Thu 1/09 9:10 AM
Thu 1/09 6:10 PM
Fri 2/09 9:04 AM
Fri 2/09 6:02 PM
I'm trying to achieve the following result.
In Time OutTime
Thu 1/09 9:10 AM Thu 1/09 6:10 PM
Fri 2/09 9:04 AM Fri 2/09 6:02 PM
Thanks.
CodePudding user response:
Here is a solution with pandas.DataFrame.join
and pandas.DataFrame.shift
:
new_df = (
df.add_suffix('_In_time')
.join(df.shift(-1).add_suffix('_Out_time'))
.iloc[::2]
)
# Output :
print(new_df)
Date_In_time Date_Out_time
0 Thu 1/09 9:10 AM Thu 1/09 6:10 PM
1 Fri 2/09 9:04 AM Fri 2/09 6:02 AM
CodePudding user response:
another approach, convert the column to datetime and work with it. try:
df
Date
0 Thu 1/09 9:10 AM
1 Thu 1/09 6:10 PM
2 Fri 2/09 9:04 AM
3 Fri 2/09 6:02 PM
df['Date'] = pd.to_datetime(df['Date'] ' 2022', format='%a %d/%m %H:%M %p %Y')
df['day'] = df['Date'].dt.date
df['time'] = df['Date'].dt.time
df.groupby('day').agg(InTime=('time', 'first'), OutTime=('time', 'last')).reset_index()
day InTime OutTime
0 2022-09-01 09:10:00 06:10:00
1 2022-09-02 09:04:00 06:02:00
CodePudding user response:
A possible solution, based on pandas.DataFrame.pivot
:
# separate day from time
df[['Date1', 'Date2']] = df['Date'].str.split('(?<=\d)\s(?=\d)', expand=True)
# create column with colnames for the new columns to be created by pivot
df['names'] = ['inTime', 'OutTime'] * (len(df)//2)
(df.pivot(index='Date1', columns='names', values='Date')
.reset_index(drop=True).iloc[:,::-1])
Output:
names inTime OutTime
0 Fri 2/09 9:04 AM Fri 2/09 6:02 PM
1 Thu 1/09 9:10 AM Thu 1/09 6:10 PM
In case the dataset has not times ordered from earlier to later, then a solution might be the following:
# separate day from time
df[['Date1', 'Date2', 'Date3']] = df['Date'].str.split(
'(?<=\d)\s(?=\d)|\s(?=.M$)', expand=True)
# this is needed if the times are no sorted in the initial dataset
df.sort_values(['Date1', 'Date3', 'Date2'])
# create column with colnames for the new columns to be created by pivot
df['names'] = ['inTime', 'OutTime'] * (len(df)//2)
(df.pivot(index='Date1', columns='names', values='Date')
.reset_index(drop=True).iloc[:,::-1])