I'm analyzing timecard data and comparing employees' clock-in/out times to each other. I'm exploring the data using a difference matrix in a DataFrame. How do I convert the day/hour timedelta to a decimal, or even just a sensible +/- value, without the -1 days 23:40:00 weirdness?
from datetime import datetime
import pandas as pd

employees = [('GILL', datetime(2022,12,1,6,40,0), datetime(2022,12,1,14,30,0)),
             ('BOB', datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,10,0)),
             ('TOBY', datetime(2022,12,1,14,0,0), datetime(2022,12,1,22,30,0))]
labels = ['name', 'clockin', 'clockout']
df = pd.DataFrame.from_records(employees, columns=labels)
My difference matrix is constructed with these two lines:
arr = (df['clockin'].values - df['clockin'].values[:, None])
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)
| | name | GILL | BOB | TOBY |
|---|---|---|---|---|
| 0 | GILL | 0 days 00:00:00 | -1 days 23:20:00 | 0 days 07:20:00 |
| 1 | BOB | 0 days 00:40:00 | 0 days 00:00:00 | 0 days 08:00:00 |
| 2 | TOBY | -1 days 16:40:00 | -1 days 16:00:00 | 0 days 00:00:00 |
The trick to getting a decimal difference is the total_seconds() function on the pandas .dt accessor, but I can't see how to fit it into the arr array expression.
Here is total_seconds() doing its magic on a single column:
df['workhours'] = round((df['clockout'] - df['clockin']).dt.total_seconds() / 60.0 / 60.0, 2)
I also tried an apply on the time columns of the matrix, but I can't get it to work. This might be the easy answer:
df_in.apply(lambda x: (x.total_seconds() / 60.0 / 60.0), columns=['BOB', 'GILL', 'TOBY'])
CodePudding user response:
Example
We need a minimal, reproducible example to answer this. Let's make one:
data = [['GILL', pd.Timedelta('0 days'), pd.Timedelta('-1 days 23:20:00')],
['BoB', pd.Timedelta('0 days 00:40:00'), pd.Timedelta('0 days 00:00:00')]]
df = pd.DataFrame(data, columns=['name', 'GILL', 'BOB'])
df
name GILL BOB
0 GILL 0 days 00:00:00 -1 days 23:20:00
1 BoB 0 days 00:40:00 0 days 00:00:00
Code
df[['GILL', 'BOB']].apply(lambda x: x.dt.total_seconds())
Result
GILL BOB
0 0.0 -2400.0
1 2400.0 0.0
Assign the result back to the columns
df[['GILL', 'BOB']] = df[['GILL', 'BOB']].apply(lambda x: x.dt.total_seconds())
df
name GILL BOB
0 GILL 0.0 -2400.0
1 BoB 2400.0 0.0
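If you want decimal hours rather than seconds, which is what the question asks for, dividing by 3600 before assigning should do it. A minimal sketch, assuming the same toy frame (rebuilt here with pd.Timedelta keyword arguments so it runs on its own); the names time_cols and hours are only for illustration:

```python
import pandas as pd

# Same toy frame as above: -40 minutes displays as "-1 days 23:20:00"
data = [['GILL', pd.Timedelta(0), pd.Timedelta(minutes=-40)],
        ['BoB', pd.Timedelta(minutes=40), pd.Timedelta(0)]]
df = pd.DataFrame(data, columns=['name', 'GILL', 'BOB'])

time_cols = ['GILL', 'BOB']  # illustrative name for the timedelta columns

# Seconds -> hours, rounded to two decimals, column by column
hours = df[time_cols].apply(lambda s: s.dt.total_seconds() / 3600).round(2)
print(hours)
```

Dividing by 60 instead of 3600 would give minutes.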
CodePudding user response:
You were almost there.
This expression of yours produces numpy.timedelta64 values:
type((df['clockin'].values - df['clockin'].values[:, None])[0][0])
[Out]:
numpy.timedelta64
Just divide it by np.timedelta64(1, 'h') or np.timedelta64(1, 'm').
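As a quick sanity check on why this works: dividing one timedelta64 by another gives a plain float, and the "-1 days 23:20:00" display is just how a difference of minus 40 minutes is shown. A small sketch on a single value (the variable names are mine, not from the question):

```python
import numpy as np

# Minus 40 minutes: BOB's clock-in relative to GILL's in the question's data
delta = np.timedelta64(-40, 'm')

# Dividing by a one-unit timedelta yields a float count of that unit
hours = delta / np.timedelta64(1, 'h')
minutes = delta / np.timedelta64(1, 'm')
print(hours, minutes)
```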
Time difference in hours
Divide by np.timedelta64(1, 'h'):
arr = (df['clockin'].values - df['clockin'].values[:, None]) / np.timedelta64(1, 'h')
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)
[Out]:
name GILL BOB TOBY
0 GILL 0.000000 -0.666667 7.333333
1 BOB 0.666667 0.000000 8.000000
2 TOBY -7.333333 -8.000000 0.000000
Time difference in minutes
Divide by np.timedelta64(1, 'm'):
arr = (df['clockin'].values - df['clockin'].values[:, None]) / np.timedelta64(1, 'm')
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)
[Out]:
name GILL BOB TOBY
0 GILL 0.0 -40.0 440.0
1 BOB 40.0 0.0 480.0
2 TOBY -440.0 -480.0 0.0
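If the matrix is mainly for exploring, it may be handier to put the names on the index as well and round the values. A self-contained sketch using the question's data; using the names as the index and rounding to two decimals are my own choices, not something the question requires:

```python
from datetime import datetime

import numpy as np
import pandas as pd

employees = [('GILL', datetime(2022, 12, 1, 6, 40, 0), datetime(2022, 12, 1, 14, 30, 0)),
             ('BOB', datetime(2022, 12, 1, 6, 0, 0), datetime(2022, 12, 1, 14, 10, 0)),
             ('TOBY', datetime(2022, 12, 1, 14, 0, 0), datetime(2022, 12, 1, 22, 30, 0))]
df = pd.DataFrame.from_records(employees, columns=['name', 'clockin', 'clockout'])

clockin = df['clockin'].to_numpy()

# Broadcasting: cell [i, j] is clockin[j] - clockin[i], converted to hours
diff_hours = (clockin - clockin[:, None]) / np.timedelta64(1, 'h')

# Names on both axes, so cells can be looked up as matrix.loc[row, column]
matrix = pd.DataFrame(diff_hours, index=df['name'], columns=df['name']).round(2)
print(matrix)
```

With that layout, matrix.loc['GILL', 'TOBY'] reads as "how many hours after GILL did TOBY clock in".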