Calculate time difference as decimal in a difference matrix

I'm analyzing timecard data and comparing employees' clock-in/out times to each other. I'm exploring the data using a difference matrix in a DataFrame. How do I convert the day/hour timedelta to a decimal, or even just a sensible +/- value, without the -1 days 23:40:00 weirdness?

from datetime import datetime
import pandas as pd

employees = [('GILL', datetime(2022,12,1,6,40,0), datetime(2022,12,1,14,30,0)),
             ('BOB', datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,10,0)),
             ('TOBY', datetime(2022,12,1,14,0,0), datetime(2022,12,1,22,30,0))]
labels = ['name', 'clockin', 'clockout']
df = pd.DataFrame.from_records(employees, columns=labels)

My difference matrix is constructed with these two lines:

arr = (df['clockin'].values - df['clockin'].values[:, None])
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)

	name	GILL	BOB	TOBY
0	GILL	0 days 00:00:00	-1 days 23:20:00	0 days 07:20:00
1	BOB	0 days 00:40:00	0 days 00:00:00	0 days 08:00:00
2	TOBY	-1 days 16:40:00	-1 days 16:00:00	0 days 00:00:00

The trick to get a decimal difference is to use the pandas datetime accessor's total_seconds() function. But I can't see how to fit it into the arr array expression.

Here is total_seconds() doing its magic:

df['workhours'] = round((df['clockout'] - df['clockin']).dt.total_seconds() / 60.0 / 60.0, 2)

I tried an apply on the time columns, but I can't get it to work. This might be the easy answer.

df_in.apply(lambda x: (x.total_seconds() / 60.0 / 60.0), columns=['BOB', 'GILL', 'TOBY'])

CodePudding user response：

Example

We need a minimal, reproducible example to answer this, so let's make one:

import pandas as pd

# minutes=-40 is the value pandas displays as -1 days +23:20:00
data = [['GILL', pd.Timedelta(minutes=0), pd.Timedelta(minutes=-40)],
        ['BoB', pd.Timedelta(minutes=40), pd.Timedelta(minutes=0)]]
df = pd.DataFrame(data, columns=['name', 'GILL', 'BOB'])

df

    name    GILL                BOB
0   GILL    0 days 00:00:00     -1 days  23:20:00
1   BoB     0 days 00:40:00     0 days 00:00:00

Code

df[['GILL', 'BOB']].apply(lambda x: x.dt.total_seconds())

Result

     GILL     BOB
0     0.0 -2400.0
1  2400.0     0.0

Assign the result back to the columns

df[['GILL', 'BOB']] = df[['GILL', 'BOB']].apply(lambda x: x.dt.total_seconds())

df

    name    GILL        BOB
0   GILL    0.0         -2400.0
1   BoB     2400.0      0.0
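
The values above are in seconds, while the question asks for decimal hours. A minimal sketch of the same apply with a division by 3600 and rounding to two decimals could look like this. The name df_hours is only for illustration and is not from the original post, and the timedeltas are built with minutes= so the negative value is unambiguous.

```python
import pandas as pd

# Same two-row frame as in the example, built from explicit minute offsets.
data = [['GILL', pd.Timedelta(minutes=0), pd.Timedelta(minutes=-40)],
        ['BoB', pd.Timedelta(minutes=40), pd.Timedelta(minutes=0)]]
df_hours = pd.DataFrame(data, columns=['name', 'GILL', 'BOB'])  # hypothetical name

cols = ['GILL', 'BOB']
# total_seconds() returns floats; 3600 seconds per hour gives decimal hours.
df_hours[cols] = df_hours[cols].apply(
    lambda x: (x.dt.total_seconds() / 3600).round(2)
)

print(df_hours)
```

With this data, the -40 minute cell comes out as -0.67 and the +40 minute cell as 0.67.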

CodePudding user response：

You were almost there.

The expression you already have produces numpy.timedelta64 values:

type((df['clockin'].values - df['clockin'].values[:, None])[0][0])

[Out]:
numpy.timedelta64

Just divide it by np.timedelta64(1, 'h') or np.timedelta64(1, 'm') to get plain floats in hours or minutes.
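
To see what the division does on a single value, here is a small sketch using a -40 minute difference (the GILL/BOB cell). It also shows that the "-1 days 23:20:00" display is only how a negative timedelta is normalized (negative days plus a positive time of day), not a different value. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

delta = np.timedelta64(-40, 'm')           # BOB clocked in 40 minutes before GILL

hours = delta / np.timedelta64(1, 'h')     # timedelta / timedelta gives a float
minutes = delta / np.timedelta64(1, 'm')

td = pd.Timedelta(delta)                   # the same value as a pandas Timedelta
print(hours, minutes)
print(td.days, td.seconds)                 # -1 84000, i.e. -1 day + 23:20:00
print(td.total_seconds())                  # -2400.0
```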

Time difference in hours

Divide by np.timedelta64(1, 'h'):

arr = (df['clockin'].values - df['clockin'].values[:, None]) / np.timedelta64(1, 'h')
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)

[Out]:
   name      GILL       BOB      TOBY
0  GILL  0.000000 -0.666667  7.333333
1   BOB  0.666667  0.000000  8.000000
2  TOBY -7.333333 -8.000000  0.000000

Time difference in minutes

Divide by np.timedelta64(1, 'm'):

arr = (df['clockin'].values - df['clockin'].values[:, None]) / np.timedelta64(1, 'm')
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)

[Out]:
   name   GILL    BOB   TOBY
0  GILL    0.0  -40.0  440.0
1   BOB   40.0    0.0  480.0
2  TOBY -440.0 -480.0    0.0
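
As a variation on the approach above, the names can be used as both the row index and the column labels, which avoids the pd.concat step and lets you look up a pair directly. This is a sketch under the assumption that names are unique; the variable names clockin, hours and matrix are illustrative, and the rounding to two decimals follows the workhours example in the question.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same employees data as in the question.
employees = [('GILL', datetime(2022, 12, 1, 6, 40, 0), datetime(2022, 12, 1, 14, 30, 0)),
             ('BOB', datetime(2022, 12, 1, 6, 0, 0), datetime(2022, 12, 1, 14, 10, 0)),
             ('TOBY', datetime(2022, 12, 1, 14, 0, 0), datetime(2022, 12, 1, 22, 30, 0))]
df = pd.DataFrame.from_records(employees, columns=['name', 'clockin', 'clockout'])

clockin = df['clockin'].to_numpy()
# Broadcasting: cell [i, j] is clockin[j] - clockin[i], converted to decimal hours.
hours = (clockin - clockin[:, None]) / np.timedelta64(1, 'h')

matrix = pd.DataFrame(hours, index=df['name'], columns=df['name']).round(2)
print(matrix)
print(matrix.loc['GILL', 'TOBY'])   # TOBY clocked in this many hours after GILL
```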