Home > other >  Subtract time only from two datetime columns in Pandas
Subtract time only from two datetime columns in Pandas

Time:07-08

I am looking to do something like in this thread. However, I only want to subtract the time component of the two datetime columns.

For eg., given this dataframe:

                  ts1        ts2
0 2018-07-25 11:14:00        2018-07-27 12:14:00
1 2018-08-26 11:15:00        2018-09-24 10:15:00
2 2018-07-29 11:17:00        2018-07-22 11:00:00

The expected output for ts2 -ts1 time component only should give:

                  ts1                        ts2                ts_delta
0 2018-07-25 11:14:00        2018-07-27 12:14:00                1:00:00
1 2018-08-26 11:15:00        2018-09-24 10:15:00               -1:00:00
2 2018-07-29 11:17:00        2018-07-22 11:00:00               -0:17:00

So, for row 0: the time for ts2 is 12:14:00, the time for ts1 is 11:14:00. The expected output is just these two times subtracting (don't care about the days). In this case:

12:14:00 - 11:14:00 = 1:00:00.

How would I do this in one single line?

CodePudding user response:

I've tried to simulate your problem in my local environment. Apparently pandas.datetime64' types supporting add/subtract operations. You don't actually need to access datetime` object to execute these operations.

I did my experiments as below;

import pandas as pd

df = pd.DataFrame({'a' : ['2018-07-25 11:14:00', '2018-08-26 11:15:00', '2018-07-29 11:17:00'],
                   'b' : ['2018-07-27 12:14:00', '2018-09-24 10:15:00', '2018-07-22 11:00:00'] })

df['a'] = pd.to_datetime(df['a'])
df['b'] = pd.to_datetime(df['b'])

df['d'] = df['b'] - df['a']

and df is like;

    a                   b                   d
0   2018-07-25 11:14:00 2018-07-27 12:14:00 2 days 01:00:00
1   2018-08-26 11:15:00 2018-09-24 10:15:00 28 days 23:00:00
2   2018-07-29 11:17:00 2018-07-22 11:00:00 -8 days  23:43:00

CodePudding user response:

Try this, first strip time, then put time on same day, subtract, and take absolute value.

l = lambda x: pd.to_datetime("01-01-1900 "   x)

df["ts_delta"] = (
    df["ts2"].dt.time.astype(str).apply(l) - df["ts1"].dt.time.astype(str).apply(l)
).abs()
df

Output:

                  ts1                 ts2        ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00 0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 0 days 01:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 0 days 00:17:00

CodePudding user response:

You need to set both datetimes to a common date first.

One way is to use pandas.DateOffset:

o = pd.DateOffset(day=1, month=1, year=2022) # the exact numbers don't matter
# reset dates
ts1 = df['ts1'].add(o)
ts2 = df['ts2'].add(o)
# subtract
df['ts_delta'] = ts2.sub(ts1)

As one-liner:

df['ts_delta'] = df['ts2'].add((o:=pd.DateOffset(day=1, month=1, year=2022))).sub(df['ts1'].add(o))

Other way using a difference between ts2-ts1 (with dates) and ts2-ts1 (dates only):

df['ts_delta'] = (df['ts2'].sub(df['ts1'])
                 -df['ts2'].dt.normalize().sub(df['ts1'].dt.normalize())
                 )

output:

                  ts1                 ts2          ts_delta
0 2018-07-25 11:14:00 2018-07-27 12:14:00   0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1 days  23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -1 days  23:43:00

NB. don't get confused by the -1 days 23:00:00, this is actually the ways to represent -1hour

CodePudding user response:

Another option, without the need for a reference date. Subtract ts1's time component from ts2 as a timedelta, then convert the resulting datetime to a timedelta by subtracting ts2' date:

df["delta_time"] = (df["ts2"] - pd.to_timedelta(df["ts1"].dt.time.astype(str))) - df["ts2"].dt.floor("d")

df
                  ts1                 ts2        delta_time
0 2018-07-25 11:14:00 2018-07-27 12:14:00   0 days 01:00:00
1 2018-08-26 11:15:00 2018-09-24 10:15:00 -1 days  23:00:00
2 2018-07-29 11:17:00 2018-07-22 11:00:00 -1 days  23:43:00

CodePudding user response:

Use df['col_name'].dt.time to get time from date time column. Let's assume your dataframe name is df. Now, in your case

t1 = df['ts1'].dt.time
t2 = df['ts2'].dt.time

df['ts_delta'] = t2 - t1

For single line

df['ts_delta'] = df['ts2'].dt.time - df['ts1'].dt.time

I hope it will resolve your issue. Happy Coding!

  • Related