Find the time difference between consecutive rows of two columns for a given value in a third column

Let's say we want to compute the variable D in the dataframe below from the time values in variables B and C.

Here, the second row of D is C2 - B1, a difference of 4 minutes; the third row is C3 - B2 = 4 minutes, and so on.

There is no reference value for the first row of D, so it is NA.

Issue:

We also want an NA value in the first row of each new category, i.e. when the value in variable A changes from 1 to 2. In other words, the value -183 must be replaced by NA.

A   B         C         D
1   5:43:00   5:24:00   NA
1   6:19:00   5:47:00   4
1   6:53:00   6:23:00   4
1   7:29:00   6:55:00   2
1   8:03:00   7:31:00   2
1   8:43:00   8:05:00   2
2   6:07:00   5:40:00   -183
2   6:42:00   6:11:00   4
2   7:15:00   6:45:00   3
2   7:53:00   7:17:00   2
2   8:30:00   7:55:00   2
2   9:07:00   8:32:00   2
2   9:41:00   9:09:00   2
2   10:17:00  9:46:00   5
2   10:52:00  10:20:00  3
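
For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the table above and computes D without taking A into account. The variable name `naive` is only illustrative, and it assumes B and C are stored as strings in H:MM:SS form.

```python
import pandas as pd

# Sample data copied from the table above (columns A, B, C).
df = pd.DataFrame({
    "A": [1] * 6 + [2] * 9,
    "B": ["5:43:00", "6:19:00", "6:53:00", "7:29:00", "8:03:00", "8:43:00",
          "6:07:00", "6:42:00", "7:15:00", "7:53:00", "8:30:00", "9:07:00",
          "9:41:00", "10:17:00", "10:52:00"],
    "C": ["5:24:00", "5:47:00", "6:23:00", "6:55:00", "7:31:00", "8:05:00",
          "5:40:00", "6:11:00", "6:45:00", "7:17:00", "7:55:00", "8:32:00",
          "9:09:00", "9:46:00", "10:20:00"],
})

# C of the current row minus B of the previous row, in minutes.
# Column A is ignored here, so the first row of category 2 is compared
# with the last row of category 1 (5:40 - 8:43 = -183 minutes).
b = pd.to_timedelta(df["B"])
c = pd.to_timedelta(df["C"])
naive = (c - b.shift()).dt.total_seconds() / 60
```

The value at position 6 of `naive` is the -183 that should become NA.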

CodePudding user response：

You can use:

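import numpy as np
import pandas as pd

# Compute delta: C of the current row minus B of the previous row, in minutes
df['D'] = (pd.to_timedelta(df['C']).sub(pd.to_timedelta(df['B'].shift()))
                                   .dt.total_seconds().div(60))

# Set D to NaN on every row where A differs from the previous row
df.loc[df['A'].ne(df['A'].shift()), 'D'] = np.nan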

Output:

>>> df
    A         B         C    D
0   1   5:43:00   5:24:00  NaN
1   1   6:19:00   5:47:00  4.0
2   1   6:53:00   6:23:00  4.0
3   1   7:29:00   6:55:00  2.0
4   1   8:03:00   7:31:00  2.0
5   1   8:43:00   8:05:00  2.0
6   2   6:07:00   5:40:00  NaN
7   2   6:42:00   6:11:00  4.0
8   2   7:15:00   6:45:00  3.0
9   2   7:53:00   7:17:00  2.0
10  2   8:30:00   7:55:00  2.0
11  2   9:07:00   8:32:00  2.0
12  2   9:41:00   9:09:00  2.0
13  2  10:17:00   9:46:00  5.0
14  2  10:52:00  10:20:00  3.0
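
To see what the mask `df['A'].ne(df['A'].shift())` selects, here is a small sketch on a made-up series (the values and the name `is_run_start` are hypothetical, not from the question). It flags the first row of every run of equal values, so a category that reappears later is flagged again.

```python
import pandas as pd

# Hypothetical category column; note that 1 reappears at the end.
a = pd.Series([1, 1, 1, 2, 2, 1])

# True wherever the value differs from the previous row.
# The first row is compared with NaN, so it is always True.
is_run_start = a.ne(a.shift())

print(is_run_start.tolist())
# [True, False, False, True, False, True]
```

If the data is sorted by A, as in the question, this is the same as flagging the first row of each category.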

CodePudding user response：

You can use the difference between datetime columns in pandas.

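First convert both columns to datetimes:

df['B_dt'] = pd.to_datetime(df['B'])
df['C_dt'] = pd.to_datetime(df['C'])

Then shift B within each group of A, so the first row of every group has no previous value and becomes NaN:

# Assumes df has a default RangeIndex and its rows are already sorted by A,
# as in the example; otherwise reset_index(drop=True) misaligns the result.
df['D'] = (df.groupby('A')
             .apply(lambda s: (s['C_dt'] - s['B_dt'].shift()).dt.total_seconds() / 60)
             .reset_index(drop=True))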

You can drop the helper columns B_dt and C_dt afterwards.
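
A possible variation, sketched here on six rows taken from the question's table, shifts B per group with `groupby(...).shift()`. The result keeps the original index, so no helper columns, `apply`, or `reset_index` are needed. It assumes B and C are strings in H:MM:SS form; the names `b`, `c`, and `prev_b` are only illustrative.

```python
import pandas as pd

# First three rows of each category from the question's table.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": ["5:43:00", "6:19:00", "6:53:00", "6:07:00", "6:42:00", "7:15:00"],
    "C": ["5:24:00", "5:47:00", "6:23:00", "5:40:00", "6:11:00", "6:45:00"],
})

b = pd.to_timedelta(df["B"])
c = pd.to_timedelta(df["C"])

# Previous B within the same category; NaT on the first row of each category.
prev_b = b.groupby(df["A"]).shift()

# Difference in minutes; NaT propagates to NaN.
df["D"] = (c - prev_b).dt.total_seconds() / 60
```

Because the shift happens inside each group, this should also work when the rows are not sorted by A, with "previous row" then meaning the previous row of the same category.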