Counting the running app from start and finish datetime column in Dataframe-CodePudding

I have a dataframe like this

df = pd.DataFrame({
'app': [1,2,3,4,5],
'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00', '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00', '2022-03-11 22:27:40', '2022-03-11 22:29:00']
})

df['start_time']=pd.to_datetime(df['start_time'])
df['finish_time']=pd.to_datetime(df['finish_time'])

My main purpose is to create a plot x-axis is time, y-axis is the count of running app

By that way, my idea is to create new column which is equal the running app when the app is started. For example in this case when the app 2 start actually app 1 is still running (it is fine if app 2 is included in the counting process), but I am stuck here (this is the example of the dataframe that I intended to make)

    app start_time  finish_time  running_apps(if current app included)
0   1   2022-03-11 22:26:00 2022-03-11 22:26:40 1
1   2   2022-03-11 22:26:30 2022-03-11 22:27:00 2
2   3   2022-03-11 22:27:00 2022-03-11 22:28:00 2
3   4   2022-03-11 22:27:30 2022-03-11 22:27:40 2
4   5   2022-03-11 22:28:00 2022-03-11 22:29:00 2

if someone else has another idea, it would be appreciated, thank you

CodePudding user response：

You can use numpy broadcasting with np.tril for lower triangle for test next datetimes, chain bot hmask and count Trues by sum:

df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

a = np.tril(df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = np.tril(df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])

df['count'] = (a & b).sum(axis=1)
print (df)
   app          start_time         finish_time  count
0    1 2022-03-11 22:26:00 2022-03-11 22:26:40      1
1    2 2022-03-11 22:26:30 2022-03-11 22:27:00      2
2    3 2022-03-11 22:27:00 2022-03-11 22:28:00      1
3    4 2022-03-11 22:27:30 2022-03-11 22:27:40      2
4    5 2022-03-11 22:28:00 2022-03-11 22:29:00      1

Or if need compare between all values:

df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])

a = (df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = (df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])

df['count'] = (a & b).sum(axis=1)
print (df)
   app          start_time         finish_time  count
0    1 2022-03-11 22:26:00 2022-03-11 22:26:40      2
1    2 2022-03-11 22:26:30 2022-03-11 22:27:00      2
2    3 2022-03-11 22:27:00 2022-03-11 22:28:00      2
3    4 2022-03-11 22:27:30 2022-03-11 22:27:40      2
4    5 2022-03-11 22:28:00 2022-03-11 22:29:00      1