I have a dataframe like this
df = pd.DataFrame({
'app': [1,2,3,4,5],
'start_time': ['2022-03-11 22:26:00', '2022-03-11 22:26:30', '2022-03-11 22:27:00', '2022-03-11 22:27:30', '2022-03-11 22:28:00'],
'finish_time': ['2022-03-11 22:26:40', '2022-03-11 22:27:00', '2022-03-11 22:28:00', '2022-03-11 22:27:40', '2022-03-11 22:29:00']
})
df['start_time']=pd.to_datetime(df['start_time'])
df['finish_time']=pd.to_datetime(df['finish_time'])
My main purpose is to create a plot x-axis is time, y-axis is the count of running app
By that way, my idea is to create new column which is equal the running app when the app is started. For example in this case when the app 2 start actually app 1 is still running (it is fine if app 2 is included in the counting process), but I am stuck here (this is the example of the dataframe that I intended to make)
app start_time finish_time running_apps(if current app included)
0 1 2022-03-11 22:26:00 2022-03-11 22:26:40 1
1 2 2022-03-11 22:26:30 2022-03-11 22:27:00 2
2 3 2022-03-11 22:27:00 2022-03-11 22:28:00 2
3 4 2022-03-11 22:27:30 2022-03-11 22:27:40 2
4 5 2022-03-11 22:28:00 2022-03-11 22:29:00 2
if someone else has another idea, it would be appreciated, thank you
CodePudding user response:
You can use numpy broadcasting with np.tril
for lower triangle for test next datetimes, chain bot hmask and count True
s by sum
:
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])
a = np.tril(df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = np.tril(df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])
df['count'] = (a & b).sum(axis=1)
print (df)
app start_time finish_time count
0 1 2022-03-11 22:26:00 2022-03-11 22:26:40 1
1 2 2022-03-11 22:26:30 2022-03-11 22:27:00 2
2 3 2022-03-11 22:27:00 2022-03-11 22:28:00 1
3 4 2022-03-11 22:27:30 2022-03-11 22:27:40 2
4 5 2022-03-11 22:28:00 2022-03-11 22:29:00 1
Or if need compare between all values:
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])
a = (df['finish_time'].to_numpy() > df['start_time'].to_numpy()[:,None])
b = (df['start_time'].to_numpy() < df['finish_time'].to_numpy()[:,None])
df['count'] = (a & b).sum(axis=1)
print (df)
app start_time finish_time count
0 1 2022-03-11 22:26:00 2022-03-11 22:26:40 2
1 2 2022-03-11 22:26:30 2022-03-11 22:27:00 2
2 3 2022-03-11 22:27:00 2022-03-11 22:28:00 2
3 4 2022-03-11 22:27:30 2022-03-11 22:27:40 2
4 5 2022-03-11 22:28:00 2022-03-11 22:29:00 1