How to speed up finding matching indexes?


I have two dataframes: df2

Time(s) vehicle_id
10          1
11          1
12          1
13          1
...         ...
78          2
79          2
80          2

And com:

index    v1    v2
  1       1    2

I would like to find the indexes of the rows in com that match the following condition:

if abs(max(df2.loc[df2['vehicle_id'] == com['v1'][i], 'Time(s)']) - max(df2.loc[df2['vehicle_id'] == com['v2'][i], 'Time(s)'])) > 60.0:
    arr.append(i)

In this case the condition is true: the max time for v1 is 13 and for v2 is 80, and |13 - 80| = 67 > 60. The code works, but it has to check more than 40M rows and only processes about 200k rows/hour, so a full run would take roughly 200 hours. Is there a faster way to proceed? I just need to obtain the indexes from com that match the condition above.

Update: I check the condition at every position using a for loop:

for i in range(0, len(com)):
    if abs(max(df2.loc[df2['vehicle_id'] == com['v1'][i], 'Time(s)']) - max(df2.loc[df2['vehicle_id'] == com['v2'][i], 'Time(s)'])) > 60.0:
        arr.append(i)

CodePudding user response:

It looks like you only need the max time for each vehicle id, so sort by time and drop duplicates to keep only the last (i.e. maximum) row per vehicle_id. Then you can replace the vehicle ids in your comm df with those max times and apply your filter condition to the entire frame at once, saving the resulting indexes to an array.

import pandas as pd

df2 = pd.DataFrame({'Time(s)': [10, 11, 12, 13, 78, 79, 80], 'vehicle_id': [1, 1, 1, 1, 2, 2, 2]})
comm = pd.DataFrame({'index': [1], 'v1': [1], 'v2': [2]})

# Keep only the row with the max Time(s) for each vehicle_id.
m = df2.sort_values(by='Time(s)').drop_duplicates(subset='vehicle_id', keep='last')

# Swap each vehicle id in comm for that vehicle's max time.
comm = comm.set_index('index').replace(m.set_index('vehicle_id')['Time(s)'].to_dict())

# Vectorized filter: keep indexes where the max times differ by more than 60.
arr = comm.loc[(comm['v1'] - comm['v2']).abs().gt(60)].index.values

print(arr)
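
Running the snippet above prints [1], since |13 - 80| = 67 exceeds 60, so index 1 is the only match. For what it's worth, a minimal alternative sketch (same idea, untested against the full 40M rows) skips the sort and takes the per-vehicle maximum directly with groupby, then maps those maxima onto comm:

import pandas as pd

df2 = pd.DataFrame({'Time(s)': [10, 11, 12, 13, 78, 79, 80],
                    'vehicle_id': [1, 1, 1, 1, 2, 2, 2]})
comm = pd.DataFrame({'index': [1], 'v1': [1], 'v2': [2]})

# Max time per vehicle, as a Series indexed by vehicle_id.
max_time = df2.groupby('vehicle_id')['Time(s)'].max()

# Map each id column to that vehicle's max time (no Python loop).
t1 = comm['v1'].map(max_time)
t2 = comm['v2'].map(max_time)

# Same filter as above, vectorized over all of comm.
arr = comm.loc[(t1 - t2).abs().gt(60), 'index'].values

print(arr)  # [1]

Either way, the speedup comes from replacing the per-row Python loop with vectorized pandas operations over the whole frame.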