Is there a pandas/numpy method to remove this nested for loop?

Time:01-04

I am trying to find a solution that uses the pandas and/or numpy libraries to do the below. I am merging where the Track is equal across dataframes and the Location in merged_df (df1 below) falls between the Start and End values of df3 (df2 below).

I'm sure there is a way using the pandas merge function, but I can't work out how to do it between a range.

df1_length = len(df1.axes[0])
df2_length = len(df2.axes[0])

for j in range(df1_length):
    for k in range(df2_length):
        if (df1.at[j, 'Track'] == df2.at[k, 'Track'] and
           df1.at[j, 'Location'] >= df2.at[k, 'Start'] and
           df1.at[j, 'Location'] <= df2.at[k, 'End']):
            df1.at[j, 'Label'] = df2.at[k, 'Label']
            if df2.at[k, 'Label'] == 'Curve':
                df1.at[j, 'Superelevation'] = df2.at[k, 'Superelevation']
                df1.at[j, 'Curve Radius'] = df2.at[k, 'Curve Radius']
            break

df1:

Track Location
Up 1234
Up 2354
Up 4521
Up 8654
Up 9876

df2:

Track  Start    End  Label       Superelevation  Curve Radius  Direction
Up         0   2000  Curve                   60          3200  R
Up      3000   4600  Transition
Up      9500  10000  Curve                   35           900  L
Down       0   9999  Curve                   20          1700  L

output:

Track  Location  Label       Superelevation  Curve Radius  Direction
Up         1234  Curve                   60          3200  R
Up         2354  NaN                    NaN           NaN  NaN
Up         4521  Transition             NaN           NaN  NaN
Up         8654  NaN                    NaN           NaN  NaN
Up         9876  Curve                   35           900  L

CodePudding user response:

You can use merge() to inner join merged_df and df3 on the Track column and then keep only the rows whose Location falls between Start and End.

merged_df = merged_df.merge(df3, on='Track', how='inner')
merged_df = merged_df.loc[(merged_df['Location'] >= merged_df['Start']) & (merged_df['Location'] <= merged_df['End'])].reset_index(drop=True).drop(columns=['Start', 'End'])

output:

>       Track  Location  Label  Superelevation  Curve Radius Direction
>     0    Up      1234  Curve              60          3200         R
>     1    Up      9876   Tang
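
Note that an inner join followed by a range filter keeps only the rows that fall inside some range, whereas the expected output in the question keeps every df1 row and fills the rest with NaN. A minimal sketch of one way to get that, assuming the question's sample df1 and df2 (merged_df and df3 in this answer) and a default integer index on df1; the names pairs, matches, extra_cols and result are illustrative:

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "Track": ["Up"] * 5,
    "Location": [1234, 2354, 4521, 8654, 9876],
})
df2 = pd.DataFrame({
    "Track": ["Up", "Up", "Up", "Down"],
    "Start": [0, 3000, 9500, 0],
    "End": [2000, 4600, 10000, 9999],
    "Label": ["Curve", "Transition", "Curve", "Curve"],
    "Superelevation": [60, np.nan, 35, 20],
    "Curve Radius": [3200, np.nan, 900, 1700],
    "Direction": ["R", np.nan, "L", "L"],
})

# Pair every df1 row with every df2 row on the same track,
# carrying df1's row index along in a column named "index"
pairs = df1.reset_index().merge(df2, on="Track", how="inner")

# True where Location lies in [Start, End], inclusive on both ends
in_range = pairs["Location"].between(pairs["Start"], pairs["End"])

# Keep at most one matching range per df1 row
matches = pairs[in_range].drop_duplicates("index").set_index("index")

# Join the matched columns back by index; unmatched rows stay NaN
extra_cols = ["Label", "Superelevation", "Curve Radius", "Direction"]
result = df1.join(matches[extra_cols])
print(result)
```

The intermediate frame has one row per same-track pair, so for very large inputs memory use grows with the product of the row counts per track.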

CodePudding user response:

import numpy as np
import pandas as pd

# Pair every df1 row with every df2 row on the same track
df_merged = df1.merge(df2, on='Track')

# True where Location lies inside [Start, End]
in_range = (df_merged['Location'] >= df_merged['Start']) & (df_merged['Location'] <= df_merged['End'])

# Blank out the df2 values for pairs that fall outside the range
for col in ['Label', 'Superelevation', 'Curve Radius', 'Direction']:
    df_merged[col] = np.where(in_range, df_merged[col], np.nan)

# Keep the matching pairs, then right-join onto df1 so unmatched rows come back as NaN
df_filtered = df_merged.dropna(subset=['Label'])

df_final = pd.merge(df_filtered, df1, on=['Location', 'Track'], how='right')
df_final.drop(labels=['Start', 'End'], axis=1, inplace=True)
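
Since the question also asks about numpy, here is a hedged sketch of the same lookup using broadcasting instead of merge. It is not taken from either answer. It assumes the question's sample data, and the names hit, first, found, picked and result are illustrative. It builds a len(df1) x len(df2) boolean matrix, so it only suits inputs where that matrix fits in memory.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "Track": ["Up"] * 5,
    "Location": [1234, 2354, 4521, 8654, 9876],
})
df2 = pd.DataFrame({
    "Track": ["Up", "Up", "Up", "Down"],
    "Start": [0, 3000, 9500, 0],
    "End": [2000, 4600, 10000, 9999],
    "Label": ["Curve", "Transition", "Curve", "Curve"],
    "Superelevation": [60, np.nan, 35, 20],
    "Curve Radius": [3200, np.nan, 900, 1700],
    "Direction": ["R", np.nan, "L", "L"],
})

# Compare every df1 row with every df2 row: shape (len(df1), len(df2))
loc = df1["Location"].to_numpy()[:, None]
same_track = df1["Track"].to_numpy()[:, None] == df2["Track"].to_numpy()
in_range = (loc >= df2["Start"].to_numpy()) & (loc <= df2["End"].to_numpy())
hit = same_track & in_range

# Position of the first matching df2 row per df1 row
# (argmax returns 0 when a row has no match, so track that separately)
first = hit.argmax(axis=1)
found = hit.any(axis=1)

cols = ["Label", "Superelevation", "Curve Radius", "Direction"]
picked = df2[cols].iloc[first].reset_index(drop=True)
picked.index = df1.index

# Blank out the rows of df1 that matched no range at all
picked.loc[~found, cols] = np.nan

result = pd.concat([df1, picked], axis=1)
print(result)
```

Taking the first True per row mirrors the break in the original loop, which stops at the first df2 row that matches.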