Home > OS >  Combining dataframes based on condition
Combining dataframes based on condition

Time:12-09

I am trying to merge information from two dataframes with different sizes. See a short cut from the dataframe in the code below.

import pandas as pd
df1 = pd.DataFrame({'ID': [0,1,2,3,4,5,6], 
               'distance': [55.1, 55.2, 55.3, 55.4, 55.6, 55.7, 55.8 ],
               'spec_location': ['','','','','','','',]})

df2 = pd.DataFrame({'ID': [0,1,2], 
               'start_distance': [55, 55.4, 55.8],
               'end_distance': [55.4,55.8,56],
               'location': ['PKS', 'DDS', 'LMS']})

Dataframe 1 is a more detailed dataframe and I need to transfer information from dataframe 2 to dataframe 1 based on a condition. The condition states that if distance (from df1) is between the start and end distance (from df2). The location specified in df2 should be filled in at the spec_location in df1. This would then result in a dataframe that looks likes this:

   ID  distance spec_location
0   0      55.1           PKS
1   1      55.2           PKS
2   2      55.3           PKS
3   3      55.4           PKS
4   4      55.6           DDS
5   5      55.7           DDS
6   6      55.8           DDS

I have tried various loops and merge techniques but I can't seem to get it to work.

CodePudding user response:

Write a mapper function and apply it:

def location(x):
    for _, row in df2.iterrows():
        if row['start_distance'] < x <= row['end_distance']:
            return row['location']

df1['spec_location'] = df1['distance'].apply(location)

Output:

   ID  distance spec_location
0   0      55.1           PKS
1   1      55.2           PKS
2   2      55.3           PKS
3   3      55.4           PKS
4   4      55.6           DDS
5   5      55.7           DDS
6   6      55.8           DDS
  • Related