Home > Mobile >  conditional check and join for multiple values row in pandas dataframe
conditional check and join for multiple values row in pandas dataframe

Time:04-08

I have two dataframes say df1 and df2. df1 has column info which has matching values from df2 column code_dig.

what i want is to check whether the info column from df1 has a matching value in code_dig column of df2, if there is matching value then it should bring corresponding code_loc value into a new column. If there were no multiple values within same row separated by semicolons, it was pretty straight forward

Here is bit subset of the data

import pandas as pd
    df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5], 
                       'info': ['301', '521;519', '412;905', '409;206', '301']})

    df2 = pd.DataFrame({'code_dig': [206, 301, 409, 412, 519, 521, 905], 
                       'code_loc': [202, 302, 406, 404.3, 'inf', 'inf', 'inf' ]})

Desired output

   id     info added_from_df2
0   1      301            302
1   2  521;519        inf;inf
2   3  412;905      404.3;inf
3   4  409;206        406;202
4   5      301            302

What i tried so far: Since i was not able to find solution for above issue i tried a workaround, thought of splitting it all and adding the corresponding codes and then zipping it again which works but the issue is that order of the codes matter. lets say 521;519 != 519;521. and when zipping back it messes up the original order.

# split all ; seperated codes to assign localised codes
df1['info'] = df1['info'].str.split(';')
df1 = df1.explode('info',ignore_index=True)                           # count: 2227
df_merged = df1.merge(df2, left_on = 'info', right_on = 'code_dig')
grouped = df_merged.groupby('id').agg({'code_dig':'; '.join, 'code_loc':'; '.join,}).reset_index()
# adding localised codes and re-group the splitted signs
digitised_lv = df_merged.merge(grouped, on = 'id', how='inner')
digitised_lv.drop_duplicates(subset = 'id', inplace = True)
# to file
digitised_lv.to_file('result.shp')

some unfinished attempts includes making the df2 as dictionary and trying isin but it doesnt work, or trying to figure out if two loops can work but since no programming background i am stuck.

df1['info']=df1['info'].apply(lambda row: row.split(';'))
new_dict = dict(df2.values)
lst = [new_dict.keys]
df1['info'].isin(lst)

Trying two loops:

df1['added_from_df2'] = 0
for i, row in df1.iterrows():
    for i_,row_ in df2.iterrows():
        if row['info'] == row_['code_dig']:
            ? HOW to assign :( ?

CodePudding user response:

Here is one option with a complex transform:

df1.join(
    df1['info']
    .str.split(';')
    .explode()
    .astype(int)
    .map(df2.set_index('code_dig')['code_loc'])
    .astype(str)
    .groupby(level=0).agg(';'.join)
    .rename('added_from_df2')
)

Or a regex alternative:

dic = df2.astype(str).set_index('code_dig')['code_loc'].to_dict()
# {'206': '202', '301': '302', '409': '406', '412': '404.3',
#  '519': 'inf', '521': 'inf', '905': 'inf'}

df1['added_from_df2'] = (df1['info']
                         .str.replace('\d ', lambda m: dic.get(m.group(), 'nan'),
                                      regex=True)
                         )

output:

   id     info added_from_df2
0   1      301            302
1   2  521;519        inf;inf
2   3  412;905      404.3;inf
3   4  409;206        406;202
4   5      301            302
  • Related