Home > OS >  How to do a VLOOKUP function in column of lists on Pandas?
How to do a VLOOKUP function in column of lists on Pandas?

Time:06-02

I have two df.

data = [['123', ['135', '987']], ['456', ['246', '000', '111']], ['789', ['369']]]
df = pd.DataFrame(data, columns=['case_id', 'case_num']) #create first df

data1 = [[' ', '135'], [' ', '000'], [' ', '369']]
df_target = pd.DataFrame(data1, columns=['case_id', 'case_num']) #create target df

list_of_id = []
for number in df_target['case_num']: #run through the df target's case numbers
    temp = []
    for list_of_numbers in enumerate(df['case_num']): #run through df's case numbers
        if number in list_of_numbers:
            temp.append(df.iloc[j]['case_id']) #add the case IDs to the temp list
    list_of_id.append(temp) #append and save the relevant case IDs

It does what it is supposed to do, but the data frame is so big that it is taking a long time. I know sometimes I don't have to use two 'for' loop, so is there another way?

Edit:

Expected Output should be that df_target has the corresponding 'case_id'. The code I showed above only put the corresponding 'case_id' in a list. 

CodePudding user response:

IIUC the objective is to fill in the missing case_id, then this is another way to accomplish it

explode the list in your 'df', so there is 1:1 b/w case-id and case-num secondly, merge the two DFs

here is the code

df=df.explode('case_num')
df_target.merge(df, on='case_num', suffixes=('' , '_new'))
    case_id case_num    case_id_new
0                135        123
1                000        456
2                369        789

CodePudding user response:

import pandas as pd
data = [['123', ['135', '987']], ['456', ['246', '000', '111']], ['789', ['369']]]
df = pd.DataFrame(data, columns=['case_id', 'case_num']) #create first df
data1 = [[' ', '135'], [' ', '000'], [' ', '369']]
df_target = pd.DataFrame(data1, columns=['case_id', 'case_num']) #create target df
df_target = df_target[['case_num']].merge(df.explode('case_num').drop_duplicates(), how = 'left', on = 'case_num')
df_target
  • Related