I have two df.
data = [['123', ['135', '987']], ['456', ['246', '000', '111']], ['789', ['369']]]
df = pd.DataFrame(data, columns=['case_id', 'case_num']) #create first df
data1 = [[' ', '135'], [' ', '000'], [' ', '369']]
df_target = pd.DataFrame(data1, columns=['case_id', 'case_num']) #create target df
list_of_id = []
for number in df_target['case_num']: #run through the df target's case numbers
temp = []
for list_of_numbers in enumerate(df['case_num']): #run through df's case numbers
if number in list_of_numbers:
temp.append(df.iloc[j]['case_id']) #add the case IDs to the temp list
list_of_id.append(temp) #append and save the relevant case IDs
It does what it is supposed to do, but the data frame is so big that it is taking a long time. I know sometimes I don't have to use two 'for' loop, so is there another way?
Edit:
Expected Output should be that df_target has the corresponding 'case_id'. The code I showed above only put the corresponding 'case_id' in a list.
CodePudding user response:
IIUC the objective is to fill in the missing case_id, then this is another way to accomplish it
explode the list in your 'df', so there is 1:1 b/w case-id and case-num secondly, merge the two DFs
here is the code
df=df.explode('case_num')
df_target.merge(df, on='case_num', suffixes=('' , '_new'))
case_id case_num case_id_new
0 135 123
1 000 456
2 369 789
CodePudding user response:
import pandas as pd
data = [['123', ['135', '987']], ['456', ['246', '000', '111']], ['789', ['369']]]
df = pd.DataFrame(data, columns=['case_id', 'case_num']) #create first df
data1 = [[' ', '135'], [' ', '000'], [' ', '369']]
df_target = pd.DataFrame(data1, columns=['case_id', 'case_num']) #create target df
df_target = df_target[['case_num']].merge(df.explode('case_num').drop_duplicates(), how = 'left', on = 'case_num')
df_target