Home > Software design >  How can I check if a combination of 2 columns appears in lists in a column of another dataframe?
How can I check if a combination of 2 columns appears in lists in a column of another dataframe?

Time:11-10

I have two large dataframes something like this:

df1:
             A  time
0    [a, b, c]   122
1    [a, d, e]    45
2    [b, c, e]    64

df2:
  Origin  Destination 
0      a            b   
1      b            c   
2      b            e   
3      d            e   

Now I want to compare the two, so that the code checks if the combination of Origin and Destination from df2 appears in the lists in column A of df1. If this combination appears I want this combination added in new columns in df1. If this combination does not appear in a list nothing needs to happen.

Also if there is multiple combinations for 1 list I want to add another row so that there will be one row per combination. If I would compare df1 with df2, the new df1 needs to look something like this:

df1:
             A  time  Origin  Destination
0    [a, b, c]   122       a            b
0    [a, b, c]   122       b            c
1    [a, d, e]    45       d            e
2    [b, c, e]    64       b            c
2    [b, c, e]    64       b            e

I have tried it with the code below:

for index, row in df1.iterrows():
    row['Location'] = str(row['Location'])

    for index1, row1 in df2.iterrows():
        row1['Origin'] = str(row1['Origin'])
        
        for index2, row2 in df2.iterrows():
            row2['Destination'] = str(row1['Destination'])

            if row1['Origin'] and row2['Destination'] in row['Location']:
                df1['Origin'] = df2['Origin']
                df1['Destination'] = df2['Destination']

But this gives the following output:

    Location    Time    Origin  Destination
0   [a, b, c]   122     a       b
1   [a, d, e]   45      b       c
2   [b, c, e]   64      b       e

For some reason I dont understand this does not work because it gives the combination b and c in the second line while this is not there.

Also if there is multiple combinations in a single list this does not add new rows yet.

CodePudding user response:

ar = []
for i, row in df1.iterrows():
    temp = row['A']
    df_temp = (df2.loc[(df2['Origin'].isin(temp)) & (df2['Destination'].isin(temp))]).copy()
    if df_temp.shape[0]>0:
        df_temp['A'] = [row['A']]* df_temp.shape[0]
        df_temp.loc[:,'time'] = [row['time']] * df_temp.shape[0]
        ar.append(df_temp)
df_res = pd.concat(ar)
Origin  Destination A   time
0   a   b   [a, b, c]   122
1   b   c   [a, b, c]   122
3   d   e   [a, d, e]   45
1   b   c   [b, c, e]   64
2   b   e   [b, c, e]   64
  • Related