I have two large dataframes something like this:
df1:
A time
0 [a, b, c] 122
1 [a, d, e] 45
2 [b, c, e] 64
df2:
Origin Destination
0 a b
1 b c
2 b e
3 d e
Now I want to compare the two, so that the code checks if the combination of Origin and Destination from df2 appears in the lists in column A of df1. If this combination appears I want this combination added in new columns in df1. If this combination does not appear in a list nothing needs to happen.
Also if there is multiple combinations for 1 list I want to add another row so that there will be one row per combination. If I would compare df1 with df2, the new df1 needs to look something like this:
df1:
A time Origin Destination
0 [a, b, c] 122 a b
0 [a, b, c] 122 b c
1 [a, d, e] 45 d e
2 [b, c, e] 64 b c
2 [b, c, e] 64 b e
I have tried it with the code below:
for index, row in df1.iterrows():
row['Location'] = str(row['Location'])
for index1, row1 in df2.iterrows():
row1['Origin'] = str(row1['Origin'])
for index2, row2 in df2.iterrows():
row2['Destination'] = str(row1['Destination'])
if row1['Origin'] and row2['Destination'] in row['Location']:
df1['Origin'] = df2['Origin']
df1['Destination'] = df2['Destination']
But this gives the following output:
Location Time Origin Destination
0 [a, b, c] 122 a b
1 [a, d, e] 45 b c
2 [b, c, e] 64 b e
For some reason I dont understand this does not work because it gives the combination b and c in the second line while this is not there.
Also if there is multiple combinations in a single list this does not add new rows yet.
CodePudding user response:
ar = []
for i, row in df1.iterrows():
temp = row['A']
df_temp = (df2.loc[(df2['Origin'].isin(temp)) & (df2['Destination'].isin(temp))]).copy()
if df_temp.shape[0]>0:
df_temp['A'] = [row['A']]* df_temp.shape[0]
df_temp.loc[:,'time'] = [row['time']] * df_temp.shape[0]
ar.append(df_temp)
df_res = pd.concat(ar)
Origin Destination A time
0 a b [a, b, c] 122
1 b c [a, b, c] 122
3 d e [a, d, e] 45
1 b c [b, c, e] 64
2 b e [b, c, e] 64