Filter data based on conditions from a dataframe-CodePudding

I have two dataframe(s): df1:

  | from id | from group | to id | to group   |
  |    1    |      A     |   3   |      B     |
  |    4    |      B     |   4   |      X     | 
  |    5    |      F     |   5   |      J     |
  |    2    |      B     |   3   |      A     |

df2:

   | From | To |
   |   A  |  B |
   |   F  |  J |

I want to filter out the values in df2, if present in the 'from group' and 'to group' columns of df1

Expected output:

     | from id | from group | to id | to group   |
     |    4    |      B     |   4   |      X     | 
     |    2    |      B     |   3   |      A     |

I am looking for a flexible solution. A solution that would not change if there are changes to the values in the df2 columns.

CodePudding user response：

You can use .merge with indicator=True, then filter the df1:

x = df1[
    df1.merge(
        df2,
        left_on=["from group", "to group"],
        right_on=["From", "To"],
        indicator=True,
        how="left",
    )._merge.eq("left_only")
]
print(x)

Prints:

   from id from group  to id to group
1        4          B      4        X
3        2          B      3        A

CodePudding user response：

df_1 = pd.DataFrame(data={'from id': [1,4,5,2],
                        'from group': ['A', 'B', 'F', 'B'],
                        'to id': [3,4,5,3],
                        'to group': ['B', 'X', 'J', 'A']})
cols = list(df_1)

df_2 = pd.DataFrame(data={'From': ['A', 'F'], 'To': ['B', 'J']})

print(df_1)
print('-----')
print(df_2)

df_1 = df_1.merge(df_2, how='left', left_on=['from group', 'to group'], right_on=['From', 'To'])
df_1 = df_1[df_1['From'].isnull()][cols]
print(df_1)

CodePudding user response：

Quick edit to the answer I provided to your previous iteration of this question.

l1 = df2.values.tolist()
l2 = list(itertools.chain.from_iterable([itertools.permutations(i, r=2) for i in l1]))


df[[j not in l2 for j in list(zip(df['from group'], df['to group']))]]