How do I create a table to match based on different column values? If statements?

I have a dataset and I am looking to see if there is a way to match data based on col values.

  col-A    col-B      
  Apple    squash     
  Apple    lettuce    
  Banana   Carrot     
  Banana   Carrot 
  Banana   Carrot
  dragon   turnip 
  melon    potato
  melon    potato
  pear     potato

Match

if col-A matches another row's col-A and col-B doesn't match
if col-B matches another row's col-B and col-A doesn't match

  col-A    col-B
  Apple    squash
  Apple    lettuce
  melon    potato
  melon    potato
  pear     potato

CodePudding user response：

IIUC, you need to compute two masks, each flagging the rows whose value in one column is paired with more than one distinct value in the other column:

m1 = df.groupby('col-B')['col-A'].transform('nunique').gt(1)
m2 = df.groupby('col-A')['col-B'].transform('nunique').gt(1)

out = df[m1|m2]

Output:

   col-A    col-B
0  Apple   squash
1  Apple  lettuce
6  melon   potato
7  melon   potato
8   pear   potato
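
To reproduce this end to end, here is a minimal self-contained sketch. It assumes the data from the question is loaded into a DataFrame named df with columns col-A and col-B; the construction of df is my own reconstruction of the question's table, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({
    "col-A": ["Apple", "Apple", "Banana", "Banana", "Banana",
              "dragon", "melon", "melon", "pear"],
    "col-B": ["squash", "lettuce", "Carrot", "Carrot", "Carrot",
              "turnip", "potato", "potato", "potato"],
})

# True where the row's col-B value occurs with more than one distinct col-A value.
m1 = df.groupby("col-B")["col-A"].transform("nunique").gt(1)
# True where the row's col-A value occurs with more than one distinct col-B value.
m2 = df.groupby("col-A")["col-B"].transform("nunique").gt(1)

# Keep a row if either condition holds.
out = df[m1 | m2]
print(out)
```

Because transform returns a Series aligned with df's index, the two masks can be combined with | and used directly for boolean indexing, so the original row labels are preserved.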

You can also get the unique/exclusive pairs with:

df[~(m1|m2)]

    col-A   col-B
2  Banana  Carrot
3  Banana  Carrot
4  Banana  Carrot
5  dragon  turnip

CodePudding user response：

So, if I understand correctly, you want to select the rows whose colA group (resp. colB group) contains more than one distinct colB (resp. colA) value.

I would suggest:

grA = df2.groupby("colA").filter(lambda x : x.groupby("colB").ngroups > 1)
grB = df2.groupby("colB").filter(lambda x : x.groupby("colA").ngroups > 1)

This gives:

grA
    colA     colB
0  Apple   squash
1  Apple  lettuce

and

grB
    colA    colB
6  melon  potato
7  melon  potato
8   pear  potato

Concatenating the two DataFrames (and dropping any row that ends up in both) gives the desired output.
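
The combining step could be sketched as follows. This is only one possible way to do it: the sample df2 is reconstructed from the question with the column names colA and colB used in this answer, and the name combined is purely illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout and column names).
df2 = pd.DataFrame({
    "colA": ["Apple", "Apple", "Banana", "Banana", "Banana",
             "dragon", "melon", "melon", "pear"],
    "colB": ["squash", "lettuce", "Carrot", "Carrot", "Carrot",
             "turnip", "potato", "potato", "potato"],
})

# Rows whose colA group contains more than one distinct colB value.
grA = df2.groupby("colA").filter(lambda x: x.groupby("colB").ngroups > 1)
# Rows whose colB group contains more than one distinct colA value.
grB = df2.groupby("colB").filter(lambda x: x.groupby("colA").ngroups > 1)

# Stack both results, drop rows selected by both filters, restore original order.
combined = pd.concat([grA, grB])
combined = combined[~combined.index.duplicated()].sort_index()
print(combined)
```

Deduplicating on the index rather than on the values keeps genuinely repeated rows (such as the two melon/potato rows) while removing a row only if both filters picked it up.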