How do I create a table that matches rows based on different column values? If statements?

Time:03-16

I have a dataset and I want to know whether there is a way to match rows based on their column values.

  col-A    col-B      
  Apple    squash     
  Apple    lettuce    
  Banana   Carrot     
  Banana   Carrot 
  Banana   Carrot
  dragon   turnip 
  melon    potato
  melon    potato
  pear     potato   
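The answers below assume this data is in a pandas DataFrame named `df`. One minimal way to rebuild it (the name `df` and the default 0..8 row labels are assumptions chosen to match the row numbers the answers print):

```python
import pandas as pd

# Sample data from the question; the default RangeIndex gives row labels 0..8
df = pd.DataFrame({
    "col-A": ["Apple", "Apple", "Banana", "Banana", "Banana",
              "dragon", "melon", "melon", "pear"],
    "col-B": ["squash", "lettuce", "Carrot", "Carrot", "Carrot",
              "turnip", "potato", "potato", "potato"],
})

print(df)
```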

Match a row if either condition holds:

  • its col-A value matches the col-A of another row, but their col-B values differ
  • its col-B value matches the col-B of another row, but their col-A values differ
Expected output:

col-A    col-B
Apple    squash
Apple    lettuce
melon    potato
melon    potato
pear     potato
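Since the question asks about if statements, here is a sketch that applies the two rules literally by comparing every pair of rows. The helper name `matches_either_rule` is hypothetical, and the pairwise loop is O(n²), so it only suits small data; it is meant to pin down the rules, not to replace the vectorized answers.

```python
import pandas as pd

df = pd.DataFrame({
    "col-A": ["Apple", "Apple", "Banana", "Banana", "Banana",
              "dragon", "melon", "melon", "pear"],
    "col-B": ["squash", "lettuce", "Carrot", "Carrot", "Carrot",
              "turnip", "potato", "potato", "potato"],
})


def matches_either_rule(data: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: boolean mask, True where a row meets either rule.

    Compares every pair of rows (O(n^2)), so it only suits small frames.
    """
    a = data["col-A"].tolist()
    b = data["col-B"].tolist()
    keep = []
    for i in range(len(a)):
        hit = False
        for j in range(len(a)):
            if i == j:
                continue
            # Rule 1: same col-A as another row, but a different col-B
            if a[i] == a[j] and b[i] != b[j]:
                hit = True
                break
            # Rule 2: same col-B as another row, but a different col-A
            if b[i] == b[j] and a[i] != a[j]:
                hit = True
                break
        keep.append(hit)
    return pd.Series(keep, index=data.index)


print(df[matches_either_rule(df)])
```

Rule 1 holds for a row exactly when its col-A group contains more than one distinct col-B value (and symmetrically for rule 2), which is why a group-wise `nunique` count can replace the explicit loops.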


User response:

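If I understand correctly, you need two boolean masks: one flags rows whose col-B value is paired with more than one distinct col-A value, the other flags rows whose col-A value is paired with more than one distinct col-B value. Keeping the rows where either mask is True gives the match: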

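# m1: True where the row's col-B value is paired with more than one distinct col-A value
m1 = df.groupby('col-B')['col-A'].transform('nunique').gt(1)
# m2: True where the row's col-A value is paired with more than one distinct col-B value
m2 = df.groupby('col-A')['col-B'].transform('nunique').gt(1)

# keep rows that satisfy either condition
out = df[m1 | m2]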

Output:

   col-A    col-B
0  Apple   squash
1  Apple  lettuce
6  melon   potato
7  melon   potato
8   pear   potato
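To see why `transform` is used here rather than a plain aggregation, this sketch prints the per-row counts behind `m1` and `m2`. The column names `a_per_b` and `b_per_a` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col-A": ["Apple", "Apple", "Banana", "Banana", "Banana",
              "dragon", "melon", "melon", "pear"],
    "col-B": ["squash", "lettuce", "Carrot", "Carrot", "Carrot",
              "turnip", "potato", "potato", "potato"],
})

# transform("nunique") computes one count per group, then broadcasts it back
# to every row of that group, so the result is aligned with df's index
a_per_b = df.groupby("col-B")["col-A"].transform("nunique")  # distinct col-A per col-B value
b_per_a = df.groupby("col-A")["col-B"].transform("nunique")  # distinct col-B per col-A value

print(df.assign(a_per_b=a_per_b, b_per_a=b_per_a))

# By contrast, .nunique() without transform returns one value per group,
# indexed by the group key, so it cannot be used directly as a row mask
per_group = df.groupby("col-B")["col-A"].nunique()
print(per_group)
```

Comparing those aligned counts with `.gt(1)` is what turns them into the row masks `m1` and `m2`.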

Conversely, negating the combined mask gives the unique/exclusive pairs (rows whose col-A and col-B values only ever appear together):

df[~(m1|m2)]

    col-A   col-B
2  Banana  Carrot
3  Banana  Carrot
4  Banana  Carrot
5  dragon  turnip
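Because the second selection is the boolean complement of the first, the two results split the DataFrame with no overlap. A quick sanity check on the question's sample data, as a sketch (the names `matched` and `exclusive` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "col-A": ["Apple", "Apple", "Banana", "Banana", "Banana",
              "dragon", "melon", "melon", "pear"],
    "col-B": ["squash", "lettuce", "Carrot", "Carrot", "Carrot",
              "turnip", "potato", "potato", "potato"],
})

m1 = df.groupby("col-B")["col-A"].transform("nunique").gt(1)
m2 = df.groupby("col-A")["col-B"].transform("nunique").gt(1)

matched = df[m1 | m2]       # at least one value is shared with a different partner
exclusive = df[~(m1 | m2)]  # both values only ever appear together

# The two selections are complementary: no overlap, and together they cover df
assert matched.index.intersection(exclusive.index).empty
assert len(matched) + len(exclusive) == len(df)
print(exclusive)
```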

User response:

So, if I understand correctly, you want to select every row such that, after grouping by col-A (resp. col-B), its group splits into more than one subgroup when grouped again by col-B (resp. col-A).

I would suggest:

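# keep col-A groups whose rows span more than one distinct col-B value
grA = df.groupby("col-A").filter(lambda x: x.groupby("col-B").ngroups > 1)
# keep col-B groups whose rows span more than one distinct col-A value
grB = df.groupby("col-B").filter(lambda x: x.groupby("col-A").ngroups > 1)

Leading to:

grA
   col-A    col-B
0  Apple   squash
1  Apple  lettuce

and

grB
   col-A   col-B
6  melon  potato
7  melon  potato
8   pear  potato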

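Concatenating the two DataFrames (for example with pd.concat([grA, grB]).sort_index()) gives the desired output; if a row can satisfy both conditions, drop the duplicated index labels so it appears only once.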
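A sketch of that final combining step, adapted to the question's `df` and `col-A`/`col-B` names. Deduplicating on the index is one choice among several; it assumes the original index labels are unique.

```python
import pandas as pd

df = pd.DataFrame({
    "col-A": ["Apple", "Apple", "Banana", "Banana", "Banana",
              "dragon", "melon", "melon", "pear"],
    "col-B": ["squash", "lettuce", "Carrot", "Carrot", "Carrot",
              "turnip", "potato", "potato", "potato"],
})

grA = df.groupby("col-A").filter(lambda x: x.groupby("col-B").ngroups > 1)
grB = df.groupby("col-B").filter(lambda x: x.groupby("col-A").ngroups > 1)

# Stack both results, drop rows present in both (same index label),
# then restore the original row order
out = pd.concat([grA, grB])
out = out[~out.index.duplicated()].sort_index()
print(out)
```

This yields the same rows as the mask-based `df[m1 | m2]`; the masks avoid calling a Python lambda per group, which tends to matter on larger frames.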
