select rows if their elements form all possible pairs between two columns

Time:11-18

import pandas as pd
df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b', 'c'], 'col2': ['x', 'y', 'x', 'y', 'x']})

I'd like to select the rows whose col1 letter is paired with every letter that appears in col2. In this example, the result should be

  col1 col2
0    a    x
1    a    y
2    b    x
3    b    y

The row with c is excluded because the pair [c y] is missing.
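
To make the requirement concrete, here is a minimal sketch that lists the pairs that are absent from the frame. It assumes "all possible pairs" means the Cartesian product of the unique values of col1 and col2; the names all_pairs, present and missing are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b', 'c'],
                   'col2': ['x', 'y', 'x', 'y', 'x']})

# Every pair that would have to exist for each col1 value to be complete
all_pairs = pd.MultiIndex.from_product(
    [df['col1'].unique(), df['col2'].unique()], names=['col1', 'col2'])

# Pairs that actually occur in the frame
present = pd.MultiIndex.from_frame(df[['col1', 'col2']].drop_duplicates())

# Pairs that are required but absent
missing = all_pairs.difference(present)
print(list(missing))
```

Any col1 value that shows up in missing is one whose rows should be dropped.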

CodePudding user response:

Use crosstab to count each (col1, col2) combination, check with eq(1) and all that every combination occurs exactly once for a given col1 value, then use isin to select the matching rows:

# s is True for each col1 value whose count is 1 for every col2 value
s = pd.crosstab(df['col1'], df['col2']).eq(1).all(axis=1)
df.loc[df['col1'].isin(s[s].index)]

Output:

  col1 col2
0    a    x
1    a    y
2    b    x
3    b    y
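
One caveat with eq(1): crosstab returns counts, so a pair that appears more than once fails the test even though it is present. The sketch below is a variant, not part of the original answer, that uses gt(0) instead. The frame df_dup is a hypothetical version of the example with the row (a, x) duplicated.

```python
import pandas as pd

# Hypothetical variant of the example frame with the row (a, x) duplicated
df_dup = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'c'],
                       'col2': ['x', 'x', 'y', 'x', 'y', 'x']})

# Count how often each (col1, col2) pair occurs
counts = pd.crosstab(df_dup['col1'], df_dup['col2'])

# A col1 value is complete if every col2 value occurs at least once
complete = counts.gt(0).all(axis=1)

result = df_dup.loc[df_dup['col1'].isin(complete[complete].index)]
print(result)
```

With eq(1) the group a would be rejected here because the count for (a, x) is 2; with gt(0) both a and b are kept. Which behaviour is wanted depends on whether duplicated rows should disqualify a group.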

CodePudding user response:

@Quang Hoang shows how to do it in pure pandas, but going back to plain Python seems easier (to me, at any rate). The proposed solution still ends with a call to query to filter the dataframe.

from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b', 'c'], 'col2': ['x', 'y', 'x', 'y', 'x']})

l1, l2 = list(df['col1']), list(df['col2'])  # extract the columns from the dataframe
l2set = set(l2)                              # set of all col2 values

# build a dictionary mapping each col1 value to the list of its col2 values
d = defaultdict(list)
for e1, e2 in zip(l1, l2):
    d[e1].append(e2)

# keep the col1 values whose set of col2 values equals the set of all col2 values
accept = [k for k, v in d.items() if set(v) == l2set]

# use query to filter the dataframe on the 'accept' list
df2 = df.query('col1 in @accept')

df2:

  col1 col2
0    a    x
1    a    y
2    b    x
3    b    y

CodePudding user response:

For this specific question, you can count the unique col2 values within each col1 group (a complete group has two in this case) and keep only the rows whose group count equals 2:

df.loc[df.groupby('col1').col2.transform('nunique') == 2]

  col1 col2
0    a    x
1    a    y
2    b    x
3    b    y
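
The hard-coded 2 only fits this example. A minimal sketch of the same idea without the constant follows, assuming every distinct value of col2 in the whole frame must appear in a group for it to be complete. The names n_col2 and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'b', 'b', 'c'],
                   'col2': ['x', 'y', 'x', 'y', 'x']})

# Number of distinct col2 values a complete group must contain
n_col2 = df['col2'].nunique()

# Per-row count of distinct col2 values within the row's col1 group
mask = df.groupby('col1')['col2'].transform('nunique') == n_col2

result = df.loc[mask]
print(result)
```

Because nunique ignores repeats, this version is also unaffected by duplicated rows.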