Home > Software engineering >  How to get all unique combinations of values in one column that are in another column
How to get all unique combinations of values in one column that are in another column

Time:12-31

Starting with a dataframe like this:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'b', 'b', 'a']})
   A  B
0  1  a
1  2  b
2  3  b
3  4  b
4  5  a

What is the best way of getting to a dataframe like this?

pd.DataFrame({'source': [1, 2, 2, 3], 'target': [5, 3, 4, 4]})
   source  target
0       1       5
1       2       3
2       2       4
3       3       4

For each time a row in column A has the same value in column B as another row in column A, I want to save the unique instances of that relationship in a new dataframe.

This is pretty close:

df.groupby('B')['A'].unique()
B
a       [1, 5]
b    [2, 3, 4]
Name: A, dtype: object

But I'd ideally convert it into a single dataframe now and my brain has gone kaput.

CodePudding user response:

In your case , you can do itertools.combinations

import itertools
s = df.groupby('B')['A'].apply(lambda x : set(list(itertools.combinations(x, 2)))).explode().tolist()
out = pd.DataFrame(s,columns=['source','target'])
out
Out[312]: 
   source  target
0       1       5
1       3       4
2       2       3
3       2       4

CodePudding user response:

use merge function

df.merge(df, how = "outer", on = ["B"]).query("A_x < A_y")
  • Related