Starting with a dataframe like this:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'b', 'b', 'a']})
A B
0 1 a
1 2 b
2 3 b
3 4 b
4 5 a
What is the best way of getting to a dataframe like this?
pd.DataFrame({'source': [1, 2, 2, 3], 'target': [5, 3, 4, 4]})
source target
0 1 5
1 2 3
2 2 4
3 3 4
For each time a row in column A has the same value in column B as another row in column A, I want to save the unique instances of that relationship in a new dataframe.
This is pretty close:
df.groupby('B')['A'].unique()
B
a [1, 5]
b [2, 3, 4]
Name: A, dtype: object
But I'd ideally convert it into a single dataframe now and my brain has gone kaput.
CodePudding user response:
In your case , you can do itertools.combinations
import itertools
s = df.groupby('B')['A'].apply(lambda x : set(list(itertools.combinations(x, 2)))).explode().tolist()
out = pd.DataFrame(s,columns=['source','target'])
out
Out[312]:
source target
0 1 5
1 3 4
2 2 3
3 2 4
CodePudding user response:
use merge function
df.merge(df, how = "outer", on = ["B"]).query("A_x < A_y")