I have the following pandas data:
import pandas as pd

df = {'ID_1': [1,1,1,2,2,3,4,4,4,4],
      'ID_2': ['a', 'b', 'c', 'f', 'g', 'd', 'v', 'x', 'y', 'z']
}
df = pd.DataFrame(df)
display(df)
ID_1 ID_2
1 a
1 b
1 c
2 f
2 g
3 d
4 v
4 x
4 y
4 z
For each ID_1, I need to find the combinations (order doesn't matter) of ID_2. For example:
When ID_1 = 1, the combinations are ab, ac, bc.
When ID_1 = 2, the combination is fg.
Note: if the frequency of ID_1 is < 2, there is no combination (see ID_1 = 3, for example).
Finally, I need to store the combination results in df2 as follows:
CodePudding user response:
One way is to use itertools.combinations:
from itertools import combinations

def comb_df(ser):
    # All unordered pairs within one ID_1 group; empty if the group has < 2 rows
    return pd.DataFrame(list(combinations(ser, 2)), columns=["from", "to"])

new_df = df.groupby("ID_1")["ID_2"].apply(comb_df).reset_index(drop=True)
Output:
from to
0 a b
1 a c
2 b c
3 f g
4 v x
5 v y
6 v z
7 x y
8 x z
9 y z
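If it helps to see which ID_1 each pair came from, here is a minimal alternative sketch that builds the rows with a list comprehension instead of groupby.apply. Keeping an ID_1 column in the result is my assumption, not something the question asks for; the name df2 is taken from the question.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "ID_1": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "ID_2": ["a", "b", "c", "f", "g", "d", "v", "x", "y", "z"],
})

# One row per unordered pair within each ID_1 group.
# Groups with fewer than two rows yield no pairs, so ID_1 = 3 drops out.
rows = [
    (key, left, right)
    for key, group in df.groupby("ID_1")["ID_2"]
    for left, right in combinations(group, 2)
]

# Assumption: the ID_1 column is kept for traceability; drop it if not needed.
df2 = pd.DataFrame(rows, columns=["ID_1", "from", "to"])
print(df2)
```

Dropping the extra column with df2[["from", "to"]] should give the same from/to pairs as the output above.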