Data Transforming/formatting in Python

Time:03-21

I have the following pandas DataFrame:

import pandas as pd

df = {'ID_1': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
      'ID_2': ['a', 'b', 'c', 'f', 'g', 'd', 'v', 'x', 'y', 'z']
     }
df = pd.DataFrame(df)
display(df)  # Jupyter/IPython helper; use print(df) in a plain script

ID_1    ID_2
1   a
1   b
1   c
2   f
2   g
3   d
4   v
4   x
4   y
4   z

For each ID_1, I need to find every pairwise combination of its ID_2 values (order doesn't matter). For example,

When ID_1 = 1, the combinations are ab, ac, bc. When ID_1 = 2, the combination is fg.

Note that if an ID_1 value occurs fewer than twice, there is no combination for it (see ID_1 = 3, for example).
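As a quick sanity check on the rule above, the number of pairs each group should produce can be computed from the group sizes: a group of n values gives n * (n - 1) / 2 unordered pairs, which is 0 when n is 1. This is only a sketch; the names sizes and expected_pairs are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID_1": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "ID_2": ["a", "b", "c", "f", "g", "d", "v", "x", "y", "z"],
})

# Number of ID_2 values in each ID_1 group.
sizes = df.groupby("ID_1")["ID_2"].size()

# A group of n values has n * (n - 1) / 2 unordered pairs (0 when n == 1).
expected_pairs = sizes * (sizes - 1) // 2

print(expected_pairs)
```

For the sample data this comes to 3, 1, 0 and 6 pairs for ID_1 = 1, 2, 3 and 4, so the final result should have 10 rows.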

Finally, I need to store the combination results in df2 as follows:

[Image: expected contents of df2]

CodePudding user response:

One way is to use itertools.combinations:

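from itertools import combinations

def comb_df(ser):
    # Every unordered pair of ID_2 values within one ID_1 group;
    # a group with a single value yields an empty frame.
    return pd.DataFrame(list(combinations(ser, 2)), columns=["from", "to"])

# Apply per group, then discard the index that apply builds from the group key.
new_df = df.groupby("ID_1")["ID_2"].apply(comb_df).reset_index(drop=True)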

Output:

  from to
0    a  b
1    a  c
2    b  c
3    f  g
4    v  x
5    v  y
6    v  z
7    x  y
8    x  z
9    y  z
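If the ID_1 value should stay next to each pair, a variant along these lines might work. It is a sketch only: the expected df2 layout was given as an image, so keeping an ID_1 column is an assumption, and the from/to column names are simply reused from the output above.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    "ID_1": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "ID_2": ["a", "b", "c", "f", "g", "d", "v", "x", "y", "z"],
})

# One row per unordered pair, carrying the group key along.
# Groups with a single ID_2 value (ID_1 = 3 here) contribute no rows.
rows = [
    (key, left, right)
    for key, group in df.groupby("ID_1")["ID_2"]
    for left, right in combinations(group, 2)
]

# Column layout is assumed, not taken from the original question.
df2 = pd.DataFrame(rows, columns=["ID_1", "from", "to"])
print(df2)
```

Building plain tuples and constructing the frame once avoids creating a small DataFrame per group, which may matter when there are many groups.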