How to create all combinations of one column's items and choose one of the other columns each t-CodePudding

For instance, let's consider the following DataFrame:

    id  metric_a  metric_b
0   a     1         2
1   b     10        20
2   c     30        40

The resulting dataframe would consist of all the combinations of id, that is n² rows (square matrix).

In our case, since we have 3 unique ids, we would get 9 rows in total.

Now, given that each row is actually a pair x-y of ids, I'd like to have metric_a for x and metric_b for y, where x and y are simply the two ids of the given row.

To illustrate this:

    x   y  metric_a  metric_b
0   a   a    1         2
1   a   b    1         20
2   a   c    1         40
3   b   a    10        2
4   b   b    10        20
5   b   c    10        40
6   c   a    30        2
7   c   b    30        20
8   c   c    30        40

One way to achieve this is by first creating all possible combinations with itertools.product and then merging the initial dataframe two times. First time, to bring the metric for x and the second time to bring the metric for y.

Another way that came to my mind is:

# creating all the combinations of ids
pd.DataFrame(list(itertools.product(df['id'], df['id'])))

# creating all the combinations of metrics
pd.DataFrame(list(itertools.product(df['metric_a'], df['metric_b'])))

# some more code to concat those two horizontally..

However, I think that there should be a more elegant solution that I can't think of at the moment.

Also, could ideas around using MultiIndex.from_product and then reindexing work?

Any help is more than welcome!

CodePudding user response：

You can use a cross merge:

(df.drop(columns='metric_b')
   .merge(df.drop(columns='metric_a'), how='cross')
   .rename(columns=lambda x: x.replace('id_', ''))
)

Output:

   x  metric_a  y  metric_b
0  a         1  a         2
1  a         1  b        20
2  a         1  c        40
3  b        10  a         2
4  b        10  b        20
5  b        10  c        40
6  c        30  a         2
7  c        30  b        20
8  c        30  c        40