For instance, let's consider the following DataFrame:
id metric_a metric_b
0 a 1 2
1 b 10 20
2 c 30 40
The resulting dataframe would consist of all the combinations of id
, that is n2 rows (square matrix).
In our case, since we have 3 unique ids, we would get 9 rows in total.
Now, given that each row is actually a pair x-y
of ids, I'd like to have metric_a
for x
and metric_b
for y
, where x
and y
are simply the two ids of the given row.
To illustrate this:
x y metric_a metric_b
0 a a 1 2
1 a b 1 20
2 a c 1 40
3 b a 10 2
4 b b 10 20
5 b c 10 40
6 c a 30 2
7 c b 30 20
8 c c 30 40
One way to achieve this is by first creating all possible combinations with itertools.product
and then merging the initial dataframe two times.
First time, to bring the metric for x
and the second time to bring the metric for y
.
Another way that came to my mind is:
# creating all the combinations of ids
pd.DataFrame(list(itertools.product(df['id'], df['id'])))
# creating all the combinations of metrics
pd.DataFrame(list(itertools.product(df['metric_a'], df['metric_b'])))
# some more code to concat those two horizontally..
However, I think that there should be a more elegant solution that I can't think of at the moment.
Also, could ideas around using MultiIndex.from_product
and then reindexing work?
Any help is more than welcome!
CodePudding user response:
You can use a cross merge
:
(df.drop(columns='metric_b')
.merge(df.drop(columns='metric_a'), how='cross')
.rename(columns=lambda x: x.replace('id_', ''))
)
Output:
x metric_a y metric_b
0 a 1 a 2
1 a 1 b 20
2 a 1 c 40
3 b 10 a 2
4 b 10 b 20
5 b 10 c 40
6 c 30 a 2
7 c 30 b 20
8 c 30 c 40