Have two DataFrames with identical columns labels. Columns are label, data1, data2, ..., dataN
.
Need to take the product of the DataFrames, multiplying data1 * data1, data2 * data2, etc for every possible combination of rows in DataFrame1 with the rows in DataFrame2. As such, want the resulting DataFrame to maintain the label column of both frames in some way
Example:
Frame 1:
label | d1 | d2 | d3 |
---|---|---|---|
a | 1 | 2 | 3 |
b | 4 | 5 | 6 |
Frame 2:
label | d1 | d2 | d3 |
---|---|---|---|
c | 7 | 8 | 9 |
d | 10 | 11 | 12 |
Result:
label_1 | label_2 | d1 | d2 | d3 |
---|---|---|---|---|
a | c | 7 | 16 | 27 |
a | d | 10 | 22 | 36 |
b | c | 28 | 40 | 54 |
b | d | 40 | 55 | 72 |
I feel like there is a nice way to do this, but all I can come up with is gross loops with lots of memory reallocation.
CodePudding user response:
Let's do a cross merge first then mutiple the dn_x
with dn_y
out = df1.merge(df2, how='cross')
out = (out.filter(like='label')
.join(out.filter(regex='d.*_x')
.mul(out.filter(regex='d.*_y').values)
.rename(columns=lambda col: col.split('_')[0])))
print(out)
label_x label_y d1 d2 d3
0 a c 7 16 27
1 a d 10 22 36
2 b c 28 40 54
3 b d 40 55 72
CodePudding user response:
first idea with DataFrame.reindex
and MultiIndex created by MultiIndex.from_product
:
mux = pd.MultiIndex.from_product([df1['label'], df2['label']])
df = (df1.set_index('label').reindex(mux, level=0)
.mul(df2.set_index('label').reindex(mux, level=1))
.rename_axis(['label1','label2'])
.reset_index())
print (df)
label1 label2 d1 d2 d3
0 a c 7 16 27
1 a d 10 22 36
2 b c 28 40 54
3 b d 40 55 72
Or solution with cross join:
df = (df1.rename(columns={'label':'label1'})
.merge(df2.rename(columns={'label':'label2'}),
how='cross',
suffixes=('_','')))
For multiple columns get cols ends by _
and multiple same columns without _
, last drop columns cols
:
cols = df.filter(regex='_$').columns
no_ = cols.str.rstrip('_')
df[no_] *= df[cols].to_numpy()
df = df.drop(cols, axis=1)
print (df)
label1 label2 d1 d2 d3
0 a c 7 16 27
1 a d 10 22 36
2 b c 28 40 54
3 b d 40 55 72