I want to create two new data frames from two existing data frames, taking into account all combinations of their multi-index columns and their values.
left = pd.DataFrame(([1,0,1], [1,0,0],[1,0,1],), columns=pd.MultiIndex.from_tuples([('A', 'B'), ('A', 'C' ), ('D','D'),]))
right = pd.DataFrame(([8,9], [8,9],[8,9],), columns=pd.MultiIndex.from_tuples([('S', 'T'), ('Y', 'Z'),]))
left:
A A D
B C D
0 1 0 1
1 1 0 0
2 1 0 1
right:
S Y
T Z
0 8 9
1 8 9
2 8 9
expected output
left_result:
A A A A D D
B B C C D D
0 1 1 0 0 1 1
1 1 1 0 0 0 0
2 1 1 0 0 1 1
right_result:
S Y S Y S Y
T Z T Z T Z
0 8 9 8 9 8 9
1 8 9 8 9 8 9
2 8 9 8 9 8 9
PS: Sorry if my question is not precise enough, but the example with the expected result still explains it best without words. Perhaps the question has already been asked, but in the absence of a precise description of what kind of result I actually want (combination in two DataFrames?), I ask you to help formulate the question more precisely or to link it as a duplicate if an existing question is present.
CodePudding user response:
We can use MultiIndex.from_product
based on the flat index of each DataFrame then reindex
based on the corresponding level values:
midx = pd.MultiIndex.from_product([left.columns.to_flat_index(),
right.columns.to_flat_index()])
left = left.reindex(columns=midx.get_level_values(0))
right = right.reindex(columns=midx.get_level_values(1))
left
:
A D
B B C C D D
0 1 1 0 0 1 1
1 1 1 0 0 0 0
2 1 1 0 0 1 1
right
:
S Y S Y S Y
T Z T Z T Z
0 8 9 8 9 8 9
1 8 9 8 9 8 9
2 8 9 8 9 8 9
midx
for reference:
MultiIndex([(('A', 'B'), ('S', 'T')),
(('A', 'B'), ('Y', 'Z')),
(('A', 'C'), ('S', 'T')),
(('A', 'C'), ('Y', 'Z')),
(('D', 'D'), ('S', 'T')),
(('D', 'D'), ('Y', 'Z'))],
)
Note: level 0 are the left
flat index values and level 1 are the right
flat index values.
CodePudding user response:
Use DataFrame.reindex
Index.repeat
on the columns (axis=1
)
# repeat 'left' columns 2 times
>>> left = left.reindex(left.columns.repeat(2), axis=1)
>>> left
A D
B B C C D D
0 1 1 0 0 1 1
1 1 1 0 0 0 0
2 1 1 0 0 1 1
# repeat 'right' columns 3 times
>>> right = right.reindex(right.columns.repeat(3), axis=1)
>>> right
S Y
T T T Z Z Z
0 8 8 8 9 9 9
1 8 8 8 9 9 9
2 8 8 8 9 9 9
# concatenate 'right' 3 times side by side (axis=1)
>>> right = pd.concat([right]*3, axis=1)
>>> right
S Y S Y S Y
T Z T Z T Z
0 8 9 8 9 8 9
1 8 9 8 9 8 9
2 8 9 8 9 8 9