Home > Back-end >  Create two new data frames from two existing data frames, taking into account all combinations of th
Create two new data frames from two existing data frames, taking into account all combinations of th

Time:11-17

I want to create two new data frames from two existing data frames, taking into account all combinations of their multi-index columns and their values.

left = pd.DataFrame(([1,0,1], [1,0,0],[1,0,1],), columns=pd.MultiIndex.from_tuples([('A', 'B'), ('A', 'C' ), ('D','D'),]))
right = pd.DataFrame(([8,9], [8,9],[8,9],), columns=pd.MultiIndex.from_tuples([('S', 'T'), ('Y', 'Z'),]))
left:
   A  A  D
   B  C  D
0  1  0  1
1  1  0  0
2  1  0  1
right:
   S  Y
   T  Z
0  8  9
1  8  9
2  8  9
expected output
left_result:
   A  A  A  A  D  D
   B  B  C  C  D  D
0  1  1  0  0  1  1
1  1  1  0  0  0  0
2  1  1  0  0  1  1
right_result:
   S  Y  S  Y  S  Y
   T  Z  T  Z  T  Z
0  8  9  8  9  8  9
1  8  9  8  9  8  9
2  8  9  8  9  8  9

PS: Sorry if my question is not precise enough, but the example with the expected result still explains it best without words. Perhaps the question has already been asked, but in the absence of a precise description of what kind of result I actually want (combination in two DataFrames?), I ask you to help formulate the question more precisely or to link it as a duplicate if an existing question is present.

CodePudding user response:

We can use MultiIndex.from_product based on the flat index of each DataFrame then reindex based on the corresponding level values:

midx = pd.MultiIndex.from_product([left.columns.to_flat_index(),
                                   right.columns.to_flat_index()])

left = left.reindex(columns=midx.get_level_values(0))
right = right.reindex(columns=midx.get_level_values(1))

left:

   A           D   
   B  B  C  C  D  D
0  1  1  0  0  1  1
1  1  1  0  0  0  0
2  1  1  0  0  1  1

right:

   S  Y  S  Y  S  Y
   T  Z  T  Z  T  Z
0  8  9  8  9  8  9
1  8  9  8  9  8  9
2  8  9  8  9  8  9

midx for reference:

MultiIndex([(('A', 'B'), ('S', 'T')),
            (('A', 'B'), ('Y', 'Z')),
            (('A', 'C'), ('S', 'T')),
            (('A', 'C'), ('Y', 'Z')),
            (('D', 'D'), ('S', 'T')),
            (('D', 'D'), ('Y', 'Z'))],
           )

Note: level 0 are the left flat index values and level 1 are the right flat index values.

CodePudding user response:

Use DataFrame.reindex Index.repeat on the columns (axis=1)

# repeat 'left' columns 2 times 
>>> left = left.reindex(left.columns.repeat(2), axis=1)
>>> left

   A           D   
   B  B  C  C  D  D
0  1  1  0  0  1  1
1  1  1  0  0  0  0
2  1  1  0  0  1  1

# repeat 'right' columns 3 times 
>>> right = right.reindex(right.columns.repeat(3), axis=1)
>>> right 

   S        Y      
   T  T  T  Z  Z  Z
0  8  8  8  9  9  9
1  8  8  8  9  9  9
2  8  8  8  9  9  9

Or pandas.concat

# concatenate 'right' 3 times side by side (axis=1)
>>> right = pd.concat([right]*3, axis=1)
>>> right

   S  Y  S  Y  S  Y
   T  Z  T  Z  T  Z
0  8  9  8  9  8  9
1  8  9  8  9  8  9
2  8  9  8  9  8  9

  • Related