Home > Blockchain >  Pandas - Merging Two Data frames with different index names but same amount of Columns
Pandas - Merging Two Data frames with different index names but same amount of Columns

Time:02-16

I'm trying to merge together two Pandas data frames but not able to get the desired result. One data frame has been transposed, and I think for this reason has a header row containing the column indexes. The other has a header row with column names, however each have the same number of columns.

Data Frame 1 (DF1):

 ----------------------------------------------------- 
|      NAME      RD1     RD2     RD3     RD4     RD5  |
 ----------------------------------------------------- 
| 0     Tom      4       2       1       -       3    |
| 1     Mark     2       2       3       2       1    |
| 2     James    -       1       5       5       4    |
| ...   ...      ...     ...     ...     ...     ...  |
| 30    Ted      2       3       3       3       2    |
 ----------------------------------------------------- 

Data Frame 2 (DF2)

 -------------------------------------------------------------------- 
|       0      1          2          3          4          5         |
 -------------------------------------------------------------------- 
| 0     Desc   Round 1    Round 2    Round 3    Round 4    Round 5   |
| 1     Opp    Hawks      Dolphins   Raptors    Sharks     Bears     |
| 2     Date   02/04      08/04      16/04      24/04      30/04     |
| 3     Venue  MELB       PERTH      MELB       SYD        MELB      |
 -------------------------------------------------------------------- 

I've tried using multiple pandas methods (join / append / merge / concat) to join the two data frames without adding any columns. However each method seems to add up both the columns and rows together. I want to be able to do an inner join but I can seem to find a way without having column index numbers. Is there a way I can add a top row to DF1 with the column index numbers the perform an inner join of the two DFs? I have also looked for ways to convert DF into headers instead with no luck as of yet.

The desired final DF would look like:

 -------------------------------------------------------------------- 
|       Desc   Round 1    Round 2    Round 3    Round 4    Round 5   |
|       Opp    Hawks      Dolphins   Raptors    Sharks     Bears     |
|       Date   02/04      08/04      16/04      24/04      30/04     |
|       Venue  MELB       PERTH      MELB       SYD        MELB      |
 -------------------------------------------------------------------- 
| 0     Tom    4          2          1          -          3         |
| 1     Mark   2          2          3          2          1         |
| 2     James  -          1          5          5          4         |
| ...   ...    ...        ...        ...        ...        ...       |
| 30    Ted    2          3          3          3          2         |
 -------------------------------------------------------------------- 

CodePudding user response:

I think you need MultiIndex by another DataFrame by MultiIndex.from_frame with transposing:

DF1.columns = pd.MultiIndex.from_frame(DF2.T, names= [None] * len(DF2))
print (DF1)
     Desc Round 1  Round 2 Round 3 Round 4 Round 5
      Opp   Hawks Dolphins Raptors  Sharks   Bears
     Date   02/04    08/04   16/04   24/04   30/04
    Venue    MELB    PERTH    MELB     SYD    MELB
0     Tom       4        2       1       -       3
1    Mark       2        2       3       2       1
2   James       -        1       5       5       4
30    Ted       2        3       3       3       2

CodePudding user response:

You can use pd.MultiIndex.from_arrays:

df1.columns = pd.MultiIndex.from_arrays(df2.values)
print(df1)

# Output
     Desc Round 1  Round 2 Round 3 Round 4 Round 5
      Opp   Hawks Dolphins Raptors  Sharks   Bears
     Date   02/04    08/04   16/04   24/04   30/04
    Venue    MELB    PERTH    MELB     SYD    MELB
0     Tom       4        2       1       -       3
1    Mark       2        2       3       2       1
2   James       -        1       5       5       4
30    Ted       2        3       3       3       2

You can also use a comprehension to create a tuple of tuple usable as a MultiIndex:

df1.columns = tuple(tuple(l) for l in df2.T.values)
print(df1)

# Output
     Desc Round 1  Round 2 Round 3 Round 4 Round 5
      Opp   Hawks Dolphins Raptors  Sharks   Bears
     Date   02/04    08/04   16/04   24/04   30/04
    Venue    MELB    PERTH    MELB     SYD    MELB
0     Tom       4        2       1       -       3
1    Mark       2        2       3       2       1
2   James       -        1       5       5       4
30    Ted       2        3       3       3       2
  • Related