I'm trying to merge together two Pandas data frames but not able to get the desired result. One data frame has been transposed, and I think for this reason has a header row containing the column indexes. The other has a header row with column names, however each have the same number of columns.
Data Frame 1 (DF1):
-----------------------------------------------------
| NAME RD1 RD2 RD3 RD4 RD5 |
-----------------------------------------------------
| 0 Tom 4 2 1 - 3 |
| 1 Mark 2 2 3 2 1 |
| 2 James - 1 5 5 4 |
| ... ... ... ... ... ... ... |
| 30 Ted 2 3 3 3 2 |
-----------------------------------------------------
Data Frame 2 (DF2)
--------------------------------------------------------------------
| 0 1 2 3 4 5 |
--------------------------------------------------------------------
| 0 Desc Round 1 Round 2 Round 3 Round 4 Round 5 |
| 1 Opp Hawks Dolphins Raptors Sharks Bears |
| 2 Date 02/04 08/04 16/04 24/04 30/04 |
| 3 Venue MELB PERTH MELB SYD MELB |
--------------------------------------------------------------------
I've tried using multiple pandas methods (join / append / merge / concat) to join the two data frames without adding any columns. However each method seems to add up both the columns and rows together. I want to be able to do an inner join but I can seem to find a way without having column index numbers. Is there a way I can add a top row to DF1 with the column index numbers the perform an inner join of the two DFs? I have also looked for ways to convert DF into headers instead with no luck as of yet.
The desired final DF would look like:
--------------------------------------------------------------------
| Desc Round 1 Round 2 Round 3 Round 4 Round 5 |
| Opp Hawks Dolphins Raptors Sharks Bears |
| Date 02/04 08/04 16/04 24/04 30/04 |
| Venue MELB PERTH MELB SYD MELB |
--------------------------------------------------------------------
| 0 Tom 4 2 1 - 3 |
| 1 Mark 2 2 3 2 1 |
| 2 James - 1 5 5 4 |
| ... ... ... ... ... ... ... |
| 30 Ted 2 3 3 3 2 |
--------------------------------------------------------------------
CodePudding user response:
I think you need MultiIndex
by another DataFrame by MultiIndex.from_frame
with transposing:
DF1.columns = pd.MultiIndex.from_frame(DF2.T, names= [None] * len(DF2))
print (DF1)
Desc Round 1 Round 2 Round 3 Round 4 Round 5
Opp Hawks Dolphins Raptors Sharks Bears
Date 02/04 08/04 16/04 24/04 30/04
Venue MELB PERTH MELB SYD MELB
0 Tom 4 2 1 - 3
1 Mark 2 2 3 2 1
2 James - 1 5 5 4
30 Ted 2 3 3 3 2
CodePudding user response:
You can use pd.MultiIndex.from_arrays
:
df1.columns = pd.MultiIndex.from_arrays(df2.values)
print(df1)
# Output
Desc Round 1 Round 2 Round 3 Round 4 Round 5
Opp Hawks Dolphins Raptors Sharks Bears
Date 02/04 08/04 16/04 24/04 30/04
Venue MELB PERTH MELB SYD MELB
0 Tom 4 2 1 - 3
1 Mark 2 2 3 2 1
2 James - 1 5 5 4
30 Ted 2 3 3 3 2
You can also use a comprehension to create a tuple of tuple usable as a MultiIndex
:
df1.columns = tuple(tuple(l) for l in df2.T.values)
print(df1)
# Output
Desc Round 1 Round 2 Round 3 Round 4 Round 5
Opp Hawks Dolphins Raptors Sharks Bears
Date 02/04 08/04 16/04 24/04 30/04
Venue MELB PERTH MELB SYD MELB
0 Tom 4 2 1 - 3
1 Mark 2 2 3 2 1
2 James - 1 5 5 4
30 Ted 2 3 3 3 2