Home > Software design >  how to merge dataframes based on column names
how to merge dataframes based on column names

Time:11-23

I would like to merge 2 dataframes with different index column names. I have df1 with the proper order of tickers that i want and would like to merge df2 witch the same tickers but in a different order and additional information.

so my df1 looks like:

df1

index XOO TSLA FORD YYX AAPL
avg 24 51 25 35 34
std 3 3 4 5 2
min 1 2 4 5 6
max 78 78 97 89 97

df2

Index 10 11 62 454 988
ticker AAPL FORD TSLA YYX XOO
sector Tech Auto Tech ETF ETF

My df2 has its information imported from an excel so the column names are its index numbers which I do not want. The ticker names are also in the wrong order and I need the sectors to match the tickers. The dataframe Id like to create looks like:

df3

index XOO TSLA FORD YYX AAPL
avg 24 51 25 35 34
std 3 3 4 5 2
min 1 2 4 5 6
max 78 78 97 89 97
sector ETF Tech Auto ETF Tech

I have tried to set the index using

Industry = Industry.set_index('Index').T.set_index('ticker').T

but I get an error that says "None of ['Index'] are in the columns"

I have also tried:

df3 = pd.concat([df1,df2])

but it creates a dataframe with df2 not matching the tickers and instead put it next to df1.

CodePudding user response:

import pandas as pd

# Import the tables you provided, which I saved as CSVs
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv', header=1)

# Rename the "ticker" column so that all it is the same as the corresponding column name in df1
df2.rename(columns={'ticker': 'index'}, inplace=True)

# Concatenate
df3 = pd.concat([df1, df2])
df3

Two changes to make:

  1. Make sure you're selecting the correct row when you import df2. Use header=1 if the column names are in row 1 of your CSV.
  2. Rename the "ticker" column to "index" (so that it matches df1). That should do the trick, and the merge should work correctly.
  • Related