I have 3 tables of following form:
import pandas as pd
df1 = pd.DataFrame({'ISIN': [1, 4, 7, 10],
'Value1': [2012, 2014, 2013, 2014],
'Value2': [55, 40, 84, 31]})
df1 = df1.set_index("ISIN")
df2 = pd.DataFrame({'ISIN': [1, 4, 7, 10],
'Symbol': ['a', 'b', 'c', 'd']})
df2 = df2.set_index("ISIN")
df3 = pd.DataFrame({'Symbol': ['a', 'b', 'c', 'd'],
'01.01.2020': [1, 2, 3, 4],
'01.01.2021': [3,2,3,2]})
df3 = df3.set_index("Symbol")
My aim now is to merge all 3 tabels together. I would go the following way:
Step1 (merge df1 and df2):
result1 = pd.merge(df1, df2, on=["ISIN"])
print(result1)
The result is ok and gives me the table:
Value1 Value2 Symbol
ISIN
1 2012 55 a
4 2014 40 b
7 2013 84 c
10 2014 31 d
In next step I want to merge it with df3, so I did make a step between and merge df2 and df3:
print(result1)
result2 = pd.merge(df2, df3, on=["Symbol"])
print(result2)
My problem now, the output is:
Symbol 01.01.2020 01.01.2021
0 a 1 3
1 b 2 2
2 c 3 3
3 d 4 2
the column ISIN here is lost. And the step
result = pd.merge(result, result2, on=["ISIN"])
result.set_index("ISIN")
produces an error.
Is there an elegant way to merge this 3 tabels together (with key column ISIN) and why is the key column lost in the second merge process?
CodePudding user response:
Just chain the merge
operations:
result = df1.merge(df2.reset_index(), on='ISIN').merge(df3, on='Symbol')
Or using your syntax, use result1
as source for the second merge:
result1 = pd.merge(df1, df2.reset_index(), on=["ISIN"])
result2 = pd.merge(result1, df3, on=["Symbol"])
output:
ISIN Value1 Value2 Symbol 01.01.2020 01.01.2021
0 1 2012 55 a 1 3
1 4 2014 40 b 2 2
2 7 2013 84 c 3 3
3 10 2014 31 d 4 2
CodePudding user response:
You should not set the index prior to joining if you wish to keep it as part of the data in your dataframe. I suggest first merging, then setting the index to your desired value. In a single line:
output = df1.merge(df2,on='ISIN').merge(df3,on='Symbol')
Outputs:
ISIN Value1 Value2 Symbol 01.01.2020 01.01.2021
0 1 2012 55 a 1 3
1 4 2014 40 b 2 2
2 7 2013 84 c 3 3
3 10 2014 31 d 4 2
You can now set the index to ISIN
by adding .set_index('ISIN')
to output:
Value1 Value2 Symbol 01.01.2020 01.01.2021
ISIN
1 2012 55 a 1 3
4 2014 40 b 2 2
7 2013 84 c 3 3
10 2014 31 d 4 2