Home > Net >  Python: Pandas dataframe, merge/join tabels on different keys
Python: Pandas dataframe, merge/join tabels on different keys

Time:12-14

I have 3 tables of following form:

import pandas as pd

df1 = pd.DataFrame({'ISIN': [1, 4, 7, 10],
                   'Value1': [2012, 2014, 2013, 2014],
                   'Value2': [55, 40, 84, 31]})
df1 = df1.set_index("ISIN")

df2 = pd.DataFrame({'ISIN': [1, 4, 7, 10],
                   'Symbol': ['a', 'b', 'c', 'd']})
df2 = df2.set_index("ISIN")

df3 = pd.DataFrame({'Symbol': ['a', 'b', 'c', 'd'],
                    '01.01.2020': [1, 2, 3, 4],
                    '01.01.2021': [3,2,3,2]})
df3 = df3.set_index("Symbol")

My aim now is to merge all 3 tabels together. I would go the following way:

Step1 (merge df1 and df2):

result1 = pd.merge(df1, df2, on=["ISIN"])
print(result1)

The result is ok and gives me the table:

      Value1  Value2 Symbol
ISIN                       
1       2012      55      a
4       2014      40      b
7       2013      84      c
10      2014      31      d

In next step I want to merge it with df3, so I did make a step between and merge df2 and df3:

print(result1)
result2 = pd.merge(df2, df3, on=["Symbol"])
print(result2)

My problem now, the output is:

  Symbol  01.01.2020  01.01.2021
0      a           1           3
1      b           2           2
2      c           3           3
3      d           4           2

the column ISIN here is lost. And the step

result = pd.merge(result, result2, on=["ISIN"])
result.set_index("ISIN")

produces an error.

Is there an elegant way to merge this 3 tabels together (with key column ISIN) and why is the key column lost in the second merge process?

CodePudding user response:

Just chain the merge operations:

result = df1.merge(df2.reset_index(), on='ISIN').merge(df3, on='Symbol')

Or using your syntax, use result1 as source for the second merge:

result1 = pd.merge(df1, df2.reset_index(), on=["ISIN"])
result2 = pd.merge(result1, df3, on=["Symbol"])

output:

   ISIN  Value1  Value2 Symbol  01.01.2020  01.01.2021
0     1    2012      55      a           1           3
1     4    2014      40      b           2           2
2     7    2013      84      c           3           3
3    10    2014      31      d           4           2

CodePudding user response:

You should not set the index prior to joining if you wish to keep it as part of the data in your dataframe. I suggest first merging, then setting the index to your desired value. In a single line:

output = df1.merge(df2,on='ISIN').merge(df3,on='Symbol')

Outputs:

   ISIN  Value1  Value2 Symbol  01.01.2020  01.01.2021
0     1    2012      55      a           1           3
1     4    2014      40      b           2           2
2     7    2013      84      c           3           3
3    10    2014      31      d           4           2

You can now set the index to ISIN by adding .set_index('ISIN') to output:

      Value1  Value2 Symbol  01.01.2020  01.01.2021
ISIN                                               
1       2012      55      a           1           3
4       2014      40      b           2           2
7       2013      84      c           3           3
10      2014      31      d           4           2
  • Related