Home > Software design >  merge on two keys from two dataframes with missing rows
merge on two keys from two dataframes with missing rows

Time:11-30

I have two sets of data that represent the same information with the same key but with different names of the key. I want to map the keys with 1-1 relationship with inner join (because they are the same ID but reading from different company websites). Here is an example:

Dataset A

enter image description here

Dataset B (note that row for PO d is missing for ID4):

enter image description here

Want to map the relationship between "Key" and "ID":

enter image description here

So that the data set will be connected with this inner joint as:

enter image description here

How could I do this using python/pandas?

CodePudding user response:

Doing chain merge

dfa.merge(dfb.merge(dfc, how='inner'), how='inner')

CodePudding user response:

  1. merge dfA and dfB (on "PO Number") to get the relationship between "Key" and "ID"
  2. merge dfA and mapper (on "Key") to get the output
mapper = dfA.merge(dfB)[["Key", "ID"]].drop_duplicates()
output = dfA.merge(mapper)

>>> output
     Key PO Number    ID
0  Key 1         a  ID 4
1  Key 1         b  ID 4
2  Key 1         c  ID 4
3  Key 1         d  ID 4
4  Key 2         e  ID 1
5  Key 3         1  ID 3
6  Key 3         2  ID 3
7  Key 3         3  ID 3
8  Key 3         4  ID 3
9  Key 3         5  ID 3

Or in one line:

output = dfA.merge(dfA.merge(dfB)[["Key", "ID"]].drop_duplicates())
  • Related