Home > Enterprise >  Merge two dataframes based on rows
Merge two dataframes based on rows

Time:12-18

I know there are a lot of resources on merging two Pandas df but I am trying to merge one df based on the IDs of the second one but I need to make new columns out of the rows of the second df. Its a bit confusing but I have an example here that might clarify what I am trying to do.

What I have:

dfa = pd.DataFrame({"ID": ["1", "2", "3"],"Color":["Red", "White", "Blue"],"Length":["16", "14.97", "22.75"]})

dfb = pd.DataFrame({"ID": ["1", "1", "2","3"],"Col1":["Color", "Width", "Length","Color"],"Value":["Blue", "14.97", "22.75","Green"]})

What I want:

dfc = pd.DataFrame({"ID": ["1", "2", "3"],"Color":["Blue", "White", "Green"],"Length":["16", "14.97", "22.75"],"c:Color":["Blue","NaN","Green"],"c:Width":["14.97","NaN","NaN"],"c:Length":["NaN","22.75","NaN"]})

Any help would be appreciated!

CodePudding user response:

Use pivot before merge:

>>> dfa.merge(dfb.pivot('ID', 'Col1', 'Value').add_prefix('c:'), on='ID')

  ID  Color Length c:Color c:Length c:Width
0  1    Red     16    Blue      NaN   14.97
1  2  White  14.97     NaN    22.75     NaN
2  3   Blue  22.75   Green      NaN     NaN

To get 'exactly' your output:

>>> dfa.merge(dfb.pivot('ID', 'Col1', 'Value')[dfb['Col1'].unique()].add_prefix('c:'), on='ID')
  ID  Color Length c:Color c:Width c:Length
0  1    Red     16    Blue   14.97      NaN
1  2  White  14.97     NaN     NaN    22.75
2  3   Blue  22.75   Green     NaN      NaN

CodePudding user response:

Need to convert to wide before joining:

dfa.merge(
    dfb.pivot(
        index='ID', 
        columns='Col1', 
        values='Value'
        ).add_prefix('c:'),
    on = 'ID'
    )
  • Related