I know there are a lot of resources on merging two Pandas df but I am trying to merge one df based on the IDs of the second one but I need to make new columns out of the rows of the second df. Its a bit confusing but I have an example here that might clarify what I am trying to do.
What I have:
dfa = pd.DataFrame({"ID": ["1", "2", "3"],"Color":["Red", "White", "Blue"],"Length":["16", "14.97", "22.75"]})
dfb = pd.DataFrame({"ID": ["1", "1", "2","3"],"Col1":["Color", "Width", "Length","Color"],"Value":["Blue", "14.97", "22.75","Green"]})
What I want:
dfc = pd.DataFrame({"ID": ["1", "2", "3"],"Color":["Blue", "White", "Green"],"Length":["16", "14.97", "22.75"],"c:Color":["Blue","NaN","Green"],"c:Width":["14.97","NaN","NaN"],"c:Length":["NaN","22.75","NaN"]})
Any help would be appreciated!
CodePudding user response:
Use pivot
before merge
:
>>> dfa.merge(dfb.pivot('ID', 'Col1', 'Value').add_prefix('c:'), on='ID')
ID Color Length c:Color c:Length c:Width
0 1 Red 16 Blue NaN 14.97
1 2 White 14.97 NaN 22.75 NaN
2 3 Blue 22.75 Green NaN NaN
To get 'exactly' your output:
>>> dfa.merge(dfb.pivot('ID', 'Col1', 'Value')[dfb['Col1'].unique()].add_prefix('c:'), on='ID')
ID Color Length c:Color c:Width c:Length
0 1 Red 16 Blue 14.97 NaN
1 2 White 14.97 NaN NaN 22.75
2 3 Blue 22.75 Green NaN NaN
CodePudding user response:
Need to convert to wide before joining:
dfa.merge(
dfb.pivot(
index='ID',
columns='Col1',
values='Value'
).add_prefix('c:'),
on = 'ID'
)