I have a dataframe like :
ItemName Category
item1 cat1
item2 cat1
item3 cat1
item4 cat1
item5 cat2
item6 cat2
and one like:
CategoryName Element
cat1 element1
cat2 element2
I want to merge both dataframes into one, based on the category column, but now I have two columns having the categories. This would be the ideal result:
ItemName Category Element
item1 cat1 element1
item2 cat1 element1
item3 cat1 element1
item4 cat1 element1
item5 cat2 element2
item6 cat2 element2
But I have the result with duplicate of column Category and CategoryName. How to solve this duplicate issue when merging the dataframes? What I did for the merge:
result_df= df1.merge(
df2,
left_on='Category',
right_on='CategoryName',
how='inner')
CodePudding user response:
Unfortunately you would need to rename the columns:
>>> df1.merge(df2.rename({'CategoryName': 'Category'}, axis=1), on='Category')
ItemName Category Element
0 item1 cat1 element1
1 item2 cat1 element1
2 item3 cat1 element1
3 item4 cat1 element1
4 item5 cat2 element2
5 item6 cat2 element2
>>>
Or you have to drop it:
>>> df1.merge(df2, left_on='Category', right_on='CategoryName').drop('CategoryName', axis=1)
ItemName Category Element
0 item1 cat1 element1
1 item2 cat1 element1
2 item3 cat1 element1
3 item4 cat1 element1
4 item5 cat2 element2
5 item6 cat2 element2
>>>
CodePudding user response:
You can use pd.Series.map
:
df["Element"] = df["Category"].map(df2.set_index("CategoryName")["Element"])
print (df)
ItemName Category Element
0 item1 cat1 element1
1 item2 cat1 element1
2 item3 cat1 element1
3 item4 cat1 element1
4 item5 cat2 element2
5 item6 cat2 element2