I have this scenario. I’m in the process of learning. I'm cleaning a dataset. Now I have a problem There are a lot of rows that have this issue
I have the key but not the name product. I have the name product but not the key.
prod_key product
0 21.0 NaN
1 21.0 NaN
2 0.0 metal
3 35.0 NaN
4 22.0 NaN
5 0.0 wood
I know that the key of metal is 24 and the key of wood is 25 The product name that belongs to key 21 is plastic and the product name that belongs to key 22 is paper
There are hundreds of rows with the same situation. So, rename each and everyone of them will take me a lot of time.
I created a dictionary and then I used the .map() method but I’m still unable to ‘merge’ or you can say ‘mix’ the missing values in both columns without removing the other column value.
Thank you
CodePudding user response:
You can create an extra dataframe and do merge two times
lst = [
['metal', 24],
['wood', 25],
['plastic', 21],
['paper', 22]
]
df2 = pd.DataFrame(lst, columns=['name', 'key'])
df1['product'].update(df1.merge(df2, left_on='prod_key', right_on='key', how='left')['name'])
df1['prod_key'].update(df1.merge(df2, left_on='product', right_on='name', how='left')['key'])
print(df2)
name key
0 metal 24
1 wood 25
2 plastic 21
3 paper 22
print(df1)
prod_key product
0 21.0 plastic
1 21.0 plastic
2 24.0 metal
3 35.0 NaN
4 22.0 paper
5 25.0 wood