I am trying to replace the values in a string column with the matching dictionary keys so that the strings have a more consistent format. If a string does not match any entry in the dictionary, it can either be removed or set to NaN. I tried using map(keys()), but it's not working. How can I do this effectively?
import pandas as pd

fruit_dict = {
    "Apple": ["Apple", "apple", "apple_cake"],
    "Watermelon": ["Watermelon", "water_melon"],
}
df = pd.DataFrame(
    {
        "ID": [1, 2],
        "name": [
            "apple, water_melon",
            "apple_cake, cherry",
        ],
    }
)
   ID                name
0   1  apple, water_melon
1   2  apple_cake, cherry
Expected output:
   ID               name
0   1  Apple, Watermelon
1   2              Apple
CodePudding user response:
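The key here is to invert fruit_dict so that every variant spelling maps back to its canonical key. Then split the column and explode it to get one word per row, map each word through the reverse mapping, drop the words that have no match, and join what remains per original row to get back the original shape.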
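# Reverse mapping: each variant spelling -> its canonical key
mapping = {v: k for k, l in fruit_dict.items() for v in l}
# Split into words, one word per row, map, drop unmatched words, rejoin per row
df['name'] = df['name'].str.split(', ').explode().map(mapping).dropna() \
    .groupby(level=0).apply(', '.join)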
print(df)
# Output:
   ID               name
0   1  Apple, Watermelon
1   2              Apple
Details:
>>> mapping
{'Apple': 'Apple',
 'apple': 'Apple',
 'apple_cake': 'Apple',
 'Watermelon': 'Watermelon',
 'water_melon': 'Watermelon'}
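To make the chained expression in the answer easier to follow, here is a sketch that breaks it into separate steps on the same sample data. The intermediate names (words, exploded, mapped, result) are only illustrative and are not part of the original answer, and it uses .agg where the answer uses .apply, which gives the same result here.

```python
import pandas as pd

fruit_dict = {
    "Apple": ["Apple", "apple", "apple_cake"],
    "Watermelon": ["Watermelon", "water_melon"],
}
df = pd.DataFrame(
    {"ID": [1, 2], "name": ["apple, water_melon", "apple_cake, cherry"]}
)

# Reverse mapping: each variant spelling points to its canonical key.
mapping = {v: k for k, l in fruit_dict.items() for v in l}

# Step 1: split each string into a list of words.
words = df["name"].str.split(", ")

# Step 2: one word per row; the original row label is repeated in the index.
exploded = words.explode()

# Step 3: look up each word; unknown words such as "cherry" become NaN.
mapped = exploded.map(mapping)

# Step 4: drop the NaN entries, then join what is left per original row label.
result = mapped.dropna().groupby(level=0).agg(", ".join)

# Assignment aligns on the index, so each joined string lands on its own row.
df["name"] = result
print(df)
```

Because the assignment aligns on the index, a row in which no word matches would be missing from result and would end up as NaN in df['name'], which is one of the two outcomes the question allows.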