Replace strings column with dictionary key in Python

Time:12-04

I am trying to replace the strings in a column with the matching dictionary key so that they follow a consistent format. If a string does not appear in any of the dictionary's value lists, it can either be removed or set to NaN. I tried using map(keys()), but it did not work. How can I do this effectively?

import pandas as pd

fruit_dict = {
  "Apple": ["Apple", "apple", "apple_cake"],
  "Watermelon": ["Watermelon", "water_melon"]
}

df = pd.DataFrame(
    {
        "ID": [1, 2],
        "name": [
            "apple, water_melon",
            "apple_cake, cherry"
        ],
    }
)

   ID                name
0   1  apple, water_melon
1   2  apple_cake, cherry

Expected output:

   ID               name
0   1  Apple, Watermelon
1   2              Apple

CodePudding user response:

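The key here is to invert fruit_dict so that each variant spelling maps to its key. Then split and explode the column so that each row holds a single word, map each word through the reverse mapping, drop the words that have no match, and group by the original index to join the words back into one string per row. A row in which no word matches ends up as NaN after the assignment.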

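# Invert fruit_dict: each variant spelling points to its canonical key
mapping = {v: k for k, variants in fruit_dict.items() for v in variants}

# Split into words, map each word, drop unmatched words, rejoin per original row
df['name'] = (df['name'].str.split(', ').explode().map(mapping).dropna()
                        .groupby(level=0).apply(', '.join))
print(df)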

# Output:
   ID               name
0   1  Apple, Watermelon
1   2              Apple
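
If the separators are not always exactly ", ", or some rows contain no known fruit at all, the same idea can be wrapped in a small helper. This is only a sketch: normalize_names is a hypothetical name that is not part of the original answer, and it assumes the Series has a unique index. It splits on a bare comma, strips whitespace from each word, and reindexes at the end so that rows with no match come back as NaN.

```python
import pandas as pd


def normalize_names(names: pd.Series, groups: dict) -> pd.Series:
    """Replace each comma-separated word with its key in `groups` (hypothetical helper)."""
    # Reverse mapping: variant spelling -> canonical key
    mapping = {variant: key for key, variants in groups.items() for variant in variants}
    # One word per row, with surrounding whitespace removed
    tokens = names.str.split(",").explode().str.strip()
    # Unknown words become NaN and are dropped
    canonical = tokens.map(mapping).dropna()
    # Rejoin the surviving words per original row
    joined = canonical.groupby(level=0).agg(", ".join)
    # Rows that lost every word are restored as NaN
    return joined.reindex(names.index)


fruit_dict = {
    "Apple": ["Apple", "apple", "apple_cake"],
    "Watermelon": ["Watermelon", "water_melon"],
}
names = pd.Series(["apple,water_melon", " apple_cake , cherry", "cherry"])
result = normalize_names(names, fruit_dict)
print(result)
```

The first two rows become "Apple, Watermelon" and "Apple"; the third row, which contains only an unknown word, is NaN.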

Details:

>>> mapping
{'Apple': 'Apple',
 'apple': 'Apple',
 'apple_cake': 'Apple',
 'Watermelon': 'Watermelon',
 'water_melon': 'Watermelon'}
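
To see what each step of the chain does, the intermediate results can be inspected one at a time. The sketch below reuses the data from the question; the variable names exploded, mapped and rejoined are introduced here for illustration only.

```python
import pandas as pd

fruit_dict = {
    "Apple": ["Apple", "apple", "apple_cake"],
    "Watermelon": ["Watermelon", "water_melon"],
}
df = pd.DataFrame({"ID": [1, 2], "name": ["apple, water_melon", "apple_cake, cherry"]})
mapping = {v: k for k, variants in fruit_dict.items() for v in variants}

# Step 1: one word per row; the original row label is repeated (0, 0, 1, 1)
exploded = df["name"].str.split(", ").explode()
print(exploded)

# Step 2: look up each word; "cherry" has no entry and becomes NaN
mapped = exploded.map(mapping)
print(mapped)

# Step 3: drop the NaN, then join the words that share a row label
rejoined = mapped.dropna().groupby(level=0).apply(", ".join)
print(rejoined)
```

Because explode() keeps the original row label on every word, groupby(level=0) can put the words back into the rows they came from.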