Create a category from the keys of a dictionary when any string in a key's value list appears in the text

Time:10-27

Say you have a DataFrame that looks like this:

name             Description
    ABC          'Verdansk was the best'
    EDF          'Pro Clubs with the lads'
    LMN          'Being Druid was the best'
    RST          'The game of FIFA23'
    XYZ          'Switch on for Bio lab drop' 
     ...            ....

and a dictionary like this:

categories = {
            "CALL_OF_DUTY":["Verdansk","Bio Lab"], 
            "WORLD_OF_WARCRAFT":["Druid","Rogue"], 
            "FIFA": ["Pro Clubs", "FIFA"]
             }

What I want is to return the key as the category whenever any of that key's values is found anywhere in the Description column of the pandas DataFrame.

The output should be:

name             Description                             category
    ABC          'Verdansk was the best'                 'CALL_OF_DUTY'
    EDF          'Pro Clubs with the lads'               'FIFA' 
    LMN          'Being Druid was the best'              'WORLD_OF_WARCRAFT'
    RST          'The game of FIFA23'                    'FIFA'
    XYZ          'Switch on for Bio lab drop'            'CALL_OF_DUTY'
     ...            ....
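The rule behind the expected output can be stated for a single string in plain Python. This is a minimal sketch using simple case-insensitive substring checks rather than a regex; the helper name `categorize` is hypothetical and it returns the first key whose list contains a matching term.

```python
categories = {
    "CALL_OF_DUTY": ["Verdansk", "Bio Lab"],
    "WORLD_OF_WARCRAFT": ["Druid", "Rogue"],
    "FIFA": ["Pro Clubs", "FIFA"],
}

def categorize(text, categories):
    """Hypothetical helper: first key with a term found in text (case-insensitive)."""
    lowered = text.lower()
    for key, terms in categories.items():
        # Substring test, so "fifa" is found inside "fifa23"
        if any(term.lower() in lowered for term in terms):
            return key
    return None  # no category matched

print(categorize("Switch on for Bio lab drop", categories))  # CALL_OF_DUTY
print(categorize("The game of FIFA23", categories))  # FIFA
```

Applied to the DataFrame, this would be something like `df['Description'].apply(lambda s: categorize(s, categories))`. It is easy to read but loops in Python row by row, whereas a single regex passed to `str.extract` scans each row once.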

My solution so far is below. It does not return the keys, although returning the matched values from the dict instead does work. I have spent over a week trying to solve this and would like some help and, if possible, an explanation so I can learn from it.

import re
df['Category'] = (df['Description'].str.extract(fr"\b({'|'.join(categories.values())})\b", re.IGNORECASE)[0].map(categories))
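To see why the attempt above fails with the dictionary shown, here is a short check using only the standard `re` module. It illustrates three separate problems: joining lists, looking up values as if they were keys, and the `\b` boundaries rejecting FIFA23. The name `flat` is a hypothetical flattened list of all values.

```python
import re

categories = {
    "CALL_OF_DUTY": ["Verdansk", "Bio Lab"],
    "WORLD_OF_WARCRAFT": ["Druid", "Rogue"],
    "FIFA": ["Pro Clubs", "FIFA"],
}

# 1. categories.values() yields lists, so str.join() raises TypeError
try:
    "|".join(categories.values())
except TypeError as exc:
    print("join failed:", exc)

# 2. .map(categories) looks the matched text up among the *keys*,
#    but "Verdansk" is a value, so the lookup finds nothing
print(categories.get("Verdansk"))  # None

# 3. \b...\b needs a word boundary after "FIFA", but "2" is a word character
flat = [v for lst in categories.values() for v in lst]
pattern = fr"\b({'|'.join(flat)})\b"
print(re.search(pattern, "The game of FIFA23", re.IGNORECASE))  # None
```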

Answer:

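You need to reverse the categories dictionary so that each value points to its key, because .map() looks the extracted text up among the dictionary's keys. And since you want FIFA23 to match FIFA, you don't need \b: \bFIFA\b requires a word boundary after FIFA, and there is none between FIFA and 23.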

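import re

# Reverse the dict: each lowercased value maps to its category key
categories = {v.lower(): k for k, lst in categories.items() for v in lst}

# Extract the first value found (case-insensitive), lowercase it, then look up its key
df['Category'] = (df['Description'].str.extract(fr"({'|'.join(map(re.escape, categories))})", re.IGNORECASE)
                  [0].str.lower().map(categories))
print(df)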

  name                   Description           Category
0  ABC       'Verdansk was the best'       CALL_OF_DUTY
1  EDF     'Pro Clubs with the lads'               FIFA
2  LMN    'Being Druid was the best'  WORLD_OF_WARCRAFT
3  RST          'The game of FIFA23'               FIFA
4  XYZ  'Switch on for Bio lab drop'       CALL_OF_DUTY
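For reference, here is a self-contained version of the answer's approach that builds the sample DataFrame from the question, so it can be run as-is (assuming pandas is installed). It stores the reversed mapping under a new, hypothetical name `lookup` instead of overwriting `categories`, and escapes each term with `re.escape` in case a term contains regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "name": ["ABC", "EDF", "LMN", "RST", "XYZ"],
    "Description": [
        "Verdansk was the best",
        "Pro Clubs with the lads",
        "Being Druid was the best",
        "The game of FIFA23",
        "Switch on for Bio lab drop",
    ],
})

categories = {
    "CALL_OF_DUTY": ["Verdansk", "Bio Lab"],
    "WORLD_OF_WARCRAFT": ["Druid", "Rogue"],
    "FIFA": ["Pro Clubs", "FIFA"],
}

# Reverse the mapping: lowercased value -> category key
lookup = {v.lower(): k for k, lst in categories.items() for v in lst}

# One alternation of all terms; no \b, so "FIFA" also matches inside "FIFA23"
pattern = "(" + "|".join(map(re.escape, lookup)) + ")"

df["Category"] = (
    df["Description"]
    .str.extract(pattern, flags=re.IGNORECASE)[0]  # leftmost match per row, NaN if none
    .str.lower()                                   # normalise case to match lookup keys
    .map(lookup)                                   # value -> category key
)
print(df)
```

Rows that match no term get NaN in Category. If a description could contain terms from several categories, `str.extract` keeps only the leftmost match, so only one category is returned per row.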