Home > Mobile >  how to “extract” values from a column, in which values are found in a dictionary
how to “extract” values from a column, in which values are found in a dictionary

Time:07-08

I have a dataframe df1:

ID item
11111 chair
11112 desk¥blue chair
11113 bed¥pen¥cable
11114 mug¥old sofa¥toy
11115 old mug¥wine glass
11116 blue chair¥old mug

etc.

and a large dictionary item_dict (from csv) which categorize some but not all items into categories:

item category
chair furniture
blue chair furniture
mug tableware
old mug tableware
wine glass tableware

etc.

The goal is to make a new column in df1 of all category of interest for items, if any, in each row:

ID item category
11111 chair furniture
11112 desk¥blue chair furniture
11113 bed¥pen¥cable
11114 mug¥old sofa¥toy mug
11115 old mug¥wine glass mug
11116 blue chair¥old mug furniture, mug

I’ve tried

df1[‘category’] = df1[‘item’].replace(item_dict)

which will only replace values that only have a single item;

adding regex=True would allow multi-item entries to be replaced but would turn desk¥blue chair into desk¥blue furniture bc a shorter one existed.

I know replace isn’t the best option since I want the new column to only contain categories of interest without other item names, but am unsure of what to use.

CodePudding user response:

You can try this:

# !pip install -U swifter  #### only in case of very large dataframe
import swifter


delimiter = '¥'
df1['category'] = df1['item'].swifter.apply(
    lambda _: delimiter.join([item_dict.get(el, '') for el in _.split(delimiter)])
)
  • Related