Home > other >  Create a conditional dict of strings to replace names
Create a conditional dict of strings to replace names

Time:02-17

I'm trying to create a conditional dictionary or list that converts the name of products in the list/dict from my df, for instance every item in the "product_milk" list is going to be changed to 'DAIRIES' in my df. my df contains simple items like WHOLE MILK BRAND X 1L, ETC.

ps: edited How do I automate np.where to a smarter way to do that for several lists?

    df = pd.DataFrame({'product_name': 
 ['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})

product_milk = ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO']
product_beer = ['WHISKY', 'BEER']

df['a'] = np.where(df['product_name'].str.contains('|'.join(product_milk), na=False), 'DAIRIES', 0)
df['b'] = np.where(df['product_name'].str.contains('|'.join(product_beer), na=False), 'beer', df['a'])
print(df['b'][df['b'] != '0'])

Obs: I tried nested np, but still will be not smart, as I had to pass every name and list manually, like 'DAIRIES', 'BEER', etc..

Expected result: Only one series with this output.

NOTE: It's important to note that I already have this output, however the method/way that I'm using is poor, and that's what I want to change:

0    DAIRIES
1    DAIRIES
2    DAIRIES
3    DAIRIES
4       beer
5       beer
Name: b, dtype: object

CodePudding user response:

Frist I would keep all information as dictionary

replacements = {
    'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
    'beer': ['WHISKY', 'BEER'],
}    

and then I could use for-loop to make it simpler.

I wooudl first create column b with default values (ie. empty string or string from product_name) and later use mask to replace values in this column

    mask = df['product_name'].str.contains('|'.join(product))
    df['b'][ mask ] = new_name

Full working example

import pandas as pd

df = pd.DataFrame({
    'product_name': ['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})

replacements = {
    'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
    'beer': ['WHISKY', 'BEER'],
}    

df['b'] = ''

for new_name, products in replacements.items():
    mask = df['product_name'].str.contains('|'.join(products))
    df['b'][ mask ] = new_name

print(df)

Result:

                        product_name        b
0                WHOLE MILK TIGER 1L  DAIRIES
1       LEITE INTEGRAL UHT COM TAMPA  DAIRIES
2  LEITE SEMIDESNATADO UHT COM TAMPA  DAIRIES
3                 LEITE INTEGRAL UHT  DAIRIES
4              BEER WHITE LION 350ML     beer
5                WHISKY RED LABEL 1L     beer

EDIT:

Other method is to use df['product_name'].replace(dictionary, regex=True) but it needs dictionary

dictionary = {
 '.*WHOLE MILK.*': 'DAIRIES'
 '.*LEITE INTEGRAL.*': 'DAIRIES',
 '.*LEITE DESNATADO.*': 'DAIRIES',
 '.*LEITE SEMIDESNATADO.*': 'DAIRIES',
 '.*WHISKY.*': 'beer',
 '.*BEER.*': 'beer',
}

or

dictionary = {
 '.*(WHOLE MILK|LEITE INTEGRAL|LEITE DESNATADO|LEITE SEMIDESNATADO).*': 'DAIRIES'
 '.*(WHISKY|BEER).*': 'beer',
}

Full working example

import pandas as pd

df = pd.DataFrame({
    'product_name': ['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})

replacements = {
    'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
    'beer': ['WHISKY', 'BEER'],
}    

dictionary = {}

# first type of dictionary
#for new_name, products in replacements.items():
#    for p in products:
#        dictionary[f'.*{p}.*'] = new_name

# second type of dictionary
for new_name, products in replacements.items():
    p = '|'.join(products)
    dictionary[f'.*({p}).*'] = new_name

df['b'] = df['product_name'].replace(dictionary, regex=True)
print(df)

print('--- dictionary ---')
import pprint
pprint.pprint(dictionary)

Result:

                        product_name        b
0                WHOLE MILK TIGER 1L  DAIRIES
1       LEITE INTEGRAL UHT COM TAMPA  DAIRIES
2  LEITE SEMIDESNATADO UHT COM TAMPA  DAIRIES
3                 LEITE INTEGRAL UHT  DAIRIES
4              BEER WHITE LION 350ML     beer
5                WHISKY RED LABEL 1L     beer
--- dictionary ---
{'.*(WHISKY|BEER).*': 'beer',
 '.*(WHOLE MILK|LEITE INTEGRAL|LEITE DESNATADO|LEITE SEMIDESNATADO).*': 'DAIRIES'}

  • Related