I'm trying to create a conditional dictionary or list that converts the name of products in the list/dict from my df, for instance every item in the "product_milk" list is going to be changed to 'DAIRIES' in my df. my df contains simple items like WHOLE MILK BRAND X 1L, ETC.
ps: edited How do I automate np.where to a smarter way to do that for several lists?
df = pd.DataFrame({'product_name':
['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})
product_milk = ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO']
product_beer = ['WHISKY', 'BEER']
df['a'] = np.where(df['product_name'].str.contains('|'.join(product_milk), na=False), 'DAIRIES', 0)
df['b'] = np.where(df['product_name'].str.contains('|'.join(product_beer), na=False), 'beer', df['a'])
print(df['b'][df['b'] != '0'])
Obs: I tried nested np, but still will be not smart, as I had to pass every name and list manually, like 'DAIRIES', 'BEER', etc..
Expected result: Only one series with this output.
NOTE: It's important to note that I already have this output, however the method/way that I'm using is poor, and that's what I want to change:
0 DAIRIES
1 DAIRIES
2 DAIRIES
3 DAIRIES
4 beer
5 beer
Name: b, dtype: object
CodePudding user response:
Frist I would keep all information as dictionary
replacements = {
'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
'beer': ['WHISKY', 'BEER'],
}
and then I could use for
-loop to make it simpler.
I wooudl first create column b
with default values (ie. empty string or string from product_name
) and later use mask
to replace values in this column
mask = df['product_name'].str.contains('|'.join(product))
df['b'][ mask ] = new_name
Full working example
import pandas as pd
df = pd.DataFrame({
'product_name': ['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})
replacements = {
'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
'beer': ['WHISKY', 'BEER'],
}
df['b'] = ''
for new_name, products in replacements.items():
mask = df['product_name'].str.contains('|'.join(products))
df['b'][ mask ] = new_name
print(df)
Result:
product_name b
0 WHOLE MILK TIGER 1L DAIRIES
1 LEITE INTEGRAL UHT COM TAMPA DAIRIES
2 LEITE SEMIDESNATADO UHT COM TAMPA DAIRIES
3 LEITE INTEGRAL UHT DAIRIES
4 BEER WHITE LION 350ML beer
5 WHISKY RED LABEL 1L beer
EDIT:
Other method is to use df['product_name'].replace(dictionary, regex=True)
but it needs dictionary
dictionary = {
'.*WHOLE MILK.*': 'DAIRIES'
'.*LEITE INTEGRAL.*': 'DAIRIES',
'.*LEITE DESNATADO.*': 'DAIRIES',
'.*LEITE SEMIDESNATADO.*': 'DAIRIES',
'.*WHISKY.*': 'beer',
'.*BEER.*': 'beer',
}
or
dictionary = {
'.*(WHOLE MILK|LEITE INTEGRAL|LEITE DESNATADO|LEITE SEMIDESNATADO).*': 'DAIRIES'
'.*(WHISKY|BEER).*': 'beer',
}
Full working example
import pandas as pd
df = pd.DataFrame({
'product_name': ['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})
replacements = {
'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
'beer': ['WHISKY', 'BEER'],
}
dictionary = {}
# first type of dictionary
#for new_name, products in replacements.items():
# for p in products:
# dictionary[f'.*{p}.*'] = new_name
# second type of dictionary
for new_name, products in replacements.items():
p = '|'.join(products)
dictionary[f'.*({p}).*'] = new_name
df['b'] = df['product_name'].replace(dictionary, regex=True)
print(df)
print('--- dictionary ---')
import pprint
pprint.pprint(dictionary)
Result:
product_name b
0 WHOLE MILK TIGER 1L DAIRIES
1 LEITE INTEGRAL UHT COM TAMPA DAIRIES
2 LEITE SEMIDESNATADO UHT COM TAMPA DAIRIES
3 LEITE INTEGRAL UHT DAIRIES
4 BEER WHITE LION 350ML beer
5 WHISKY RED LABEL 1L beer
--- dictionary ---
{'.*(WHISKY|BEER).*': 'beer',
'.*(WHOLE MILK|LEITE INTEGRAL|LEITE DESNATADO|LEITE SEMIDESNATADO).*': 'DAIRIES'}