I'm facing a lot of trouble on what seems to be a simple matter:
I have a column with some beverages names, but they are poluted with "12oz" and "Boxes". I want to get only the name of the beverages. Unfortunally, they are not typed in the same particular form, so i cant just [0:5] them. I know all the beverages names on the column, if that helps
Ex: Column name: WHISKY BALLANTINES12YO 12X1000 ML RESTAGE
Column Created based on the previous one: BALLANTINES
Thanks in advance,
EDIT Some other examples:
CHIVAS REGAL 12 ANOS 12X1L should be CHIVAS
VODKA ABSOLUT 12X1000ML should be ABSOLUT
CodePudding user response:
Just use replace statements and set regex to true, and replace with an empty string, like this:
df.replace('12oz', '', regex=True)
This is assuming you know what text you will have to replace.
CodePudding user response:
If you have the list of all the beverages, you can use pandas.Series.extract
:
import re
list_of_bvr= ["ballantines", "chivas", "absolut"]
df["Col1"]= df["Col1"].str.extract(f"({'|'.join(list_of_bvr)})", flags=re.I, expand=False)
# Output :
print(df)
Col1
0 BALLANTINES
1 CHIVAS
2 ABSOLUT