I am preprocessing a data set. In one particular column, 'Title', I have already removed numbers and punctuation, but I also want to remove measurements. The measurements are not in a separate column; they are part of the Title column.
# Load the data set
import pandas as pd

df = pd.read_csv(r'example')
# Remove punctuation and numbers
# (regex=True is needed in pandas 2.0+, where str.replace matches literally by default)
df['Title'] = df['Title'].str.replace(r'[^\w\s]+', '', regex=True)
df['Title'] = df['Title'].str.replace(r'\d+', '', regex=True)
print(df['Title'])
CodePudding user response:
For example, to strip a trailing g, kg, or ml:
df['Title'] = df['Title'].str.replace(r'\sg$|\skg$|\sml$', '', regex=True)
More generally, removing the last word amounts to:
df['Title'] = df['Title'].str.replace(r'\s[a-z]+$', '', regex=True)
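To see how the two patterns differ, here is a minimal sketch on a few made-up titles. The sample data, the `units` list, and the `NoUnits` / `NoLastWord` column names are assumptions for illustration, not part of the question. It assumes the numbers have already been stripped, so any unit is left as the last word.

```python
import pandas as pd

# Hypothetical sample titles, with numbers already removed as in the question.
df = pd.DataFrame(
    {'Title': ['Basmati rice kg', 'Olive oil ml', 'Green tea bags', 'Sugar g']}
)

# Hypothetical list of units to strip; extend it to match the real data.
units = ['g', 'kg', 'ml']
unit_pattern = r'\s(?:' + '|'.join(units) + r')$'

# Remove only a trailing unit from the list.
df['NoUnits'] = df['Title'].str.replace(unit_pattern, '', regex=True)

# Remove the last lowercase word, whether or not it is a unit.
df['NoLastWord'] = df['Title'].str.replace(r'\s[a-z]+$', '', regex=True)

print(df)
```

The unit list leaves 'Green tea bags' untouched, while the last-word pattern also drops 'bags', so the generic version is probably only safe when every title ends in a unit.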
CodePudding user response:
You can use regex again.
df['Title'] = df['Title'].str.replace(r'xg$|g$|kg$|ml$', '', regex=True)
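The dollar sign anchors the match to the end of the string. Note that without a leading \s, the g$ alternative also removes the final "g" of any title that simply ends in that letter.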
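One way to check what the anchor does, and what requiring whitespace before the unit changes, is to try the patterns with Python's `re` module, which uses the same syntax as `str.replace` with `regex=True`. This is only a sketch; the sample strings and the `loose` / `strict` names are made up for illustration.

```python
import re

# Hypothetical titles: one ends in an ordinary word, one in a unit,
# and one has a unit at the start rather than the end.
titles = ['Chocolate pudding', 'Flour kg', 'ml of milk']

# End-anchored alternatives with no separator in front of the unit.
loose = [re.sub(r'g$|kg$|ml$', '', t) for t in titles]

# Requiring whitespace before the unit matches only a standalone trailing unit.
strict = [re.sub(r'\s(?:g|kg|ml)$', '', t) for t in titles]

print(loose)   # ['Chocolate puddin', 'Flour ', 'ml of milk']
print(strict)  # ['Chocolate pudding', 'Flour', 'ml of milk']
```

In both cases the leading 'ml' is left alone because of the `$` anchor. The looser pattern clips 'pudding' and leaves a trailing space after 'Flour', which the stricter pattern avoids.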