Can you remove measurements - g/kg/ml etc from a Pandas Dataframe?

Time:12-31

I am preprocessing one column of a data set, 'Title'. I have already removed numbers and punctuation, but I also want to remove measurements (g, kg, ml, etc.). The measurements are not in a separate column; they are part of the Title column.

import pandas as pd

# Load the data set
df = pd.read_csv(r'example')

# Remove punctuation, then digits
df['Title'] = df['Title'].str.replace(r'[^\w\s]+', '', regex=True)
df['Title'] = df['Title'].str.replace(r'\d+', '', regex=True)
print(df['Title'])

This returns the Title column of the data set.

CodePudding user response:

df['Title'] = df['Title'].str.replace(r'\sg$|\skg$|\sml$', '', regex=True)

This is one example. More generally, removing the last word amounts to:

df['Title'] = df['Title'].str.replace(r'\s[a-z]+$', '', regex=True)
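To see how the two patterns differ, here is a minimal sketch on a few made-up titles. The sample data and the variable names are assumptions for illustration, not part of the original data set.

```python
import pandas as pd

# Hypothetical sample titles (not from the original data set)
titles = pd.Series(["Basmati rice kg", "Olive oil ml", "Chocolate bar"])

# Remove only a known unit, and only when it is the last word
units_removed = titles.str.replace(r"\s(?:g|kg|ml)$", "", regex=True)

# Remove whatever lowercase word comes last, whether it is a unit or not
last_word_removed = titles.str.replace(r"\s[a-z]+$", "", regex=True)

print(units_removed.tolist())
print(last_word_removed.tolist())
```

The first pattern leaves "Chocolate bar" alone, while the second also drops "bar", so the more general version is only safe if every title really ends in a unit.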

CodePudding user response:

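You can use a regex again.

df['Title'] = df['Title'].str.replace('xg$|g$|kg$|ml$', '', regex=True)

The dollar sign acts as an anchor for the end of the string. Pass regex=True explicitly, because since pandas 2.0 str.replace treats the pattern as a literal string by default.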
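One caveat with an end anchor alone is that it also strips a trailing "g" from ordinary words. A possible variation, sketched below on made-up titles, uses word boundaries (\b) so that only standalone units are removed, wherever they appear in the title. The unit list and sample data are assumptions; extend the list to match your data.

```python
import pandas as pd

# Hypothetical sample titles (not from the original data set)
titles = pd.Series(["Sugar g", "Shopping bag", "Milk ml bottle"])

# End anchor without a boundary: also eats the "g" in "bag"
naive = titles.str.replace(r"g$|kg$|ml$", "", regex=True)

# \b on both sides requires the unit to be a whole word
pattern = r"\b(?:g|kg|ml)\b"
cleaned = (
    titles.str.replace(pattern, "", regex=True)
    .str.replace(r"\s+", " ", regex=True)  # collapse doubled spaces
    .str.strip()                           # drop leading/trailing spaces
)

print(naive.tolist())
print(cleaned.tolist())
```

The extra whitespace cleanup is needed because removing a unit from the middle of a title leaves two spaces behind.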
