Removing only specified special characters from a column

Time:07-20

I have 3 columns in my data frame, and I am supposed to remove only the special characters listed below from one of the columns:

,.-[]={}/?,.<>()&^%$#@!;~`*

I have tried the code below, but it is not working:

regex = re.compile('[,.-=[]{}\/?,.<>()*&^%$#@!;~`]')
s=[]

for i in range(len(df1)):
    L = df1.loc[i,'Vendor Name']
    s.append(regex.sub('', L))
   

df1['Vendor Name']=s

This code is not removing the specified special characters, and I can't figure out where the problem is.

CodePudding user response:

You can use

df1['Vendor Name'] = df1['Vendor Name'].str.replace(r'[][,.={}\\/?,.<>()*&^%$#@!;~`-]+', '', regex=True)
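
To sanity-check the pattern before running it on the whole column, it can be tried on a few strings with plain re. This is only a sketch: the sample vendor names and the helper name strip_specials are made up for illustration, and Series.str.replace with regex=True should apply the same substitution to every element of the column.

```python
import re

# Same character class as in the one-liner above, followed by '+' so that
# a run of special characters is removed in one substitution.
PATTERN = r'[][,.={}\\/?,.<>()*&^%$#@!;~`-]+'
SPECIALS = re.compile(PATTERN)

def strip_specials(text):
    """Remove every character listed in the class; keep everything else."""
    return SPECIALS.sub('', text)

# Hypothetical vendor names, not taken from the question's data.
samples = ["Acme, Inc.", "Foo-Bar [UK]", "A&B <Co> #1", "Plain Name_2"]
cleaned = [strip_specials(s) for s in samples]
print(cleaned)
```

Characters that are not in the list, such as spaces, underscores and +, are left untouched.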


Note:

  • ] does not have to be escaped when it is the first character of the character class; anywhere else inside the class it must be escaped.
  • - is at the end of the character class, so it does not have to be escaped either; between other characters it forms a range (in your pattern, .-= matches everything from . to =, digits included) and must be escaped.
  • \ must always be escaped in a regex pattern.
  • Series.str.replace is more concise, and usually faster, than calling re.sub row by row in a Python loop.
  • Use a raw string literal, r'...', to define your regexes.
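
The character-class rules in these notes can be checked directly with re. The short sketch below uses made-up test strings and variable names; it also suggests why the pattern in the question misbehaves: .-= is read as a range from . to =, and the first unescaped ] ends the class early.

```python
import re

# '-' between two characters forms a range:
# '.-=' spans '.', '/', the digits, ':', ';', '<' and '='.
range_hit = re.sub(r'[.-=]', '', 'a1/b:c')

# '-' placed last in the class is a literal hyphen.
literal_hyphen = re.sub(r'[.=-]', '', 'a1-b.c=d')

# ']' placed first in the class is a literal bracket,
# so '[][]' matches either ']' or '['.
brackets = re.sub(r'[][]', '', 'x[1]y')

# An unescaped ']' elsewhere closes the class:
# '[=[]{}' means ('=' or '[') followed by the literal text '{}'.
closed_early = re.sub(r'[=[]{}', '', 'a=b[c={}d')

# A literal backslash is written as '\\' inside a raw-string pattern.
backslash = re.sub(r'[\\/]', '', r'a\b/c')

print(range_hit, literal_hyphen, brackets, closed_early, backslash)
```

In the first call the digit is removed along with the punctuation, which is the kind of surprise an unescaped - in the middle of a class can cause.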