I have 3 columns in my data frame. I am supposed to remove only the special characters listed below from one column:
,.-[]={}/?,.<>()&^%$#@!;~`*
I have tried the code below, but it is not working:
regex = re.compile('[,.-=[]{}\/?,.<>()*&^%$#@!;~`]')
s=[]
for i in range(len(df1)):
    L = df1.loc[i,'Vendor Name']
    s.append(regex.sub('', L))
df1['Vendor Name']=s
This code does not remove the specified special characters, and I cannot find where the problem is.
CodePudding user response:
You can use
df1['Vendor Name'] = df1['Vendor Name'].str.replace(r'[][,.={}\\/?,.<>()*&^%$#@!;~`-]+', '', regex=True)
See the regex demo.
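A minimal, self-contained sketch of this approach follows. The sample vendor names are made up for illustration; only the column name 'Vendor Name' comes from the question. The pattern is assumed to end with a + quantifier, so that a run of adjacent special characters is removed in one pass.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df1 = pd.DataFrame(
    {"Vendor Name": ["Acme, Inc.", "Foo-Bar (UK) #1", "A&B [East]", "Plain Name"]}
)

# ']' comes first and '-' comes last, so neither needs escaping;
# '\\' inside the raw string matches one literal backslash.
pattern = r'[][,.={}\\/?,.<>()*&^%$#@!;~`-]+'

# Vectorized replacement over the whole column, no explicit loop needed.
df1["Vendor Name"] = df1["Vendor Name"].str.replace(pattern, "", regex=True)

print(df1["Vendor Name"].tolist())
# ['Acme Inc', 'FooBar UK 1', 'AB East', 'Plain Name']
```

Spaces, letters, and digits are left alone because they are not in the character class.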
Note:
- ] does not have to be escaped when it is at the start of the character class; in any other place inside a character class, it must be escaped
- - is at the end of the character class and again does not have to be escaped; it must be escaped if used in between other chars in the character class
- \ must always be escaped in a regex pattern
- Series.str.replace is more efficient than apply with re.sub in the loop
- Use the raw string literal, r'...', to define your regexes.
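The escaping rules above can be checked with the standard re module alone. The sketch below uses two tiny patterns written for illustration (they are not taken from the question) and then shows one possible alternative: letting re.escape do the escaping, so the position of ] and - inside the class stops mattering. The helper name strip_specials is hypothetical.

```python
import re

# ']' in the middle closes the class early: this is the class [,.[]
# followed by the literal text 'x]', so a lone comma is not matched.
print(re.sub(r'[,.[]x]', '', 'a,b'))   # a,b

# '-' between two characters forms a range: '.' to '=' also covers 0-9.
print(re.sub(r'[.-=]', '', 'A5'))      # A

# Alternative: build the class from the literal characters in the question
# and let re.escape add backslashes wherever they are needed.
SPECIALS = ",.-[]={}/?,.<>()&^%$#@!;~`*"
pattern = re.compile('[' + re.escape(SPECIALS) + ']+')

def strip_specials(text):  # hypothetical helper name
    """Remove every character listed in SPECIALS from text."""
    return pattern.sub('', text)

print(strip_specials('Acme, Inc. [NY] #42'))   # Acme Inc NY 42
```

The same compiled pattern can also be passed to Series.str.replace with regex=True, which keeps the replacement vectorized.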