Removing substrings from a Data Frame

Time:09-16

I used the following code:

remove_words=['Conference Call - Final.rtf','Conference Call - F(2).rtf','Final(2).rtf']
pat= '|'.join(remove_words)
pat
df['title'] = df['conference_name'].str.replace(pat,'')

but the result was not what I expected. My code successfully removed [Conference Call - Final.rtf] but did not remove [Conference Call - F(2).rtf] or [Final(2).rtf]. The desired output should have all of the listed substrings removed.

CodePudding user response:

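You can use the re module to delete a specific string from a single value, for example:

import re
re.sub(re.escape("Conference Call - Final.rtf"), '', df['conference_name'][0])

Note that this processes only the first row; it has to be applied to each value to clean the whole column.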

CodePudding user response:

As Charles Duffy mentioned in the comments, parentheses have special meaning in a regular expression (they delimit a capturing group), and str.replace treats your pattern as a regular expression when regex=True, which was the default before pandas 2.0. The (2) in your pattern is therefore read as a group that matches the single character 2, not the literal text (2), so those two entries never match your data. To match the parentheses literally, you have to escape them. The same applies to the dot, which otherwise matches any character.
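To see the difference in isolation, here is a minimal sketch using only the re module. The sample string in name is made up for illustration; it is not taken from the question's data.

```python
import re

# Hypothetical sample value, assumed for illustration only.
name = "Acme Q2 Conference Call - F(2).rtf"
raw = "Conference Call - F(2).rtf"

# Unescaped: "(2)" is a capturing group matching "2", so the pattern
# looks for "...F2" followed by any character and "rtf". No match here.
unescaped_match = re.search(raw, name)

# Escaped: every special character is matched literally.
cleaned = re.sub(re.escape(raw), "", name)

print(unescaped_match)
print(repr(cleaned))
```

The unescaped search finds nothing, while the escaped pattern removes the substring and leaves the text before it (including the trailing space).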

Let's do:

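import re

remove_words=['Conference Call - Final.rtf','Conference Call - F(2).rtf','Final(2).rtf']
pat = '|'.join(re.escape(w) for w in remove_words)

# regex=True is passed explicitly because the default changed to False in pandas 2.0
df['title'] = df['conference_name'].str.replace(pat, '', regex=True)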
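For reference, here is a self-contained sketch of the escaped approach run end to end. The three conference_name values are invented for illustration, since the question's actual data is not shown, and the trailing .str.strip() is an optional extra that only tidies leftover whitespace.

```python
import re
import pandas as pd

# Hypothetical sample data; the real column contents are not shown in the question.
df = pd.DataFrame({"conference_name": [
    "Alpha Corp Conference Call - Final.rtf",
    "Beta Inc Conference Call - F(2).rtf",
    "Gamma Ltd Final(2).rtf",
]})

remove_words = ['Conference Call - Final.rtf', 'Conference Call - F(2).rtf', 'Final(2).rtf']

# Escape each literal string, then join them into one alternation pattern.
pat = "|".join(re.escape(w) for w in remove_words)

# regex=True is explicit so the call behaves the same before and after pandas 2.0.
df["title"] = df["conference_name"].str.replace(pat, "", regex=True).str.strip()

print(df["title"].tolist())
```

With this sample data, all three suffixes are removed and only the company names remain in the title column.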