I have the following regex pattern word-word, so
r'\w+\-\w+'
I would like to replace it with
r'\w+\s\-\s\w+'
Example: I would like to change
hello-friends to hello - friends
I have tried the following with no success
df['mytextcolumn'].str.replace(r'(\\w+)(\\-)(\\w+)',r'(\\w+)(\\s)(\\-)(\\s)(\\w+)')
also tried with re.sub
re.sub(r'\\w+\\-\\w+',r'\\w+\\s\\-\\s\\w+','hello-friends')
but I still get back hello-friends, not hello - friends
I also checked my regex with an online regex matcher for python, and it picks up the patterns correctly, so I am confused why I am unable to replace it within my script.
CodePudding user response:
You cannot use a new pattern in the replacement, because the replacement string is not interpreted as a regex. Instead, use 2 capture groups in the initial pattern and refer to them with \1 - \2 in the replacement.
You could also capture the - in a group, but as it is a single character that you match literally, you can just write it directly in the replacement.
(\w+)-(\w+)
df['mytextcolumn'] = df['mytextcolumn'].str.replace(r'(\w+)-(\w+)',r'\1 - \2', regex=True)
print(df)
Output
mytextcolumn
0 hello - friends
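The same idea works with plain re.sub. Below is a minimal sketch using only the standard library; the variable names are just for illustration. It also shows one likely reason the attempt in the question returned the input unchanged, assuming the doubled backslashes were really in the script: inside a raw string, \\w matches a literal backslash followed by w, so nothing in hello-friends matches. The lookaround variant at the end is an optional alternative, not part of the answer above, for strings with several hyphens in a row.

```python
import re

text = "hello-friends"

# Capture the word on each side of the hyphen and refer back to the groups.
spaced = re.sub(r"(\w+)-(\w+)", r"\1 - \2", text)
print(spaced)  # hello - friends

# Doubled backslashes in a raw string match a literal backslash,
# so this pattern finds nothing and the text comes back unchanged.
unchanged = re.sub(r"\\w+\\-\\w+", "X", text)
print(unchanged)  # hello-friends

# Optional alternative: lookarounds match only the hyphen itself,
# so chained hyphens such as a-b-c are all spaced out.
chained = re.sub(r"(?<=\w)-(?=\w)", " - ", "a-b-c")
print(chained)  # a - b - c
```

With the capture-group pattern, a-b-c becomes a - b-c, because the b is consumed by the first match; the lookaround version avoids that since it does not consume the surrounding words.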