I am struggling to find the solution to this simple regex problem. I want to match a pattern but replace only part of it. I have tried a few things, such as non-capturing groups. Please see the simple example below, which demonstrates the problem. Here I only want to remove the full stop that directly follows the word 'test', and no other full stops. In other words, match the pattern 'test.' but remove only the '.'. How is this done, and what is the simplest way? I would like a method that also works with more complicated strings.
Example
import pandas as pd
df = pd.DataFrame({'col1': ['.test', 'test.', 'test,.']})
df['col1'] = df['col1'].str.replace(r"(?:test)(\.)", "", regex=True)
Current output
0 .test
1
2 .
Required output
0 .test
1 test
2 test,.
CodePudding user response:
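Use a lookbehind ((?<=…)), not a non-capturing group ((?:…)). A non-capturing group still consumes the text it matches, so that text is replaced too; a lookbehind only asserts that the text precedes the match. Also, you need to escape the dot (\.) if you want to match a literal dot and not any character.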
df['col1'] = df['col1'].str.replace(r"(?<=test)(\.)", "", regex=True)
NB. The parentheses around the \. are not required, as you don't need to capture anything here.
output:
col1
0 .test
1 test
2 test,.
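To see the difference between the two constructs in isolation, here is a minimal sketch using the standard re module instead of pandas (Series.str.replace with regex=True accepts the same pattern syntax). The names samples, non_capturing and lookbehind are only illustrative.

```python
import re

samples = [".test", "test.", "test,."]

# A non-capturing group still consumes "test", so the whole "test." is removed.
non_capturing = [re.sub(r"(?:test)\.", "", s) for s in samples]

# A lookbehind only asserts that "test" precedes the dot;
# "test" is not part of the match, so only the dot is removed.
lookbehind = [re.sub(r"(?<=test)\.", "", s) for s in samples]

print(non_capturing)  # ['.test', '', 'test,.']
print(lookbehind)     # ['.test', 'test', 'test,.']
```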
Another option: use a capturing group for test and restore it with \1 in the replacement:
df['col1'] = df['col1'].str.replace(r"(test)\.", r"\1", regex=True)
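For more complicated strings, the capturing-group form tends to generalise better, because Python's re module only accepts fixed-width lookbehinds. The sketch below assumes a hypothetical variant of the problem where the prefix may be either test or tests; the sample strings and variable names are made up for illustration.

```python
import re

samples = ["test.", "tests.", "test,.", "a.b"]

# A variable-length lookbehind such as (?<=tests?) is rejected by Python's
# re module, because lookbehinds must have a fixed width.
try:
    re.compile(r"(?<=tests?)\.")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Capturing the prefix and restoring it with \1 has no such restriction.
cleaned = [re.sub(r"(tests?)\.", r"\1", s) for s in samples]

print(lookbehind_ok)  # False
print(cleaned)        # ['test', 'tests', 'test,.', 'a.b']
```

The same pattern and replacement should work unchanged in df['col1'].str.replace(..., regex=True).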