Regex match two characters following each other

I have strings in a pandas column that contain several spaces followed by commas. The strings look like this:

original_string = "okay, , , , humans"

I want to remove the spaces and the subsequent commas so that the string will be:

goodstring = "okay,humans"

But when I remove everything that matches the regex pattern [\s,], the result is different. I get

badstring = "okayhumans".

It also removes the comma after okay, but I want the result to look like goodstring. How can I do that?

CodePudding user response：

Replace:

[\s,]*,[\s,]*

With:

,

[\s,]* - 0+ leading whitespace characters or commas;
, - A literal comma (ensures we don't replace a lone space);
[\s,]* - 0+ trailing whitespace characters or commas.
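
To check this pattern outside pandas, a minimal sketch with Python's built-in re module could look like the following. The helper name collapse_commas and the second test string are made up for illustration and are not part of the answer.

```python
import re

# Hypothetical helper: a run of whitespace/commas that contains at
# least one comma is collapsed into a single comma.
COMMA_RUN = re.compile(r'[\s,]*,[\s,]*')

def collapse_commas(text: str) -> str:
    return COMMA_RUN.sub(',', text)

print(collapse_commas("okay, , , , humans"))    # okay,humans
print(collapse_commas("hello world, , again"))  # hello world,again
```

The second call shows that the space between "hello" and "world" is kept, because that run contains no comma.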

In Pandas, this would translate to something like:

df[<YourColumn>].str.replace(r'[\s,]*,[\s,]*', ',', regex=True)

CodePudding user response：

You have two issues with your code:

A pattern built from [\s,] matches commas as well as spaces (e.g. a single comma ,), so you should not remove the match but replace it with ','
It also matches text that contains no comma at all, e.g. just a space ' '; that is not what we are looking for, so we must make sure that at least one comma is present in the match.

Code:

import re

text = 'okay, ,  ,,,, humans! A,B,C'

result = re.sub(r'\s*,[\s,]*', ',', text)

Pattern:

\s*    - zero or more (leading) whitespace characters
,      - comma (we must be sure that we have at least one comma in a match)
[\s,]* - arbitrary combination of spaces and commas
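
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is only a sketch of one way to document the regex; the variable name COMMA_RUN is an assumption.

```python
import re

# Same pattern as above, written in verbose mode so each part is commented.
COMMA_RUN = re.compile(r"""
    \s*      # zero or more leading whitespace characters
    ,        # a literal comma (at least one comma must be present)
    [\s,]*   # any further run of whitespace characters and commas
""", re.VERBOSE)

text = 'okay, ,  ,,,, humans! A,B,C'
result = COMMA_RUN.sub(',', text)
print(result)  # okay,humans! A,B,C
```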

CodePudding user response：

Please try this:

re.sub(r'[\s,]+', ',', original_string)

You want to replace ",[space]," with ",".

CodePudding user response：

You could use substitution:

import re

pattern = r'[\s,]+'
original_string = "okay, , , , humans"
re.sub(pattern, ',', original_string)
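
One caveat worth checking before applying this to a whole column: a pattern that accepts any run of spaces or commas, such as [\s,]+, also turns a lone space between words into a comma. The sketch below compares it with a pattern that requires a comma in the run; the sample string is made up for illustration.

```python
import re

text = "okay, , , , good humans"

# Any run of whitespace or commas: the lone space becomes a comma too.
loose = re.sub(r'[\s,]+', ',', text)

# The run must contain a comma: the lone space is left alone.
strict = re.sub(r'\s*,[\s,]*', ',', text)

print(loose)   # okay,good,humans
print(strict)  # okay,good humans
```

If the strings never contain spaces inside a value, both patterns give the same result.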