I have strings with several spaces followed by commas in a pandas column. The strings look like this:
original_string = "okay, , , , humans"
I want to remove the spaces and the subsequent commas so that the string will be:
goodstring = "okay,humans"
But when I use this regex pattern: [\s,]
I get a different result:
badstring = "okayhumans"
It also removes the comma after okay, but I want the result to look like goodstring. How can I do that?
CodePudding user response:
Replace:
[\s,]*,[\s,]*
With:
,
Explanation of the pattern:
[\s,]* - 0 or more leading whitespace characters or commas;
, - a literal comma (ensures we don't replace a single space);
[\s,]* - 0 or more trailing whitespace characters or commas.
In Pandas, this would translate to something like:
df[<YourColumn>].str.replace(r'[\s,]*,[\s,]*', ',', regex=True)
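As a minimal sketch of how the call above behaves on a whole column, assuming a hypothetical DataFrame with a column named text (both the column name and the sample rows are made up for illustration):

```python
import pandas as pd

# Hypothetical data; the column name "text" is an assumption, not from the question.
df = pd.DataFrame({"text": ["okay, , , , humans", "a ,b", "no commas here"]})

# Collapse every run of whitespace/commas that contains at least one comma
# into a single comma. str.replace returns a new Series, so assign the result.
df["clean"] = df["text"].str.replace(r"[\s,]*,[\s,]*", ",", regex=True)

print(df["clean"].tolist())
# ['okay,humans', 'a,b', 'no commas here']
```

Spaces that are not next to a comma are left alone, which is why the last row is unchanged.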
CodePudding user response:
You have two issues with your code:
- Since [\s,] matches any combination of spaces and commas (e.g. a single comma, ','), you should not remove the match but replace it with ','.
- [\s,] matches any combination of spaces and commas, e.g. just a space ' '; that is not what we are looking for, we must be sure that at least one comma is present in the match.
Code:
import re

text = 'okay, , ,,,, humans! A,B,C'
result = re.sub(r'\s*,[\s,]*', ',', text)
Pattern:
\s* - zero or more (leading) whitespaces
, - comma (we must be sure that we have at least one comma in a match)
[\s,]* - arbitrary combination of spaces and commas
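To illustrate why the match has to contain at least one comma, here is a small sketch that compares a comma-optional pattern with the one described above. The variable names loose and strict are just labels chosen for this example:

```python
import re

text = "okay, , ,,,, humans! A,B,C"

# [\s,]+ also matches a lone space, so the space in "humans! A" becomes a comma.
loose = re.sub(r"[\s,]+", ",", text)

# \s*,[\s,]* requires at least one comma, so plain spaces between words survive.
strict = re.sub(r"\s*,[\s,]*", ",", text)

print(loose)   # okay,humans!,A,B,C
print(strict)  # okay,humans! A,B,C
```

If the column never contains spaces outside the comma runs, both patterns give the same result; the stricter one is the safer choice when it might.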
CodePudding user response:
Please try this
re.sub(r'[,\s]+', ',', original_string)
You want to replace ",[space]," with ",".
CodePudding user response:
You could use substitution:
import re
pattern = r'[\s,]+'
original_string = "okay, , , , humans"
re.sub(pattern, ',', original_string)