Regexp pattern to remove spaces next to brackets and replace any spaces between words/characters ins

Time:11-08

I have strings in a similar format:

hello this is an example [ a b c ]

hello this is another example [ cat bird dog elephant ]

which I want to transform to:

hello this is an example [a,b,c]

hello this is another example [cat,bird,dog,elephant]

But I don't understand how to write a regexp pattern that removes any spaces next to the brackets and replaces each run of spaces between words/characters inside the brackets with a single comma (,).

How would one create such a pattern?

My current attempt is a chain of regexp replacements.

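m = re.sub(r'\[\s+', '[', s)              # drop whitespace after "["
m = re.sub(r'\s+\]', ']', m)              # drop whitespace before "]"
m = re.sub(r'\s+', ' ', m)                # collapse each whitespace run to one space
m = re.sub(r'\s(?=[^\[\]]*])', ',', m)    # spaces still inside the brackets become commas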

Does anyone have a suggestion for making it more efficient or cleaner?
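
The last substitution in the chain relies on a lookahead: a whitespace character is replaced only if a ] can be reached from it without crossing another bracket, which holds only inside a bracket pair. A minimal sketch of that single step, assuming well-formed, non-nested brackets (the helper name commas_inside_brackets is made up for illustration):

```python
import re

def commas_inside_brackets(text):
    # Replace a whitespace character only when the text after it reaches a
    # closing "]" without passing another "[" or "]" first.
    return re.sub(r"\s(?=[^\[\]]*\])", ",", text)

print(commas_inside_brackets("x y [a b c] z w"))  # x y [a,b,c] z w
```

Spaces outside the brackets are left alone because the lookahead runs into "[" or the end of the string before it finds a "]".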

CodePudding user response:

I didn't manage to do it with a single fancy pattern, but how about this little workaround: write a pattern that captures everything between the brackets, then deal with that substring separately. Split it on whitespace, filter out the empty elements (caused by the leading and trailing whitespace), and join the rest back together with commas. Then pass that modified string to re.sub as the replacement for everything between the brackets.

import re

s1 = "hello this is an example [ a    b c ]"
s2 = "hello this is another example [ cat    bird dog elephant   ]"

# everything between "[" and "]", excluding the brackets themselves
pattern = r"(?<=\[)(.*)(?=\])"

print(
    re.sub(
        pattern,
        ','.join(filter(None, re.split(r"\s+", re.search(pattern, s1).group(1)))),
        s1)
)

print(
    re.sub(
        pattern,
        ','.join(filter(None, re.split(r"\s+", re.search(pattern, s2).group(1)))),
        s2)
)

Output:

hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
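
The same idea (capture the bracket content, then split and join it) can also be done in one pass by giving re.sub a function as the replacement. This is only a sketch, assuming non-nested brackets; normalize_brackets and join_items are hypothetical names, not part of the answer above:

```python
import re

def normalize_brackets(text):
    # For every "[...]" group, split its content on whitespace and rejoin
    # the pieces with commas. str.split() with no argument also discards
    # leading and trailing whitespace, so no filtering is needed.
    def join_items(match):
        return "[" + ",".join(match.group(1).split()) + "]"
    return re.sub(r"\[([^\[\]]*)\]", join_items, text)

print(normalize_brackets("hello this is an example [ a    b c ]"))
print(normalize_brackets("hello this is another example [ cat    bird dog elephant   ]"))
```

For the two example strings this prints the same two lines as the output above.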

CodePudding user response:

As a first step, you can extract the text between the square brackets and then work on that text on its own. The code should be more readable this way:

import re

foo = 'hello this is another example [ cat    bird dog elephant   ]'

# get everything between [ and ]
reg_get_between_square_brackets = re.compile(r'\[(.*)\]')
str_to_replace = reg_get_between_square_brackets.findall(foo)[0]

# replace each run of whitespace with a comma
new_string = re.sub(r'\s+', ',', str_to_replace.strip())  # strip() removes leading/trailing whitespace
print(foo.replace(str_to_replace, new_string))

Outputs:

hello this is another example [cat,bird,dog,elephant]
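
One thing to keep in mind with a greedy `.*` between the brackets: it works for the examples in the question, which contain a single bracket pair, but it behaves differently if a string has several. The sketch below uses a made-up input with two groups (an assumption, since the question shows no such case) to compare the greedy and non-greedy forms:

```python
import re

text = "first [ a b ] then [ c d ]"

# Greedy: a single match running from the first "[" to the last "]".
print(re.findall(r"\[(.*)\]", text))   # [' a b ] then [ c d ']

# Non-greedy: one match per bracket pair.
print(re.findall(r"\[(.*?)\]", text))  # [' a b ', ' c d ']
```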

CodePudding user response:

Below is my solution, with some comments added.

For the second part (replacing the spaces between the square brackets with commas), I would rather go for split() and join(); the regex solution is likely slower.

import re

str1 = 'hello this is an example [ a    b c ]'

str2 = 'hello this is another example [ cat    bird dog elephant   ]'

# remove the SPACES near square brackets
str1 =  re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str1)
print(str1)
# replace the SPACES inside the square brackets until no replacement
old_str1 = ''
while old_str1 != str1:
    old_str1 = str1
    str1 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str1, count=0)
print(str1)


str2 =  re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str2)
print(str2)
old_str2 = ''
while old_str2 != str2:
    old_str2 = str2
    str2 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str2, count=0)
print(str2)

Output:

hello this is an example [a    b c]
hello this is an example [a,b,c]
hello this is another example [cat    bird dog elephant]
hello this is another example [cat,bird,dog,elephant]
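
The split() and join() alternative mentioned at the top of this answer could look roughly like the following. It is a sketch only: it assumes the string contains exactly one "[" ... "]" pair, as in the examples, and join_bracket_items is a hypothetical helper name.

```python
def join_bracket_items(text):
    # Assumes exactly one "[" ... "]" pair, as in the examples above.
    head, _, rest = text.partition("[")
    inner, _, tail = rest.partition("]")
    # split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, so the items come out clean.
    return head + "[" + ",".join(inner.split()) + "]" + tail

print(join_bracket_items("hello this is an example [ a    b c ]"))
print(join_bracket_items("hello this is another example [ cat    bird dog elephant   ]"))
```

For the two example strings this gives [a,b,c] and [cat,bird,dog,elephant] without importing re at all.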