I have strings in a similar format:
hello this is an example [ a b c ]
hello this is another example [ cat bird dog elephant ]
which I want to transform into:
hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
But I don't understand how to create a regexp pattern that removes any spaces next to the brackets and replaces any number of spaces between words/characters inside the brackets with a single comma (,).
How would one create such a pattern?
My current attempt is a chain of regexp replacements.
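m = re.sub(r'\[\s+', '[', s)   # drop whitespace after the opening bracket
m = re.sub(r'\s+\]', ']', m)   # drop whitespace before the closing bracket
m = re.sub(r'\s+', ' ', m)     # collapse runs of whitespace to one space
m = re.sub(r'\s(?=[^\[\]]*])', ',', m)  # a space followed by ] with no bracket in between becomes a comma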
Does anyone have suggestions for making it more efficient or cleaner?
CodePudding user response:
I didn't manage to do it with a fancy pattern, but how about this little workaround?
Write a pattern that matches everything between the brackets, then deal with that string separately: split it on whitespace, filter out the empty elements (left over from the leading and trailing whitespace), and join it back together as one comma-separated string.
Then pass that modified string to re.sub as the replacement for everything between the brackets.
import re

s1 = "hello this is an example [ a b c ]"
s2 = "hello this is another example [ cat bird dog elephant ]"
# match everything between the first [ and the last ]
pattern = r"(?<=\[)(.*)(?=\])"
print(
    re.sub(
        pattern,
        # split on whitespace runs, drop empty strings, join with commas
        ','.join(list(filter(None, re.split(r"\s+", re.search(pattern, s1).group(1))))),
        s1)
)
print(
    re.sub(
        pattern,
        ','.join(list(filter(None, re.split(r"\s+", re.search(pattern, s2).group(1))))),
        s2)
)
Output:
hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]
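The same split/filter/join idea can be sketched in a single re.sub call by passing a function as the replacement. This is only an illustration, not part of the answer above: the names normalize_brackets and fix are made up for this sketch, and it assumes brackets are not nested.

```python
import re

def normalize_brackets(text):
    """Rewrite each [...] group as comma-separated items without padding."""
    def fix(match):
        # str.split() with no arguments splits on runs of whitespace
        # and ignores leading/trailing whitespace.
        return "[" + ",".join(match.group(1).split()) + "]"
    # [^\]]* keeps each match inside a single pair of brackets.
    return re.sub(r"\[([^\]]*)\]", fix, text)

print(normalize_brackets("hello this is an example [ a b c ]"))
print(normalize_brackets("x [ 1  2 ] y [ cat   dog ]"))
```

Because the pattern excludes ] from the group, a string with several bracketed groups should have each one handled independently.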
CodePudding user response:
As a first step, you can extract the text between the square brackets. The code ends up more readable:
import re

foo = 'hello this is another example [ cat bird dog elephant ]'
# get everything between [ and ]
reg_get_between_square_brackets = re.compile(r'\[(.*)\]')
str_to_replace = reg_get_between_square_brackets.findall(foo)[0]
# replace runs of whitespace with a comma
new_string = re.sub(r'\s+', ',', str_to_replace.strip())  # strip to remove leading/trailing whitespace
print(foo.replace(str_to_replace, new_string))
Outputs:
hello this is another example [cat,bird,dog,elephant]
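One thing to watch with str.replace is that it replaces every occurrence of the substring, so text outside the brackets that happens to match is changed too. A possible variant, sketched here with a hypothetical helper name (replace_bracket_content), splices the new text in by position using the match's start and end offsets instead:

```python
import re

def replace_bracket_content(text):
    """Comma-separate the text between the first [ and the last ]."""
    match = re.search(r"\[(.*)\]", text)
    if match is None:
        return text  # nothing to do without a bracketed part
    inner = re.sub(r"\s+", ",", match.group(1).strip())
    # Slice by position so only the bracketed span is touched.
    return text[:match.start(1)] + inner + text[match.end(1):]

print(replace_bracket_content("a b [ b ]"))
```

With the input "a b [ b ]", the str.replace version would also rewrite the " b " outside the brackets, while the slicing version leaves it alone.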
CodePudding user response:
Below is my solution, with some comments added.
For the second part (replacing the spaces between the square brackets with commas), I would rather go for split() and join(); the regex solution is surely slower.
import re

str1 = 'hello this is an example [ a b c ]'
str2 = 'hello this is another example [ cat bird dog elephant ]'

# remove the spaces next to the square brackets
str1 = re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str1)
print(str1)
# replace the spaces inside the square brackets, one run per pass,
# until a pass changes nothing
old_str1 = ''
while old_str1 != str1:
    old_str1 = str1
    str1 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str1, count=0)
print(str1)

# same two steps for the second string
str2 = re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str2)
print(str2)
old_str2 = ''
while old_str2 != str2:
    old_str2 = str2
    str2 = re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str2, count=0)
print(str2)
Output:
hello this is an example [a b c]
hello this is an example [a,b,c]
hello this is another example [cat bird dog elephant]
hello this is another example [cat,bird,dog,elephant]
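The split() and join() suggestion above is not spelled out in the answer; a minimal sketch of what it might look like, without any regex, follows. The function name commas_in_brackets is hypothetical, and the sketch assumes the string contains at most one bracketed part.

```python
def commas_in_brackets(text):
    """Comma-separate the words between the first [ and the last ]."""
    start = text.find("[")
    end = text.rfind("]")
    if start == -1 or end < start:
        return text  # no complete [ ... ] pair found
    # split() with no arguments splits on runs of whitespace and
    # ignores leading/trailing whitespace, so no strip() is needed.
    items = text[start + 1:end].split()
    return text[:start + 1] + ",".join(items) + text[end:]

print(commas_in_brackets("hello this is an example [ a b c ]"))
print(commas_in_brackets("hello this is another example [ cat bird dog elephant ]"))
```

This handles both steps (trimming next to the brackets and inserting commas) in one pass, since the split discards the padding.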