I parsed some text from the web that contains several useless strings following a certain pattern, as shown below.
Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});Some Text4
I would like to replace every substring of the form "function(ads {ads.prime("mid_leaderboard_rectangle_%d")});"
with a space. How can I do that with str.replace or a regular expression? The expected output should be something like:
Some Text1 Some Text2 Some Text3 Some Text4
I have tried str.replace("function(ads {ads.prime("mid_leaderboard_rectangle_%d")});", " ")
but it does not work.
CodePudding user response:
I didn't read the question carefully at first and started solving it a bit differently (arguably in a more long-winded way), but maybe it helps anyway.
import re

text = 'Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});Some Text4'
# Leading run of word/whitespace characters that ends in a digit, e.g. "Some Text1"
pattern = re.compile(r"([\w\s]+(?:\d))")
text_list = text.split(';')
result = []
for elem in text_list:
    m = pattern.match(elem)
    result.append(m.group(1))
output = '; '.join(result) # or ''.join(result) for no delim
print(output)
'Some Text1; Some Text2; Some Text3; Some Text4'
As for your attempt with replace: str.replace matches its argument literally, so %d is never treated as a placeholder for digits. I chose re.sub
to solve it.
output = re.sub(r"(?<=Text\d)(.*?)(?=;)", " ", text)
print(output)
'Some Text1 ;Some Text2 ;Some Text3 ;Some Text4'
In case you don't want the delimiter:
output2 = re.sub(r"(?<=Text\d)(.*?;)(?=Some)", " ", text)
print(output2)
'Some Text1 Some Text2 Some Text3 Some Text4'
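If the text around the ad snippets does not always look like "Text" plus a digit, another option is to match the ad call itself. This is only a sketch, assuming every snippet has exactly the shape shown in the question (adCommands.push(...) with a numeric suffix and a trailing semicolon); the names ad_call and cleaned are just illustrative.

```python
import re

text = ('Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});'
        'Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});'
        'Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});'
        'Some Text4')

# Assumed shape of the snippet: every regex metacharacter is escaped,
# and \d+ stands in for the changing number at the end of the id.
ad_call = re.compile(
    r'adCommands\.push\(function\(ads\)\{ads\.prime\("mid_leaderboard_rectangle_\d+"\)\}\);'
)

# Replace each whole ad call (including its trailing semicolon) with one space.
cleaned = ad_call.sub(" ", text)
print(cleaned)  # Some Text1 Some Text2 Some Text3 Some Text4
```

This does not depend on what the surrounding text looks like, but it will silently skip any snippet that deviates from the assumed shape (for example extra whitespace inside the call).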
UPDATE: For the extra question from the comments: We need to split on whitespace, but not at every whitespace. This one splits only at a whitespace character that is preceded by a digit and followed by a digit and a dot.
text = 'dummytext1 1. dummytext2 2. dummytext3 3. dummytext4'
output3 = re.split(r"(?<=\d)\s(?=\d\.)", text)
print(output3)
['dummytext1', '1. dummytext2', '2. dummytext3', '3. dummytext4']
CodePudding user response:
You could match, for example, a part of the text that is specific enough to get the right match, and use 2 capture groups for the replacement.
({ads\.prime\("mid_leaderboard_rectangle_)\d+("\)})
Explanation
({ads\.prime\("mid_leaderboard_rectangle_)
Capture group 1: match the part before the digits, escaping the dot and the opening parenthesis
\d+
Match 1+ digits (which are to be replaced)
("\)})
Capture group 2: match ")}
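To see what each group captures, here is a small sketch that runs the pattern against a single snippet taken from the question. The variable names sample and m are only for illustration.

```python
import re

# Same pattern as above: group 1, then one or more digits, then group 2.
regex = re.compile(r'({ads\.prime\("mid_leaderboard_rectangle_)\d+("\)})')

sample = 'adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});'
m = regex.search(sample)

print(m.group(1))  # {ads.prime("mid_leaderboard_rectangle_
print(m.group(2))  # ")}
print(m.group(0))  # {ads.prime("mid_leaderboard_rectangle_10")}
```

Only the digits sit outside both groups, so a replacement built from \1 and \2 keeps everything else intact.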
Example code
import re
regex = r'({ads\.prime\("mid_leaderboard_rectangle_)\d+("\)})'
s = 'Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});Some Text4'
print(re.sub(regex, r"\1%d\2", s))
Output
Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_%d")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_%d")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_%d")});Some Text4