Replace the string of certain complex pattern with empty string

Time:04-16

I parsed some text from the web that contains many useless strings following a certain pattern, as shown below.

Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});Some Text4

I would like to replace every substring of the form "function(ads){ads.prime("mid_leaderboard_rectangle_%d")});" (where %d stands for a number) with a space. How can I do that with str.replace or a regular expression? The expected output should be something like:

Some Text1  Some Text2  Some Text3  Some Text4

I have tried str.replace("function(ads){ads.prime("mid_leaderboard_rectangle_%d")});", " ") but it does not work.

CodePudding user response:

I didn't read the question carefully and started solving it a bit differently (probably in a somewhat long-winded way), but maybe it helps anyway.

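import re

text = 'Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});Some Text4'
# One or more word/whitespace characters ending in a digit, e.g. "Some Text1"
pattern = re.compile(r"([\w\s]+(?:\d))")

# Every ad snippet ends with ';', so each piece starts with the text to keep
text_list = text.split(';')
result = []
for elem in text_list:
    m = pattern.match(elem)
    if m:  # skip pieces that do not start with the expected text
        result.append(m.group(1))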

output = '; '.join(result) # or ''.join(result) for no delim
print(output)
'Some Text1; Some Text2; Some Text3; Some Text4'

As for your attempt with replace: str.replace only matches literal text, so it cannot handle the changing number. I chose re.sub to solve it.

output = re.sub(r"(?<=Text\d)(.*?)(?=;)", " ", text)
print(output)
'Some Text1 ;Some Text2 ;Some Text3 ;Some Text4'

In case you don't want the delimiter:

output2 = re.sub(r"(?<=Text\d)(.*?;)(?=Some)", " ", text)
print(output2)
'Some Text1 Some Text2 Some Text3 Some Text4'
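
If the surrounding text does not always end in "Text" plus a digit, the lookbehind patterns above will miss matches. A minimal sketch that matches the ad snippet itself instead, assuming every snippet has exactly the form shown in the question and differs only in the trailing number (the names AD_SNIPPET and cleaned are only illustrative):

```python
import re

text = (
    'Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});'
    'Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});'
    'Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});'
    'Some Text4'
)

# Literal ad snippet with every regex metacharacter escaped;
# \d+ stands for the number that changes from snippet to snippet.
AD_SNIPPET = re.compile(
    r'adCommands\.push\(function\(ads\)\{ads\.prime\("mid_leaderboard_rectangle_\d+"\)\}\);'
)

# Replace each snippet with a single space
cleaned = AD_SNIPPET.sub(" ", text)
print(cleaned)
```

Because the pattern describes the unwanted part rather than the text to keep, it does not depend on what "Some Text1" and the other pieces look like.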

UPDATE: For the extra question from the comments: we need to split on whitespace, but not on every whitespace. This pattern splits only where a digit is followed by a whitespace character, which in turn is followed by a digit and a dot.

text = 'dummytext1 1. dummytext2 2. dummytext3 3. dummytext4'
output3 = re.split(r"(?<=\d)\s(?=\d\.)", text)
print(output3)
['dummytext1', '1. dummytext2', '2. dummytext3', '3. dummytext4']
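
For comparison, a short sketch of what a plain whitespace split does with the same sample text, next to the lookaround version (the variable names naive and guarded are only illustrative):

```python
import re

text = 'dummytext1 1. dummytext2 2. dummytext3 3. dummytext4'

# Plain split: breaks at every whitespace, so "1." is separated from its text
naive = text.split()

# Lookaround split: breaks only at a whitespace that has a digit before it
# and a digit plus a dot after it; the lookarounds consume no characters.
guarded = re.split(r"(?<=\d)\s(?=\d\.)", text)

print(len(naive), naive)
print(len(guarded), guarded)
```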

CodePudding user response:

You could, for example, match a piece of the text that is specific enough to get the right match, and use 2 capture groups for the replacement.

({ads\.prime\("mid_leaderboard_rectangle_)\d+("\)})

Explanation

  • ({ads\.prime\("mid_leaderboard_rectangle_) Capture group 1, match the part before the digits, escaping the dot and the opening parenthesis
  • \d+ Match 1 or more digits (which are to be replaced)
  • ("\)}) Capture group 2, match ")}
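
To see what each part captures, the pattern can be tried on a single snippet with re.search. A small sketch, where the shortened sample string is only for illustration:

```python
import re

regex = r'({ads\.prime\("mid_leaderboard_rectangle_)\d+("\)})'

# Illustrative sample: one ad snippet taken from the question's text
sample = 'adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});'

m = re.search(regex, sample)
print(m.group(0))  # the whole match, including the digits
print(m.group(1))  # the part before the digits
print(m.group(2))  # the part after the digits
```

The digits are matched but sit outside both groups, so a replacement built from \1 and \2 keeps everything except the number.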


Example code

import re

regex = r'({ads\.prime\("mid_leaderboard_rectangle_)\d+("\)})'

s = 'Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_9")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_10")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_11")});Some Text4'

print(re.sub(regex, r"\1%d\2", s))

Output

Some Text1adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_%d")});Some Text2adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_%d")});Some Text3adCommands.push(function(ads){ads.prime("mid_leaderboard_rectangle_%d")});Some Text4