Python regex match a pattern for multiple times

Time:06-28

I've got a list of strings.

input=['XX=BB|3|3|1|1|PLP|KLWE|9999|9999', 'XX=BB|3|3|1|1|2|PLP|KPOK|99999|99999', '999|999|999|9999|999', ....]

Strings of this type, such as '999|999|999|9999|999', should remain unchanged.

In the other strings, I need to replace 9999|9999 (or 99999|99999) with 12|21.

I wrote this pattern, (?<=BB\|\d\|\d\|\d\|\d\|\S{3}\|\S{4}\|)9{2,9}\|9{2,9}, to match the runs of 9s. However, there can be 4 to 6 \|\d repetitions in the middle. How can I match this \|\d pattern multiple times?

Desired result:

['XX=BB|3|3|1|1|PLP|KLWE|12|21', 'XX=BB|3|3|1|1|2|PLP|KPOK|12|21', '999|999|999|9999|999'...]

thanks

Answer:

I would just use re.sub here and search for the pattern \b9{2,9}\|9{2,9}\b:

import re

inp = ["XX=BB|3|3|1|1|PLP|KLWE|9999|9999", "XX=BB|3|3|1|1|2|PLP|KPOK|99999|99999"]
output = [re.sub(r'\b9{2,9}\|9{2,9}\b', '12|21', i) for i in inp]
print(output)

# ['XX=BB|3|3|1|1|PLP|KLWE|12|21', 'XX=BB|3|3|1|1|2|PLP|KPOK|12|21']
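A quick check with all three of the question's sample strings (the snippet above only uses the first two) shows where this simpler pattern falls short. The variable names below are illustrative only.

```python
import re

# All three sample strings from the question, including the one that must not change.
samples = [
    "XX=BB|3|3|1|1|PLP|KLWE|9999|9999",
    "XX=BB|3|3|1|1|2|PLP|KPOK|99999|99999",
    "999|999|999|9999|999",
]

# The prefix-free pattern from this answer: two runs of 9s joined by |, on word boundaries.
simple = r'\b9{2,9}\|9{2,9}\b'
results = [re.sub(simple, '12|21', s) for s in samples]

for before, after in zip(samples, results):
    print(f"{before} -> {after}")
```

The last line prints 999|999|999|9999|999 -> 12|21|12|21|999. Because nothing anchors the pattern to the BB prefix, every pair of 9-runs on word boundaries is rewritten, including the ones in the string that was supposed to stay unchanged.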

Answer:

You can use

re.sub(r'(BB(?:\|\d){4,6}\|[^\s|]{3}\|[^\s|]{4}\|)9{2,9}\|9{2,9}(?!\d)', r'\g<1>12|21', text)

See the regex demo.
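The BB prefix is captured in a group here rather than kept in a lookbehind as in the question, and that choice matters in Python: the re module only accepts fixed-width lookbehind, so rewriting the question's four \|\d pairs as (?:\|\d){4,6} inside (?<=...) is rejected when the pattern is compiled. A minimal check, where compiles is a hypothetical helper written just for this illustration:

```python
import re

# The question's lookbehind: exactly four "\|\d" pairs, so it has a fixed width.
original = r'(?<=BB\|\d\|\d\|\d\|\d\|\S{3}\|\S{4}\|)9{2,9}\|9{2,9}'
# The same lookbehind with the pairs written as (?:\|\d){4,6}: variable width.
variable = r'(?<=BB(?:\|\d){4,6}\|\S{3}\|\S{4}\|)9{2,9}\|9{2,9}'
# This answer's approach: the prefix becomes capturing group 1 instead.
capturing = r'(BB(?:\|\d){4,6}\|[^\s|]{3}\|[^\s|]{4}\|)9{2,9}\|9{2,9}(?!\d)'


def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"rejected: {exc}")
        return False
    return True


original_ok = compiles(original)
variable_ok = compiles(variable)
capturing_ok = compiles(capturing)
print(original_ok, variable_ok, capturing_ok)  # True False True
```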

Details:

  • (BB(?:\|\d){4,6}\|[^\s|]{3}\|[^\s|]{4}\|) - Capturing group 1:
    • BB - a BB string
    • (?:\|\d){4,6} - four, five or six repetitions of a | followed by a single digit
    • \| - a | char
    • [^\s|]{3} - three chars other than whitespace and a pipe
    • \|[^\s|]{4}\| - a |, four chars other than whitespace and a pipe, and then a pipe char
  • 9{2,9}\|9{2,9} - two to nine 9 chars, a | char, and again two to nine 9 chars
  • (?!\d) - not followed by another digit. You may remove this if you do not need to check for a digit boundary here, or use (?![^|]) instead if you need to check that a | char or the end of the string comes immediately on the right.
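The three ways of ending the pattern described in the last bullet can be compared side by side. The second and third test strings below are hypothetical, made up to hit the edge cases: a second run of ten 9s (one more than 9{2,9} can consume) and a run followed by a letter.

```python
import re

# Group 1 and the 9s part from the answer, without any trailing check.
BASE = r'(BB(?:\|\d){4,6}\|[^\s|]{3}\|[^\s|]{4}\|)9{2,9}\|9{2,9}'
REPL = r'\g<1>12|21'

variants = {
    "none": BASE,                       # no check after the second run
    "no_digit": BASE + r'(?!\d)',       # next char must not be a digit
    "pipe_or_end": BASE + r'(?![^|])',  # next char must be | or end of string
}

# Hypothetical inputs chosen to expose the differences.
tests = [
    "XX=BB|3|3|1|1|PLP|KLWE|9999|9999",         # ordinary case
    "XX=BB|3|3|1|1|PLP|KLWE|9999|" + "9" * 10,  # ten 9s: one too many for 9{2,9}
    "XX=BB|3|3|1|1|PLP|KLWE|9999|9999X",        # letter right after the 9s
]

results = {name: [re.sub(p, REPL, t) for t in tests] for name, p in variants.items()}

for name, outs in results.items():
    print(name, outs)
```

With no check, the ten-9s string becomes ...|12|219 (a stray 9 is left behind). (?!\d) leaves that string alone but still rewrites the ...9999X one to ...12|21X, while (?![^|]) leaves both edge cases unchanged.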

The \g<1>12|21 replacement includes an unambiguous backreference to Group 1 (\g<1>) and a 12|21 substring appended to it.
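To see why the explicit \g<1> form matters here: the shorter-looking replacement r'\112|21' does not mean "group 1, then 12". In a Python re replacement string, a backslash followed by three octal digits is read as an octal character escape, so \112 becomes the single character J (octal 112 = 74) and the captured prefix is lost.

```python
import re

pattern = r'(BB(?:\|\d){4,6}\|[^\s|]{3}\|[^\s|]{4}\|)9{2,9}\|9{2,9}(?!\d)'
text = 'XX=BB|3|3|1|1|PLP|KLWE|9999|9999'

# \g<1> ends the group reference explicitly, so "12|21" is literal text.
explicit = re.sub(pattern, r'\g<1>12|21', text)
# \112 is parsed as the octal escape for "J", not as group 1 followed by "12".
ambiguous = re.sub(pattern, r'\112|21', text)

print(explicit)   # XX=BB|3|3|1|1|PLP|KLWE|12|21
print(ambiguous)  # XX=J|21
```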

See the Python demo:

import re
texts = ['XX=BB|3|3|1|1|PLP|KLWE|9999|9999', 'XX=BB|3|3|1|1|2|PLP|KPOK|99999|99999', '999|999|999|9999|999']
# Group 1 captures the BB prefix; only the two runs of 9s after it are replaced.
pattern = r'(BB(?:\|\d){4,6}\|[^\s|]{3}\|[^\s|]{4}\|)9{2,9}\|9{2,9}(?!\d)'
repl = r'\g<1>12|21'  # re-insert group 1, then the literal 12|21
for text in texts:
    print(re.sub(pattern, repl, text))

Output:

XX=BB|3|3|1|1|PLP|KLWE|12|21
XX=BB|3|3|1|1|2|PLP|KPOK|12|21
999|999|999|9999|999
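To check the {4,6} bound on the middle group directly, the same pattern can be run on strings with three to seven single-digit fields after BB. The strings are hypothetical, produced by an illustrative helper named build_sample; only the four-, five- and six-field versions should be rewritten.

```python
import re

pattern = r'(BB(?:\|\d){4,6}\|[^\s|]{3}\|[^\s|]{4}\|)9{2,9}\|9{2,9}(?!\d)'
repl = r'\g<1>12|21'


def build_sample(n_digits):
    """Hypothetical helper: a test string with n single-digit fields after BB."""
    digits = "|".join("1" * n_digits)
    return f"XX=BB|{digits}|PLP|KLWE|9999|9999"


changed = {}
for n in range(3, 8):
    text = build_sample(n)
    result = re.sub(pattern, repl, text)
    changed[n] = result != text  # True if the 9s were replaced
    print(n, result)
```

With three fields the group cannot reach its minimum of four repetitions, and with seven the extra |1 lands where [^\s|]{3} expects a three-character field, so both strings come back unchanged.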