getting same regex groups inside a block of text

Time:11-23

I'm trying to write a pattern to get each CNPJ group inside this block of text. The condition is that the match must start with executados: and end with a CNPJ group. However, my pattern always gets only the last group, and I don't know how to make it work.

The answer regex101

pattern: (?:executados\:)[\p{L}\s\D\d]+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+)

string to test:

Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a

I would like to get the values {'cnpj': ['88.888.888/8888-88', '99.999.999/9999-99']}, but this way I only get the last one.

CodePudding user response:

You can use the PyPI regex module with a regex like

(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)

See the regex demo.

Here is the Python demo:

import regex
text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 99.999.999/9999-99,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""
# The variable-length lookbehind requires "executados:" somewhere before each CNPJ match
print(regex.findall(r'(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)', text))

yielding

['99.999.999/9999-99', '99.999.999/9999-99']

The regex matches

  • (?s) - regex.DOTALL, enables . to match line break chars
  • (?<=executados:.*?) - a positive lookbehind: somewhere before the current location there must be executados:, followed by any zero or more chars (as few as possible)
  • CNPJ - a fixed string
  • \W+ - one or more non-word chars
  • (\d+\.\d+\.\d+/\d+-\d+) - Group 1, the value returned by regex.findall: one or more digits and a ., twice, then one or more digits, /, one or more digits, - and one or more digits.
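If installing the third-party regex module is not an option, a similar result can probably be had with the standard re module. re does not support variable-length lookbehinds, so the sketch below swaps in a different technique: it locates executados: first and only searches for CNPJ values after that position. The helper name cnpjs_after_marker and its marker parameter are made up for illustration, and the sample text is the one from the question.

```python
import re

text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""

# Same CNPJ sub-pattern as above, without the lookbehind
CNPJ_RE = re.compile(r'CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)')

def cnpjs_after_marker(text, marker="executados:"):
    """Return every CNPJ number that appears after the first `marker`.

    Hypothetical helper: name and signature are assumptions for this sketch.
    """
    start = text.find(marker)
    if start == -1:
        # No marker, so nothing qualifies
        return []
    # Pattern.findall accepts a start position, so only the text
    # after the marker is scanned
    return CNPJ_RE.findall(text, start + len(marker))

print({'cnpj': cnpjs_after_marker(text)})
```

With the sample above this should give the dictionary shape asked for in the question. Note that it collects every CNPJ after the marker up to the end of the text, the same as the lookbehind version, so if the block needs an end boundary the text would have to be cut there first.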