I'm trying to write a pattern that gets each CNPJ group inside this block of text. The condition is that the match must start with executados:
and end with a CNPJ group. However, my pattern always gets only the last group, and I don't know how to make it work.
pattern: (?:executados\:)[\p{L}\s\D\d]+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+)
string to test:
Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a
I would like to get the values {'cnpj': ['88.888.888/8888-88', '99.999.999/9999-99']}, but this way I get only the last one.
CodePudding user response:
You can use the PyPI regex module, which supports variable-width lookbehind, with a regex like
(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)
See the regex demo.
Here is the Python demo:
import regex  # third-party module from PyPI: pip install regex
text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 99.999.999/9999-99,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""
print(regex.findall(r'(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)', text))
yielding
['99.999.999/9999-99', '99.999.999/9999-99']
The regex matches
(?s) - regex.DOTALL, enables . to match line break chars
(?<=executados:.*?) - right before the current location, there must be executados: and then any zero or more chars
CNPJ - a fixed string
\W+ - one or more non-word chars
(\d+\.\d+\.\d+/\d+-\d+) - Group 1 (the return value of regex.findall): one or more digits and a . twice, then one or more digits, /, one or more digits, - and one or more digits.
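If installing the third-party regex module is not an option, the same result can probably be reached with the standard-library re module by splitting the work into two steps instead of using a lookbehind: first keep only the text after executados:, then collect every CNPJ in it. This is a different technique from the one above, shown only as a minimal sketch; the names CNPJ_RE and extract_cnpjs are hypothetical, and the sample text is the one from the question.

```python
import re

text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""

# Same CNPJ sub-pattern as in the answer, without the lookbehind.
CNPJ_RE = re.compile(r'CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)')


def extract_cnpjs(s):
    """Return {'cnpj': [...]} with every CNPJ found after 'executados:'."""
    # Step 1: keep only the text that follows the first 'executados:' marker.
    _, marker, tail = s.partition('executados:')
    if not marker:
        # Marker not present: nothing qualifies.
        return {'cnpj': []}
    # Step 2: collect every CNPJ number in the remaining text.
    return {'cnpj': CNPJ_RE.findall(tail)}


print(extract_cnpjs(text))
```

Like the lookbehind version, this sketch accepts any CNPJ that appears anywhere after the marker, so it assumes the block of interest runs to the end of the input.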