I'm trying to get these values, 10.547.889/0001-85 and 00.219.460/0001-05, as separate groups. The condition is that the pattern needs to start at executada(s):, so it can't be something like r' - CNPJ:? (?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+)'. The idea is to start at executada(s) and capture these groups.
Currently, my pattern only gets one of them, and I don't know how to get all of them.
I'm using Python 3.8.5 and the regex library (not re).
import regex

text = """
Solicite-se ao BANCO CENTRAL, via protocolo digital - SISBACEN ,
o BLOQUEIO de créditos existentes até o limite de R$ 30.257,45 (trinta mil, duzentos e
cinquenta e sete reais e quarenta e cinco centavos) da(s) executada(s): J.HENRIQUE
GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de
Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34
e Heliane Leonel Raymundo Galvani - CPF: 813.460.347-53, porventura
existentes junto a instituições financeiras, incluindo cartões de crédito, agenciadores
de pagamento, administradores de consórcio."""
pattern = r'executad\w(?:\(s\))?\W+(?:[\p{L}\s\-\.]+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+),)+'
for item in regex.finditer(pattern, text, flags=regex.I|regex.S):
    print(item.groupdict())
Output:
{'cnpj': '00.219.460/0001-05'}
I was expecting:
{'cnpj': '00.219.460/0001-05'}
{'cnpj': '10.547.889/0001-85'}
Can someone help me with this?
Answer:
Using the regex module, you can make use of the \G anchor:
(?:executad\w(?:\(s\))?\W+|\G(?!^)),?[\p{L}\s.-]+CNPJ\W+\K(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)
In parts, the pattern matches:
- (?: opens a non-capture group with two alternatives:
  - executad\w(?:\(s\))?\W+ matches executad plus one word char (you could use a literal a instead if that is the only possibility), then an optional (s), then 1+ non-word chars
  - \G(?!^) asserts the current position at the end of the previous match, but not at the start of the string
- ) closes the non-capture group
- ,?[\p{L}\s.-]+ matches an optional comma, then 1+ letters, whitespace chars, dots or hyphens
- CNPJ\W+ matches CNPJ followed by 1+ non-word chars
- \K clears the match buffer, so everything matched so far is dropped from the reported match
- (?P<cnpj>\d+\.\d+\.\d+/\d+-\d+) is the named group cnpj, capturing the desired format
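To see how the \G branch enforces the "must start at executada(s)" condition, here is a small check on made-up sample text (the company names, the numbers and the helper name extract_cnpjs are hypothetical). A CNPJ before executada(s) is ignored, and the chain of matches stops at the first entry that is not labeled CNPJ, so a CNPJ that comes after a CPF entry is not picked up either.

```python
import regex

# The answer's pattern: start at "executad..." or continue exactly where
# the previous match ended (\G), never at the start of the string.
PATTERN = regex.compile(
    r"(?:executad\w(?:\(s\))?\W+|\G(?!^)),?[\p{L}\s.-]+CNPJ\W+\K"
    r"(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)"
)


def extract_cnpjs(text):
    """Hypothetical helper: return the CNPJs chained right after executada(s)."""
    return [m.group("cnpj") for m in PATTERN.finditer(text)]


# Made-up sample: 11... precedes the keyword, and the CPF entry breaks
# the chain before 44... is reached.
sample = ("Referência CNPJ 11.111.111/0001-11. Das executada(s): "
          "Empresa A - CNPJ 22.222.222/0001-22, Empresa B - CNPJ: 33.333.333/0001-33, "
          "Fulano - CPF: 123.456.789-00, Empresa C - CNPJ 44.444.444/0001-44")

print(extract_cnpjs(sample))  # ['22.222.222/0001-22', '33.333.333/0001-33']
```

If the CNPJs you need can appear after a CPF entry, this chaining behavior is exactly what excludes them, so the pattern would need to allow CPF entries in between.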
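For the example data, you can omit the regex.S flag. It only changes what an unescaped . matches, and this pattern has none (the . inside [\p{L}\s.-] is a literal dot). The line breaks are matched by \W+ and \s instead.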
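A quick way to check that claim, sketched on a made-up snippet with line breaks between words (the text is illustrative): run the pattern with and without regex.S and compare the results.

```python
import regex

pattern = (r"(?:executad\w(?:\(s\))?\W+|\G(?!^)),?[\p{L}\s.-]+CNPJ\W+\K"
           r"(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)")

# Made-up snippet: every line break sits where \W+ or \s can consume it.
text = ("executada(s):\nEmpresa A - CNPJ\n22.222.222/0001-22,\n"
        "Empresa B - CNPJ: 33.333.333/0001-33")


def cnpjs(flags=0):
    # Collect the named group from every match, with the given flags.
    return [m.group("cnpj") for m in regex.finditer(pattern, text, flags=flags)]


print(cnpjs() == cnpjs(regex.S), cnpjs())
# True ['22.222.222/0001-22', '33.333.333/0001-33']
```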
import regex
pattern = r"(?:executad\w(?:\(s\))?\W+|\G(?!^)),?[\p{L}\s.-]+CNPJ\W+\K(?P<cnpj>\d+\.\d+\.\d+/\d+-\d+)"
text = ("Solicite-se ao BANCO CENTRAL, via protocolo digital - SISBACEN ,\n"
"o BLOQUEIO de créditos existentes até o limite de R$ 30.257,45 (trinta mil, duzentos e\n"
"cinquenta e sete reais e quarenta e cinco centavos) da(s) executada(s): J.HENRIQUE\n"
"GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de\n"
"Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34\n"
"e Heliane Leonel Raymundo Galvani - CPF: 813.460.347-53, porventura\n"
"existentes junto a instituições financeiras, incluindo cartões de crédito, agenciadores\n"
"de pagamento, administradores de consórcio.")
for item in regex.finditer(pattern, text):
    print(item.groupdict())
Output
{'cnpj': '10.547.889/0001-85'}
{'cnpj': '00.219.460/0001-05'}
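Alternatively, if you want to keep the pattern from the question: it already matches both CNPJs in a single match, but a capturing group inside a repeated group only reports its last repetition through groupdict(). The regex module (unlike re) also stores every repetition, and Match.captures() returns them. A minimal sketch, assuming the + quantifiers that appear to have been lost from the question's pattern, run on a shortened copy of the sample text:

```python
import regex

# The question's pattern, assuming the stripped "+" quantifiers are restored.
pattern = (r'executad\w(?:\(s\))?\W+'
           r'(?:[\p{L}\s\-\.]+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+),)+')

# Shortened copy of the sample text, keeping its line breaks.
text = ("da(s) executada(s): J.HENRIQUE\n"
        "GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de\n"
        "Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34")

match = regex.search(pattern, text, flags=regex.I)
# groupdict() only reports the last repetition of the group...
print(match.groupdict())       # {'cnpj': '00.219.460/0001-05'}
# ...while captures() returns every repetition, in order.
print(match.captures('cnpj'))  # ['10.547.889/0001-85', '00.219.460/0001-05']
```

Note that this version requires a comma after each CNPJ, so a CNPJ at the very end of the list would not be captured.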
Answer:
Check if this works for you:
import regex

text = """
Solicite-se ao BANCO CENTRAL, via protocolo digital - SISBACEN ,
o BLOQUEIO de créditos existentes até o limite de R$ 30.257,45 (trinta mil, duzentos e
cinquenta e sete reais e quarenta e cinco centavos) da(s) executada(s): J.HENRIQUE
GALVANI COMERCIO DE ROUPAS - ME - CNPJ 10.547.889/0001-85, Riane Confecções de
Roupas Ltda - ME - CNPJ: 00.219.460/0001-05, Jose Henrique Galvani - CPF: 234.846.406-34
e Heliane Leonel Raymundo Galvani - CPF: 813.460.347-53, porventura
existentes junto a instituições financeiras, incluindo cartões de crédito, agenciadores
de pagamento, administradores de consórcio."""
pattern = r'[0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\/?[0-9]{4}\-?[0-9]{2}'
# cut text to start right after executada(s)
text = text.split("executada(s)")[1]
cnpjs = [{"cnpj": cnpj} for cnpj in regex.findall(pattern, text)]
print(cnpjs)
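One caveat with the split approach: indexing [1] on the result of split raises an IndexError when executada(s) is absent. Also, since the pattern does not require the CNPJ label, any CNPJ-shaped number later in the text is collected too. A minimal sketch of a variant that uses str.partition instead, assuming an empty list is the desired result when the keyword is missing (the function name cnpjs_after is hypothetical):

```python
import regex

CNPJ = regex.compile(r'[0-9]{2}\.?[0-9]{3}\.?[0-9]{3}\/?[0-9]{4}\-?[0-9]{2}')


def cnpjs_after(text, keyword="executada(s)"):
    """Hypothetical helper: CNPJ-shaped numbers after the first keyword, or [] if absent."""
    # partition() returns ('', '', text)-style parts, with an empty separator when not found.
    _, found, rest = text.partition(keyword)
    if not found:
        return []
    return [{"cnpj": cnpj} for cnpj in CNPJ.findall(rest)]


print(cnpjs_after("sem a palavra-chave: 10.547.889/0001-85"))  # []
print(cnpjs_after("da(s) executada(s): X - CNPJ 10.547.889/0001-85"))
# [{'cnpj': '10.547.889/0001-85'}]
```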