I'm trying to write a regex to match the legal entity types used in México.
It should match:
- S.A.
- SA
- S DE RL DE CV
- S. DE R.L. DE C.V.
- S.A. DE C.V.
- S.A.P.I DE C.V.
- SAPI DE CV
- SA DE CV
- S.A.B DE CV
- SA CV
I'm kinda stuck on the names that have "de R..." and "de C..." in the middle: my regex matches the "de R" or "de C" part, but it doesn't match "SA CV", "S.A.", or "SA".
My regex:
( s[^a-zA-Z]*((a[^a-zA-Z]*)|(a[^a-zA-Z]*p[^a-zA-Z]*i[^a-zA-Z]*)|(p[^a-zA-Z]*r[^a-zA-Z]*)|(a( |\.)*b( |\.)*)|(c( |\.)*))*){0,}(de ((r[^a-zA-Z]*(l[^a-zA-Z]*)*)|(c[^a-zA-Z]*(v[^a-zA-Z]*)*))){1,}
Is it possible to do this with a regex, and am I doing this the right way?
CodePudding user response:
You could test it on regex101.
My attempt:
(^S\.?\s?(A\.?\s?)?B?(P\.?I\.?)?(\sDE)?(\sR\.?L\.?)?(\sDE)?(\sC\.?V\.?)?)
https://regex101.com/r/wkZVWw/1
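As a quick offline check, here is a minimal Python sketch that runs the pattern above against the ten examples from the question. The names `PATTERN`, `SAMPLES` and `full_matches` are made up for illustration, and the `"SOLUTIONS"` input is a hypothetical non-suffix string. Since the pattern has no end anchor, the sketch uses `re.fullmatch` so that the whole string has to be consumed.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(
    r"(^S\.?\s?(A\.?\s?)?B?(P\.?I\.?)?(\sDE)?(\sR\.?L\.?)?(\sDE)?(\sC\.?V\.?)?)"
)

# The ten examples listed in the question.
SAMPLES = [
    "S.A.", "SA", "S DE RL DE CV", "S. DE R.L. DE C.V.", "S.A. DE C.V.",
    "S.A.P.I DE C.V.", "SAPI DE CV", "SA DE CV", "S.A.B DE CV", "SA CV",
]


def full_matches(text):
    """Return True only if the pattern consumes the whole string."""
    return PATTERN.fullmatch(text) is not None


results = {sample: full_matches(sample) for sample in SAMPLES}
for sample, ok in results.items():
    print(f"{sample!r}: {'match' if ok else 'no match'}")

# Every part after the leading S is optional and there is no end anchor,
# so re.match succeeds on any string that merely starts with "S".
prefix_only = PATTERN.match("SOLUTIONS")  # hypothetical non-suffix input
print(prefix_only.group(0))
```

If the pattern is used with `re.match` or `re.search` instead, adding `$` at the end is probably what you want, otherwise unrelated words starting with S are accepted.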
CodePudding user response:
You could write a pattern with alternations allowing all the variations.
^S(?:\.?A\.?P\.?I|\.?(?:A\.?)?|\.A\.B)?(?:(?: DE R\.?L\.?)?(?: DE)? C\.?V\.?)?$
In parts, the pattern matches:
- ^
  Start of string
- S
  Match literally (it is in all the examples)
- (?:
  Non-capturing group
  - \.?A\.?P\.?I
    Match A and P and I with optional dots
  - |
    Or
  - \.?(?:A\.?)?
    Match an optional dot, then an optional A with an optional dot
  - |
    Or
  - \.A\.B
    Match .A.B
- )?
  Close the non-capturing group and make it optional
- (?:
  Non-capturing group
  - (?: DE R\.?L\.?)?
    Optionally match a space and DE RL, with optional dots for R and L
  - (?: DE)?
    Optionally match a space and DE
  - C\.?V\.? (preceded by a space in the pattern)
    Match a space and CV with optional dots
- )?
  Close the non-capturing group and make it optional
- $
  End of string
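The breakdown above can be exercised with a short Python sketch. It assumes the input may arrive in lower case (the regex in the question was written in lower case), so it adds `re.IGNORECASE`, which is not part of the original pattern. The helper name `is_entity_suffix` and the rejected inputs are hypothetical, chosen only to illustrate the anchoring.

```python
import re

# Same pattern as in the answer, split over two lines for readability.
# re.IGNORECASE is an assumption added here, not part of the original.
ENTITY_RE = re.compile(
    r"^S(?:\.?A\.?P\.?I|\.?(?:A\.?)?|\.A\.B)?"
    r"(?:(?: DE R\.?L\.?)?(?: DE)? C\.?V\.?)?$",
    re.IGNORECASE,
)


def is_entity_suffix(text):
    """Hypothetical helper: True if text is one of the accepted variants."""
    return ENTITY_RE.match(text.strip()) is not None


# The ten examples from the question, plus two lower-case variants.
accepted = [
    "S.A.", "SA", "S DE RL DE CV", "S. DE R.L. DE C.V.", "S.A. DE C.V.",
    "S.A.P.I DE C.V.", "SAPI DE CV", "SA DE CV", "S.A.B DE CV", "SA CV",
    "sa de cv", "s. de r.l. de c.v.",
]
# Hypothetical inputs that the anchors should reject.
rejected = ["SRL", "S.A. DE", "LLC", "SA DE CV EXTRA"]

accepted_results = [is_entity_suffix(s) for s in accepted]
rejected_results = [is_entity_suffix(s) for s in rejected]

for s, ok in zip(accepted + rejected, accepted_results + rejected_results):
    print(f"{s!r}: {'match' if ok else 'no match'}")
```

Note that a bare `S` also matches, because both groups after the leading S are optional. If that is not wanted, a length check or a lookahead would have to be added on top of this pattern.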