In our Python system, I'm trying to isolate the second part of a size so that I can save the values separately.
Since I get the data in many different formats, I have to take a lot of scenarios into consideration. Our system also requires the whole value to be in group 1 to be identified correctly, which increases the complexity.
This is what I have so far:
(?<=[\/\-])\s*([A-Za-z]+|\w+)+?(?!\d*\s*\)|\d*\)|\w*\))(?!\s*[\/\-]+)
Examples
working
These examples work:
110/116
S/M
S / M
S/M(32-34)
110/116(10-12y)
110/116(S/M)
not working
However, my regex only works correctly on the examples above.
The following 7 cause issues:
S/M / L /XL
S / M / L / XL
S/M / L/XL
S/M/L/XL
S/M/L/XL(30-32)
S/M / L/XL(30-32)
S/M / L / XL(30-32)
How can I capture those cases as shown in the table below?
Case | Input | Expected capture in group 1 |
---|---|---|
1 | S/M / L /XL | "L /XL" |
2 | S / M / L / XL | "L / XL" |
3 | S/M / L/XL | "L/XL" |
4 | S/M/L/XL | "L/XL" |
5 | S/M/L/XL(30-32) | "L/XL" |
6 | S/M / L/XL(30-32) | "L/XL" |
7 | S/M / L / XL(30-32) | "L / XL" |
Issue
How can I capture a "/" in the middle, including the whole part after it (like /XL), but without any following parentheses (i.e. not the (30-32))?
Example: for S/M / L / XL(30-32), I want to capture L / XL only.
CodePudding user response:
You can use
(?<=[/-])\s*([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+)\b(?!\s*[/)-])
See the regex demo. Details:
`(?<=[/-])` - a position immediately preceded by `/` or `-`
`\s*` - zero or more whitespaces
`([A-Z]+(?:\s*/\s*[A-Z]+)?|\d+)` - Group 1: one or more uppercase letters, then an optional sequence of a `/` char enclosed with zero or more whitespaces followed by one or more uppercase letters; or one or more digits
`\b` - a word boundary
`(?!\s*[/)-])` - immediately to the right of the current location, there can't be zero or more whitespaces followed by `/`, `)` or `-`.
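To check the pattern against the inputs from the question, here is a minimal Python sketch. It assumes the `+` quantifiers described in the details above, writes the pattern in `re.VERBOSE` mode so each piece can carry a comment, and uses `re.search` to take the first match. The names `SIZE_PART` and `second_size_part` are made up for this illustration. The expected values for cases 1 to 7 come from the table in the question; the expected values for the six already-working inputs ("116" and "M") are an assumption based on "the second part of a size".

```python
import re
from typing import Optional

# Hypothetical name; same pattern as above, spread out in verbose mode.
SIZE_PART = re.compile(
    r"""
    (?<=[/-])                 # position right after "/" or "-"
    \s*                       # optional whitespace
    (                         # group 1:
        [A-Z]+                #   one or more uppercase letters
        (?:\s*/\s*[A-Z]+)?    #   optionally "/" and more uppercase letters
      |
        \d+                   #   or one or more digits
    )
    \b                        # word boundary
    (?!\s*[/)-])              # no optional whitespace plus "/", ")" or "-" next
    """,
    re.VERBOSE,
)


def second_size_part(size: str) -> Optional[str]:
    """Return group 1 of the first match, or None if nothing matches."""
    match = SIZE_PART.search(size)
    return match.group(1) if match else None


# Inputs from the question mapped to the capture expected in group 1.
CASES = {
    # Already-working inputs (expected values are an assumption).
    "110/116": "116",
    "S/M": "M",
    "S / M": "M",
    "S/M(32-34)": "M",
    "110/116(10-12y)": "116",
    "110/116(S/M)": "116",
    # Cases 1 to 7 from the table.
    "S/M / L /XL": "L /XL",
    "S / M / L / XL": "L / XL",
    "S/M / L/XL": "L/XL",
    "S/M/L/XL": "L/XL",
    "S/M/L/XL(30-32)": "L/XL",
    "S/M / L/XL(30-32)": "L/XL",
    "S/M / L / XL(30-32)": "L / XL",
}

if __name__ == "__main__":
    for text, expected in CASES.items():
        got = second_size_part(text)
        status = "ok" if got == expected else "MISMATCH"
        print(f"{text!r:25} -> {got!r:10} ({status})")
```

Note that the pattern only accepts uppercase letters or pure digit runs, so inputs such as lowercase sizes would need `[A-Za-z]` or the `re.IGNORECASE` flag.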