I have data in string format where somewhere within the string, a size is given (s, m or l). I want to extract the size, but the formatting is a bit all over the place and you can have (s), /s, S , and all kinds of variations for the small size. I figured as long as the letter denoting size does not have a letter to the left or right, it should be the size I'm looking for, so for example, "es", "set" etc. would be substrings where the size "s" is not to be found, while " s, ", "/s " are substrings where the size (small in this case) can be found.
I have no idea how to do that with a regex; I googled but did not find any close matches to what I'm looking for. I'm also new-ish to regular expressions.
Examples:
"es", "setr", "qwfq qf", "lllll" Output: None
"e s", "(s)", "s, " Output: "s"
" m ", "qdqd m dqq ", "sssss m", "m lllll" Output: "m"
"l", "qddwfq l " Output: "l"
CodePudding user response:
You can use the following regex:
\b[smlSML]\b
\b = word boundary
[smlSML] = match any one of the following characters: s, m, l, S, M, L
\b = word boundary
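One caveat: \b treats digits and underscores as word characters too, so a size glued to one of them (as in "size_s" or "2m") is not matched. If the rule really is "no letter on either side", a lookaround-based pattern is one possible alternative. The sketch below assumes that "letter" means ASCII a-z/A-Z only; the names STRICT_SIZE and WORD_BOUNDARY_SIZE are made up for this illustration.

```python
import re

# Assumption: only ASCII letters count as "letters"; digits, "_" and
# punctuation are all treated as separators around the size.
STRICT_SIZE = re.compile(r'(?<![A-Za-z])[smlSML](?![A-Za-z])')

# The word-boundary pattern from the answer, for comparison.
WORD_BOUNDARY_SIZE = re.compile(r'\b[smlSML]\b')

for text in ["size_s", "2m", "(s)", "set"]:
    strict = STRICT_SIZE.search(text)
    loose = WORD_BOUNDARY_SIZE.search(text)
    print(
        f"{text!r}: lookaround = {strict.group(0) if strict else None}, "
        f"word boundary = {loose.group(0) if loose else None}"
    )
```

The two patterns agree on all the examples in the question; they only differ when a digit or underscore sits directly next to the size letter.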
Code:
import re

examples = ["es", "setr", "qwfq qf", "lllll", "e s", "(s)", "s, ", " m ", "qdqd m dqq ", "sssss m", "m lllll", "l", "qddwfq l "]

# Compile once and reuse the pattern for every input string.
p = re.compile(r'\b[smlSML]\b')

for ex in examples:
    # search() returns the first match anywhere in the string, or None.
    result = p.search(ex)
    if result is not None:
        result = result.group(0)
    print(f"Input = {ex:<11}- Output = {result}")
Output:
Input = es         - Output = None
Input = setr       - Output = None
Input = qwfq qf    - Output = None
Input = lllll      - Output = None
Input = e s        - Output = s
Input = (s)        - Output = s
Input = s,         - Output = s
Input =  m         - Output = m
Input = qdqd m dqq - Output = m
Input = sssss m    - Output = m
Input = m lllll    - Output = m
Input = l          - Output = l
Input = qddwfq l   - Output = l
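Since the question mentions that the size can also appear in upper case (e.g. "S"), it may be convenient to wrap the search in a small helper that always returns the size in lower case. This is only a sketch: the function name extract_size and the choice to normalize to lower case are assumptions, not something the question specifies.

```python
import re
from typing import Optional

SIZE_PATTERN = re.compile(r'\b[smlSML]\b')


def extract_size(text: str) -> Optional[str]:
    """Return the first standalone size letter as 's', 'm' or 'l', or None."""
    match = SIZE_PATTERN.search(text)
    if match is None:
        return None
    # Assumption: callers want one canonical spelling, so "S" becomes "s".
    return match.group(0).lower()


print(extract_size("Shirt (S)"))  # s
print(extract_size("setr"))       # None
```

If a string contains more than one standalone size letter, this returns only the first one; SIZE_PATTERN.findall(text) would list all of them.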