I have some sample text as below:
MTG-2022039036 MTG
MTG-LR 3136 / 130 MTG
MTG-LR 201260 / 9046 ASSIGN
MTG-2021063349 MTG
My desired Results:
2022039036
3136 / 130
201260 / 9046
2021063349
My regex patterns work fine individually, for example:
match1 = re.search(r'(\d+ \/ ?\d+)', ref)
num1 = match1.group(1) if match1 else None
# correctly returns 3136 / 130
match2 = re.search(r'(?:-?)(\d+)', ref)
num2 = match2.group(1) if match2 else None
# correctly returns 2021063349
But I want to combine them into a single pattern, as below, that matches either one pattern or the other, since only one case occurs in each string:
match = re.search(r'(?:-?)(\d+)|(\d+ \/ ?\d+)', ref)
num = match.group(1) if match else None
# This only returns 3136
I feel like I'm doing a very simple thing, but I have no idea why this doesn't work. I have used '|' to match either-or conditions in pandas str.extract() and had no problems there. Please advise.
CodePudding user response:
With your shown samples, please try the following regex:
^MTG-[^0-9]*(\d+(?:\s+/\s+\d+)?)
With Python 3, please try the following code, which uses the findall function of the re module with the re.M flag to enable multiline matching.
import re
var="""MTG-2022039036 MTG
MTG-LR 3136 / 130 MTG
MTG-LR 201260 / 9046 ASSIGN
MTG-2021063349 MTG"""
re.findall(r'^MTG-[^0-9]*(\d+(?:\s+/\s+\d+)?)', var, re.M)
['2022039036', '3136 / 130', '201260 / 9046', '2021063349']
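As a rough sketch of why the combined pattern in the question returns only 3136: Python tries alternatives from left to right at each position, so the first alternative already succeeds at the first digit and the second alternative is never tried there. The helper name extract_number below is hypothetical and only used for illustration; it applies the pattern from this answer to one string at a time, as the question does with ref.

```python
import re

refs = [
    "MTG-2022039036 MTG",
    "MTG-LR 3136 / 130 MTG",
    "MTG-LR 201260 / 9046 ASSIGN",
    "MTG-2021063349 MTG",
]

# The question's combined pattern. At the first digit, the left alternative
# (\d+) already matches, so the right alternative is never attempted.
combined = re.compile(r'(?:-?)(\d+)|(\d+ \/ ?\d+)')
m = combined.search(refs[1])
first_alt = m.group(1)    # '3136'
second_alt = m.group(2)   # None, because the second alternative did not match

# One possible fix: make the " / digits" part an optional suffix of the
# same group instead of a separate alternative.
single = re.compile(r'^MTG-[^0-9]*(\d+(?:\s+/\s+\d+)?)')

def extract_number(ref):
    """Return the number (or 'number / number') from one string, else None."""
    match = single.search(ref)
    return match.group(1) if match else None

results = [extract_number(ref) for ref in refs]
print(first_alt, second_alt)  # 3136 None
print(results)  # ['2022039036', '3136 / 130', '201260 / 9046', '2021063349']
```

Swapping the two alternatives in the original pattern would also change which one wins, but the value would then land in a different group number depending on the case, so a single group with an optional suffix is probably simpler to consume.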
CodePudding user response:
There does not seem to be an optional space after the /, but you might use a single pattern:
\b\d+(?: / ?\d+)?\b
Explanation
\b
A word boundary to prevent a partial word match
\d+
Match 1 or more digits
(?: / ?\d+)?
Optionally match a space and /, then an optional space and 1 or more digits
\b
A word boundary
import re
pattern = r"\b\d+(?: / ?\d+)?\b"
s = ("MTG-2022039036 MTG\n"
"MTG-LR 3136 / 130 MTG \n"
"MTG-LR 201260 / 9046 ASSIGN\n"
"MTG-2021063349 MTG")
print(re.findall(pattern, s))
Output
['2022039036', '3136 / 130', '201260 / 9046', '2021063349']
Or use a capture group after matching the leading MTG- with an optional LR, so that re.findall returns the group 1 value:
\bMTG-(?:LR )?(\d+(?: / \d+)?)\b
Explanation
\bMTG-
Match MTG- literally, with a leading word boundary
(?:LR )?
Optionally match LR and a space
(
Capture group 1
\d+(?: / \d+)?
Match 1 or more digits, optionally followed by " / " and 1 or more digits
)
Close group 1
\b
A word boundary
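The capture group variant can be tried the same way as the first pattern. This is a minimal sketch using the sample lines from the question; the helper name extract is hypothetical and shows how the same pattern could be used with re.search on one string at a time.

```python
import re

# Capture group 1 holds the digits, with an optional " / digits" suffix.
pattern = re.compile(r"\bMTG-(?:LR )?(\d+(?: / \d+)?)\b")

s = ("MTG-2022039036 MTG\n"
     "MTG-LR 3136 / 130 MTG \n"
     "MTG-LR 201260 / 9046 ASSIGN\n"
     "MTG-2021063349 MTG")

# With exactly one capture group, findall returns the group 1 values.
found = pattern.findall(s)
print(found)  # ['2022039036', '3136 / 130', '201260 / 9046', '2021063349']

def extract(ref):
    """Return group 1 for a single string, or None when nothing matches."""
    m = pattern.search(ref)
    return m.group(1) if m else None

print(extract("MTG-LR 3136 / 130 MTG"))  # 3136 / 130
```

Because this pattern requires the MTG- prefix, it is stricter than the first one and will skip digits that appear elsewhere in a line.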