E.g. The input string is Hi a14457h11 what is up?
I just want to extract a14457h11 from the sentence.
Input could also be Hi a14457h11 what is up? I am b14457h12 good. That is a14457h11 great
Here the extracted output would be a14457h11 and b14457h12 (the third one is a repeat).
It will always be in the same format: one letter followed by five digits, then one letter and two more digits (in that order).
I can do it in plain Python, but is there a way to do it with a regex as well?
CodePudding user response:
You can use this pattern: [a-z][0-9]{5}[a-z][0-9]{2}
import re
s = "Hi a14457h11 what is up? I am b14457h12 good. That is b14457h12 great"
regex = re.compile(r'[a-z][0-9]{5}[a-z][0-9]{2}')
l = regex.findall(s)
print(l)
# ['a14457h11', 'b14457h12', 'b14457h12']
If the letters in your input can also be upper case, you will have to change the character classes ([a-z]) accordingly, e.g. to [a-zA-Z].
Explanation:
[a-z] matches one lower case letter
[0-9]{5} matches five digits
[a-z] matches one lower case letter
[0-9]{2} matches two digits
Test here: https://regex101.com/r/XHzITt/1
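As a small sketch of the case-sensitivity note above: instead of widening the character classes, the re.IGNORECASE flag can be passed to re.compile. The mixed-case sample string is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample with mixed-case IDs (not from the original question).
s = "Hi A14457h11 what is up? I am b14457H12 good."

# re.IGNORECASE lets [a-z] match upper case letters as well.
pattern = re.compile(r'[a-z][0-9]{5}[a-z][0-9]{2}', re.IGNORECASE)
matches = pattern.findall(s)
print(matches)
# ['A14457h11', 'b14457H12']
```

Writing [a-zA-Z] in both character classes should give the same result here without needing the flag.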
CodePudding user response:
Since the ID always follows the same format (one letter, five digits, one letter and two more digits, in that order), we could build the regex as follows:
r"(\w\d{5}\w\d{2})"
Explanation:
- \w matches any word character (a letter, a digit or an underscore)
- \d matches a digit (equivalent to [0-9])
- {5} matches the previous token exactly 5 times
- \w matches any word character (a letter, a digit or an underscore)
- \d matches a digit (equivalent to [0-9])
- {2} matches the previous token exactly 2 times
ref: regex101
Edit1:
As pointed out by @anotherGatsby, since \w matches [a-zA-Z0-9_] (so it also accepts digits and underscores), we could instead use a character class such as [abc], which matches a single character from the list abc (case sensitive).
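To illustrate the point of the edit, here is a rough comparison of the \w version with a letters-only version. The input string is invented for the demo, and the \b word boundaries in the stricter pattern are an extra assumption that the question does not ask for.

```python
import re

# Hypothetical input: a plain number, an underscore-prefixed token and a real ID.
s = "call 123456789 or see _14457h11 and a14457h11"

# \w also accepts digits and underscores, so it over-matches.
loose = re.findall(r'\w\d{5}\w\d{2}', s)

# Letters only, with word boundaries so the ID must stand on its own.
strict = re.findall(r'\b[a-zA-Z]\d{5}[a-zA-Z]\d{2}\b', s)

print(loose)
# ['123456789', '_14457h11', 'a14457h11']
print(strict)
# ['a14457h11']
```

If the IDs can appear glued to other text, the \b anchors would have to be dropped again.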
CodePudding user response:
import re
s = "Hi a14457h11 what is up? I am b14457h12 good. That is b14457h12 great"
regex = re.compile(r'[a-z]\d{5}[a-z]\d\d')
# A set drops the repeated match
f = set(regex.findall(s))
print(f)
# {'b14457h12', 'a14457h11'}  (set order may vary)
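A set removes the duplicates but does not keep the matches in the order they appear. If that order matters, one possible variant (assuming Python 3.7+, where dicts preserve insertion order) is dict.fromkeys. The string below is the second example from the question.

```python
import re

s = "Hi a14457h11 what is up? I am b14457h12 good. That is a14457h11 great"
regex = re.compile(r'[a-z]\d{5}[a-z]\d\d')

# dict.fromkeys keeps the first occurrence of each match, in order.
unique = list(dict.fromkeys(regex.findall(s)))
print(unique)
# ['a14457h11', 'b14457h12']
```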