I want to extract numbers from a long string with regular expressions, but I only want numbers with exactly 8 digits (and no letters directly in front of or after them). However, there may be a leading k.
Imagine the following examples as part of a big string:
12345678 -> I want this number
s12345678t -> I don't want this number
1234567 -> I don't want this number
123456789 -> I don't want this number
k12345678 -> I want this number (without the k, so the extracted number is 12345678)
sk12345678 -> I don't want this number
I want to solve my problem with regular expressions, but I have no idea how to do it. I tried a lot, but it didn't work, and I don't understand how regular expressions would work in my example.
It would be very nice if you could help me. Thanks!
Answer:
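As I understand it, you want a regex that captures the 8 digits in a separate group, so you can access them separately from the "k". Here \b is a word boundary, so no letter or digit may sit directly before or after the match, and k? optionally consumes one leading "k" outside the group. Because the pattern has exactly one capturing group, re.findall returns only the group's contents, so the "k" is not part of the results.
You can try this example: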
import re
string = "The numbers are 12345678, s12345678t, 1234567, 123456789, k12345678, and sk12345678."
numbers = re.findall(r'\bk?(\d{8})\b', string)
print(numbers)
This will output the following list of numbers:
['12345678', '12345678']
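The same pattern also works in many other languages and tools whose regex engines support \b word boundaries and capturing groups (for example JavaScript, Java, or PCRE-based tools); just read the first capture group instead of the whole match.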
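To see the two parts side by side, re.finditer returns match objects: group(0) is the whole match (including a leading "k" if there is one) and group(1) is only the captured 8 digits. A minimal sketch using the same pattern and sample string as above:

```python
import re

string = "The numbers are 12345678, s12345678t, 1234567, 123456789, k12345678, and sk12345678."

# Same pattern as above: optional "k" outside the group, 8 digits inside it
pattern = re.compile(r'\bk?(\d{8})\b')

for m in pattern.finditer(string):
    # group(0) is the full match, group(1) is just the captured digits,
    # span(1) is the (start, end) position of the digits in the string
    print(repr(m.group(0)), '->', m.group(1), 'at', m.span(1))
```

For this string the full matches are '12345678' and 'k12345678', while group(1) is '12345678' in both cases, which is why findall returns the digits without the "k".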
Answer:
import re
string = "12345678 s12345678t 1234567 123456789 k12345678 sk12345678"
# Extract the 8-digit numbers; an optional leading "k" is matched but not captured
numbers = re.findall(r"(?<!\w)k?(\d{8})(?!\w)", string)
print(numbers)
Output:
['12345678', '12345678']
Explanation:
(?<!\w): A negative lookbehind assertion that ensures the match is not preceded by a letter, digit, or underscore. It also succeeds at the start of the string, where there is no preceding character.
k?: Optionally matches a single leading "k". It is outside the parentheses, so it is not part of the extracted number.
(\d{8}): Matches exactly 8 digits. The parentheses capture the matched digits so that they can be extracted.
(?!\w): A negative lookahead assertion that ensures the number is not followed by a letter, digit, or underscore.
So "s12345678t" and "sk12345678" are rejected because a letter touches the number (or the k), "1234567" and "123456789" are rejected because they do not have exactly 8 digits, and "k12345678" is accepted.
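A lookaround pattern that accepts the optional leading k, (?<!\w)k?(\d{8})(?!\w), can also be written with re.VERBOSE, which ignores whitespace in the pattern and allows # comments, so each part of the explanation sits next to the token it describes. This is only a readability sketch; it should match the same strings as the compact one-line form.

```python
import re

# re.VERBOSE: whitespace in the pattern is ignored and "#" starts a comment
pattern = re.compile(r"""
    (?<!\w)     # not preceded by a letter, digit or underscore (also succeeds at the start)
    k?          # optional leading k, outside the group so it is not returned
    (\d{8})     # exactly 8 digits, captured
    (?!\w)      # not followed by a letter, digit or underscore
""", re.VERBOSE)

string = "12345678 s12345678t 1234567 123456789 k12345678 sk12345678"
print(pattern.findall(string))
```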
Answer:
Try this pattern:
\b\d{8}\b|\bk\d{8}\b
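\b\d{8}\b - match 8 digits, with a word boundary (\b) on each side
OR:
\bk\d{8}\b - match k followed by 8 digits, with a word boundary (\b) on each side
Note that findall returns the whole match here, so for the second alternative the result still includes the k, which has to be stripped off to get just the number.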
import re
text = """\
12345678 -> i want this number
s12345678t -> i dont want this number
1234567 -> i dont want this number
123456789 -> i dont want this number
k12345678 -> i want this number (without the k -> the extracted number is 12345678)
sk12345678 -> i dont want this number"""
pat = re.compile(r"\b\d{8}\b|\bk\d{8}\b")
for n in pat.findall(text):
    # The second alternative keeps the leading "k", so strip it off
    print(n.lstrip("k"))
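Prints:
12345678
12345678
12345678
The third match is the standalone 8-digit number in "the extracted number is 12345678" on the k12345678 line of the sample text.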
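To check a pattern against each case from the question on its own (rather than one combined text), a small table-driven test can help. The sketch below uses the alternation pattern from this answer and strips the leading "k"; extract and cases are just illustrative names, and each case is embedded between ordinary words to mimic being part of a bigger string.

```python
import re

pat = re.compile(r"\b\d{8}\b|\bk\d{8}\b")

def extract(text):
    """Return the 8-digit numbers in text, without a leading 'k'."""
    return [n.lstrip("k") for n in pat.findall(text)]

# Each case from the question with the expected result
cases = {
    "12345678": ["12345678"],
    "s12345678t": [],
    "1234567": [],
    "123456789": [],
    "k12345678": ["12345678"],
    "sk12345678": [],
}

for token, expected in cases.items():
    got = extract(f"some text {token} more text")
    print(f"{token:>12} -> {got} {'OK' if got == expected else 'FAIL'}")
```

Every case should report OK; swapping in another candidate pattern (and adjusting extract) is an easy way to compare the answers in this thread.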