Extracting numbers via regular expressions

Time:12-19

I want to extract numbers from a long string using regular expressions, but only numbers with exactly 8 digits, with no letters directly before or after them. The one exception is an optional leading k, which should not be part of the extracted number.

Please imagine the following examples as parts of one big string.

12345678       ->     I want this number
s12345678t     ->     I don't want this number
1234567        ->     I don't want this number
123456789      ->     I don't want this number
k12345678      ->     I want this number (without the k -> the extracted number is 12345678)
sk12345678     ->     I don't want this number

I would like to solve this with regular expressions, but I have no idea how to do it.

I tried a lot, but nothing worked, and I don't understand how regular expressions apply to my example.

It would be very nice if you could help me. Thanks!

CodePudding user response:

The regex below captures the 8 digits in a separate group, so you can access them without the optional leading "k". The word boundaries (\b) on both sides reject numbers that are glued to other letters or digits.

You can analyze this example:

import re

string = "The numbers are 12345678, s12345678t, 1234567, 123456789, k12345678, and sk12345678."

numbers = re.findall(r'\bk?(\d{8})\b', string)

print(numbers)

This will output the following list of numbers:

['12345678', '12345678']

You can also use this regular expression with other programming languages or tools that support regular expressions.
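To check the pattern against every case from the question, it can help to test each token on its own. The following is only a sketch: the helper name extract_numbers and the surrounding "value: ..., end" text are made up for illustration. Each token is embedded in a longer string so it is matched the same way it would be inside a big text.

```python
import re

# Pattern from the answer above: optional leading "k", then exactly 8 digits,
# delimited by word boundaries on both sides.
PATTERN = re.compile(r"\bk?(\d{8})\b")


def extract_numbers(text):
    """Return every 8-digit number in text, without the optional leading k."""
    return PATTERN.findall(text)


# Expected result for each example from the question.
samples = {
    "12345678": ["12345678"],
    "s12345678t": [],
    "1234567": [],
    "123456789": [],
    "k12345678": ["12345678"],
    "sk12345678": [],
}

for token, expected in samples.items():
    # Embed the token in a sentence so it is checked as part of a longer string.
    found = extract_numbers(f"value: {token}, end")
    print(f"{token:<12} -> {found}")
    assert found == expected
```

Two things to keep in mind with \b: the underscore counts as a word character, so _12345678 is rejected, and the match is case-sensitive, so K12345678 is rejected unless you pass re.IGNORECASE.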

CodePudding user response:

import re

string = "12345678 s12345678t 1234567 123456789 k12345678 sk12345678"

# Extract the numbers using the regular expression
numbers = re.findall(r"(?<![A-Za-z0-9])k?(\d{8})(?![A-Za-z0-9])", string)

print(numbers)

Output:

['12345678', '12345678']

Explanation:

(?<![A-Za-z0-9]): This is a negative lookbehind assertion that ensures the match is not directly preceded by a letter or a digit. It rejects s12345678t and sk12345678, and it prevents a match from starting in the middle of a longer number.

k?: This matches an optional leading "k". It sits outside the capturing group, so it is not part of the extracted number.

(\d{8}): This matches exactly 8 digits. The parentheses capture the matched digits so that they can be extracted.

(?![A-Za-z0-9]): This is a negative lookahead assertion that ensures the number is not directly followed by a letter or a digit. It rejects 123456789 and s12345678t.
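Lookarounds do not consume any characters, so the span of the capture group covers only the digits. Here is a small sketch that uses re.finditer to report where each number was found. The pattern is an assumption chosen for this sketch (no letter or digit directly before the optional "k" or directly after the digits), and find_with_positions is a hypothetical helper name.

```python
import re

# Assumed pattern for this sketch: no letter or digit directly before the
# optional "k", and no letter or digit directly after the 8 digits.
pattern = re.compile(r"(?<![A-Za-z0-9])k?(\d{8})(?![A-Za-z0-9])")

string = "12345678 s12345678t 1234567 123456789 k12345678 sk12345678"


def find_with_positions(text):
    """Return (start, end, digits) for group 1 of every match in text."""
    return [(m.start(1), m.end(1), m.group(1)) for m in pattern.finditer(text)]


for start, end, digits in find_with_positions(string):
    # The slice of the original string at the group's span is the number itself.
    print(start, end, digits, string[start:end] == digits)
```

The offsets are useful if you later need to replace or highlight the numbers in the original string.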

CodePudding user response:

Try this pattern:

\b\d{8}\b|(?<=\bk)\d{8}\b

  • \b\d{8}\b

    • match 8 digits
    • \b - word boundary

OR:

  • (?<=\bk)\d{8}\b

    • (?<=\bk) - lookbehind: the digits must be preceded by a k that starts a word; the k itself is not part of the match
    • match 8 digits
    • \b - word boundary

import re

text = """\
12345678       ->     i want this number
s12345678t     ->     i dont want this number
1234567        ->     i dont want this number
123456789      ->     i dont want this number
k12345678      ->     i want this number (without the k -> the extracted number is 12345678)
sk12345678     ->     i dont want this number"""

# The lookbehind keeps the leading k out of the returned match
pat = re.compile(r"\b\d{8}\b|(?<=\bk)\d{8}\b")

for n in pat.findall(text):
    print(n)

Prints:

12345678
12345678
12345678
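
One detail that matters for all of these answers: re.findall returns the whole match when the pattern has no capture group, and only the group's content when it has exactly one group. So the leading "k" has to be kept out of the result either with a lookbehind or by capturing just the digits. A minimal sketch of the difference (the sample text is made up for illustration):

```python
import re

text = "12345678 k12345678 sk12345678"

# No capture group: findall returns the full match, so the "k" is included.
whole = re.findall(r"\bk?\d{8}\b", text)

# One capture group: findall returns only the group's content.
digits_only = re.findall(r"\bk?(\d{8})\b", text)

print(whole)
print(digits_only)
```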