How to match number followed by . and a space character?

Time:12-21

Basically I have some text like this:

  1. first line
  2. second
  3. more lines
  4. bullet points

I process the text line by line, and I want to check whether a line starts with a number, followed by a . and then a space character.

I can then use this to split the line into two parts and process each one separately. The number part, with the . and the space, will be treated differently from the rest.

What's the best way to do this in Python? I don't want to compare individual characters against digits, because the numbers can be anything, though they are likely less than 100.

CodePudding user response:

The following should get you the two parts (number full stop) and (everything after space) into two capture groups.

import re


def get_number_full_stop(input_string: str):
    res = re.search(r"^(\d+\.)\s(.+)", input_string)
    if res:
        return res.groups()
    else:
        return None


print(get_number_full_stop("1. hello"))
print(get_number_full_stop("1.hello"))
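
As a rough sketch of how this idea could be applied to a whole list of lines, the helper below returns the number marker and the remaining text, or None for the marker when the line is not numbered. The name split_numbered_line and the sample lines are made up for illustration, and the sketch assumes the marker is one or more ASCII digits, a period, and exactly one space.

```python
import re

# One or more ASCII digits, a literal dot, one space, then the rest of the line.
NUMBERED = re.compile(r"^([0-9]+\. )(.*)$")


def split_numbered_line(line: str):
    """Return (marker, rest); marker is None if the line is not numbered."""
    match = NUMBERED.match(line)
    if match is None:
        return None, line
    return match.group(1), match.group(2)


lines = ["1. first line", "2. second", "not a bullet", "14. bullet points"]
for line in lines:
    print(split_numbered_line(line))
```

Returning the original line unchanged for non-matching input lets the caller treat both cases with the same tuple unpacking.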

CodePudding user response:

You can use a regular expression to check if a string starts with a number followed by a period and a space.

import re

text = "1. first line"
regex_rule = r'^(\d+\.)\s'
if re.match(regex_rule, text):
    # The text starts with a number followed by a period and a whitespace character
    _, number, rest = re.split(regex_rule, text, maxsplit=1)
    print(number)  # prints "1."
    print(rest)    # prints "first line"

The maxsplit parameter specifies the maximum number of splits to do. In this case, we set it to 1 so that the string is split only once, at the number marker. Because the pattern contains a capture group, re.split keeps the captured "1." in the result; the first item of the result is the empty string that precedes the match, which is discarded here.
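
To see the effect of maxsplit in isolation, here is a small sketch on a made-up string. It also shows how re.split treats the matched text with and without a capture group, which matters when the number part needs to be kept.

```python
import re

text = "a b c d"

# Without maxsplit, every whitespace character is a split point.
print(re.split(r"\s", text))              # ['a', 'b', 'c', 'd']

# With maxsplit=1, only the first match splits the string.
print(re.split(r"\s", text, maxsplit=1))  # ['a', 'b c d']

# Without a capture group, the matched "1. " is dropped from the result.
dropped = re.split(r"^\d+\.\s", "1. first line", maxsplit=1)
print(dropped)                            # ['', 'first line']

# With a capture group, the captured "1." is kept as its own item.
kept = re.split(r"^(\d+\.)\s", "1. first line", maxsplit=1)
print(kept)                               # ['', '1.', 'first line']
```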

CodePudding user response:

This builds a list, bullets, containing the lines that start with a number followed by a . and a space.

with open(file, "r") as f:
    lines = f.readlines()

# find() returns -1 when ". " is absent, so check for it first
bullets = [line for line in lines if ". " in line and line[: line.find(". ")].isdigit()]
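
One thing to watch with a find-based check: str.find returns -1 when the separator is missing, and slicing with -1 silently drops the last character instead of failing. The sketch below uses a made-up line to show the pitfall, and str.partition as one possible way around it.

```python
line = "123\n"  # no ". " anywhere in this line

index = line.find(". ")
print(index)                   # -1
print(line[:index].isdigit())  # True, because line[:-1] is "123"

# str.partition returns the separator as its second item, or "" if absent.
number, sep, rest = line.partition(". ")
print(sep == ". " and number.isdigit())  # False
```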

CodePudding user response:

If you are looking for non-regex answers then you can try:

text = """1. first line
2. second
3. more lines
14. bullet points"""

for line in text.splitlines():
    # line.find(str) will return -1 if not found
    if (dot_index := line.find('. ')) != -1:
        # isnumeric checks that the string contains only numeric characters
        if line[:dot_index].isnumeric():
            print(line)

Output:

1. first line
2. second
3. more lines
14. bullet points

Note: the := (walrus) operator was introduced in Python 3.8.
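
A caveat that may be worth checking for your input: str.isnumeric() accepts more than the digits 0 to 9, including characters such as superscripts and fractions. The sketch below compares the three related string methods on a few sample characters chosen for illustration. If only plain decimal digits should count, str.isdecimal() is the stricter choice.

```python
samples = ["14", "²", "½"]

for sample in samples:
    # Columns: the sample, then isdecimal, isdigit, isnumeric.
    print(repr(sample), sample.isdecimal(), sample.isdigit(), sample.isnumeric())

# '14' True True True
# '²' False True True
# '½' False False True
```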


For regex:

>>> import re
>>> re.findall(r"""^    # Start-of-line anchor (^)
(\d+)                   # Match one or more digits (capture group)
\.                      # Match a literal dot
\s                      # Match a single whitespace character
(.*)                    # Match everything else (capture group)
$                       # Till the end of line
""",
text,
re.MULTILINE | re.VERBOSE
)

# Output
[('1', 'first line'), ('2', 'second'), ('3', 'more lines'), ('14', 'bullet points')]

CodePudding user response:

First of all, read the file and split it into lines:

with open(route) as file:
    # Read the whole file and split it into lines
    lines = file.read().split("\n")


Then, for each line, we check whether the conditions are met: the first character is a digit, the second is a '.', and the third is a space. Note that this only handles single-digit numbers.

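for line in lines:
    # The length check skips empty or very short lines, which would raise IndexError
    if len(line) >= 3 and line[0].isdigit() and line[1] == '.' and line[2] == ' ':
        pass  # do something
    else:
        pass  # do something else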
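
Since the numbers in the question can have more than one digit, fixed positions may not be enough. A possible variant, sketched here with a hypothetical is_numbered helper, strips any leading ASCII digits and then checks what follows. It assumes the marker is written with the digits 0 to 9 only.

```python
def is_numbered(line: str) -> bool:
    """True if the line starts with one or more digits followed by '. '."""
    rest = line.lstrip("0123456789")  # drop any leading ASCII digits
    # At least one digit must have been removed, and ". " must come next.
    return len(rest) < len(line) and rest.startswith(". ")


for line in ["1. first line", "14. bullet points", "", ". no number", "3.no space"]:
    print(repr(line), is_numbered(line))
```

Empty lines are handled without a separate length check, because nothing is stripped from them and the first condition fails.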