How to match number followed by . and a space character?

Basically I have some text like this:

1. first line
2. second
3. more lines
14. bullet points

I process the text line by line, and I want to be able to check whether a line actually starts with a number, then a ., and then a space character.

I can then use this to split the line in two and process each part separately. The number part with the . and space will be treated differently from the rest.

What's the best way to do this in Python? I don't want to simply check individual characters for digits, because the number can be anything, although it is likely to be less than 100.

CodePudding user response：

The following should get you the two parts (number full stop) and (everything after space) into two capture groups.

import re


def get_number_full_stop(input_string: str):
    res = re.search(r"^(\d+\.)\s(.+)", input_string)
    if res:
        return res.groups()
    else:
        return None


print(get_number_full_stop("1. hello"))
print(get_number_full_stop("1.hello"))
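
To process many lines with this kind of pattern, one option is to compile it once and return the line unchanged when it has no numbered prefix. This is only a sketch: the names NUMBERED and split_numbered are made up for illustration, and it uses (.*) so that a line with nothing after the space still matches.

```python
import re

# Digits, a literal dot, one whitespace character, then the rest of the line.
NUMBERED = re.compile(r"^(\d+\.)\s(.*)")


def split_numbered(line):
    """Return (prefix, rest) for numbered lines, or (None, line) otherwise."""
    match = NUMBERED.match(line)
    if match:
        return match.groups()
    return None, line


lines = ["1. first line", "14. bullet points", "plain text", "1.hello"]
results = [split_numbered(line) for line in lines]
for prefix, rest in results:
    print(repr(prefix), repr(rest))
```

Lines without the prefix come back as (None, line), so the caller can treat both cases in a single loop.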

CodePudding user response：

You can use a regular expression to check if a string starts with a number followed by a period and a space.

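import re

text = "1.  first line"
regex_rule = r'^(\d+\.)\s+'
if re.match(regex_rule, text):
    # The text starts with a number followed by a period and a space or tab.
    # The capture group keeps "1." in the result; the first item is the
    # empty string before the match, so it is discarded.
    _, number, rest = re.split(regex_rule, text, maxsplit=1)
    print(number)  # prints "1."
    print(rest)    # prints "first line"

The maxsplit parameter specifies the maximum number of splits to do. In this case, we set it to 1 so the string is split only once, at the numbered prefix. Because the pattern contains a capture group, re.split also returns the matched number and period, which gives the two parts we want: the number followed by a period, and the rest of the string.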
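
It can help to see what re.split actually returns for a prefix pattern. The short sketch below compares a pattern without a capture group and one with it; the variable names are only illustrative.

```python
import re

text = "1.  first line"

# Without a capture group the matched prefix is dropped from the result.
without_group = re.split(r'^\d+\.\s+', text, maxsplit=1)

# With a capture group the matched number and period are kept.
with_group = re.split(r'^(\d+\.)\s+', text, maxsplit=1)

print(without_group)  # ['', 'first line']
print(with_group)     # ['', '1.', 'first line']
```

In both cases the first item is the empty string that precedes the match at the start of the line.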

CodePudding user response：

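This builds a list, bullets, that contains the lines starting with a number followed by a . and a <space>.

# file is the path to the text file
with open(file, "r") as f:
    lines = f.readlines()

# Require ". " to be present: find() returns -1 when it is missing,
# and line[:-1] would then test everything except the last character.
bullets = [line for line in lines if ". " in line and line[:line.find(". ")].isdigit()]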
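
Note that str.find returns -1 when the separator is missing, and a slice such as line[:-1] then drops only the final character, so a line like "123\n" would pass an unguarded isdigit check. The sketch below uses str.partition instead, which reports explicitly whether the separator was found. The helper name is_numbered and the sample lines are assumptions for illustration.

```python
def is_numbered(line):
    # partition returns (before, separator, after); separator is "" when
    # ". " does not occur in the line at all.
    number, separator, _ = line.partition(". ")
    return separator != "" and number.isdigit()


lines = ["1. first line\n", "14. bullet points\n", "123\n", "no bullet. here\n"]
bullets = [line for line in lines if is_numbered(line)]
print(bullets)  # ['1. first line\n', '14. bullet points\n']
```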

CodePudding user response：

If you are looking for non-regex answers then you can try:

text = """1. first line
2. second
3. more lines
14. bullet points"""

for line in text.splitlines():
    # line.find(str) will return -1 if not found
    if (dot_index := line.find('. ')) != -1:
        # isnumeric is True when every character is numeric
        # (this includes characters such as '½'; isdecimal is stricter)
        if line[:dot_index].isnumeric():
            print(line)

Output:

1. first line
2. second
3. more lines
14. bullet points

Note: := (the walrus operator) was introduced in Python 3.8.
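
A side note on the isnumeric check used above: Python has three similar string methods, and they differ in which characters they accept. The sample strings below are chosen only to show the difference.

```python
samples = ["14", "²", "½"]

# Each row is (string, isdecimal, isdigit, isnumeric).
table = [(s, s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples]
for row in table:
    print(row)
```

For plain ASCII numbers such as "14" all three agree, so any of them works for this task; isdecimal is the strictest if the input may contain unusual characters.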

For regex:

>>> import re
>>> re.findall(r"""^    # Startswith anchor (^)
(\d+)                   # Match one or more digits (capture group)
\.                      # Match a literal dot
\s                      # Match a single whitespace character
(.*)                    # Match everything else (capture group)
$                       # Till the end of line
""",
text,
re.MULTILINE | re.VERBOSE
)

# Output
[('1', 'first line'), ('2', 'second'), ('3', 'more lines'), ('14', 'bullet points')]
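
If the number is needed as an integer, the same idea can be written with named groups and re.finditer. This is a sketch rather than part of the answer above; the group names number and rest are assumptions chosen for readability.

```python
import re

text = """1. first line
2. second
3. more lines
14. bullet points"""

# Named groups make it clear which part of the match is which.
pattern = re.compile(r"^(?P<number>\d+)\.\s(?P<rest>.*)$", re.MULTILINE)

items = [(int(m.group("number")), m.group("rest")) for m in pattern.finditer(text)]
print(items)  # [(1, 'first line'), (2, 'second'), (3, 'more lines'), (14, 'bullet points')]
```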

CodePudding user response：

First of all, read the file and split it into lines:

with open(route) as file:
    # We read the file
    lines=file.read()
    # We split the file into lines
    lines = lines.split("\n")

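Then, for each line, we check whether the conditions are met (the first character is a digit, the second one is a '.' and the third one is a space). Slicing is used instead of indexing so that empty or very short lines do not raise an IndexError. Note that this check only recognises single-digit numbers such as "1. "; a line starting with "14. " goes to the else branch.

for line in lines:
    if line[:1].isdigit() and line[1:3] == '. ':
        pass  # do something
    else:
        pass  # do something else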
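
Since the question mentions numbers up to about 100, the character-by-character check can be extended to any number of leading digits. The following is a minimal sketch of that idea; the function name split_numbered_prefix is hypothetical.

```python
def split_numbered_prefix(line):
    """Return (prefix, rest) if line starts with digits, '.', ' '; else None."""
    i = 0
    # Count how many leading characters are digits.
    while i < len(line) and line[i].isdigit():
        i += 1
    # There must be at least one digit, followed by ". ".
    if i > 0 and line[i:i + 2] == ". ":
        return line[:i + 2], line[i + 2:]
    return None


print(split_numbered_prefix("14. bullet points"))  # ('14. ', 'bullet points')
print(split_numbered_prefix("1.hello"))            # None
print(split_numbered_prefix(""))                   # None
```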