Basically I have some text like this:
1. first line
2. second
3. more lines
14. bullet points
I split the text into lines so I can process them, but I want to be able to check whether a line actually starts with a number, then a ., and then a space character.
I can then use this to split the line into two parts and process each separately. The number part, with the . and the space, will be treated differently from the rest.
What's the best way to do this in Python? I didn't want to check for specific number characters because the numbers can be anything, though they are likely less than 100.
CodePudding user response:
The following should get you the two parts (number full stop) and (everything after space) into two capture groups.
import re

def get_number_full_stop(input_string: str):
    # Group 1: the digits plus the full stop; group 2: everything after the space
    res = re.search(r"^(\d+\.)\s(.+)", input_string)
    if res:
        return res.groups()
    else:
        return None

print(get_number_full_stop("1. hello"))  # ('1.', 'hello')
print(get_number_full_stop("1.hello"))  # None
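To apply the same idea to a whole block of text, something like the sketch below might work. The helper name split_numbered_lines and the sample text are made up for illustration, and unnumbered lines are assumed to be kept with None as their prefix.

```python
import re

# Digits, a full stop, one whitespace character, then the rest of the line
NUMBERED = re.compile(r"^(\d+\.)\s(.*)")

def split_numbered_lines(text: str):
    """Return (prefix, rest) per line; prefix is None for unnumbered lines."""
    result = []
    for line in text.splitlines():
        match = NUMBERED.match(line)
        if match:
            result.append(match.groups())
        else:
            result.append((None, line))
    return result

sample = "1. first line\nno number here\n14. bullet points"
for prefix, rest in split_numbered_lines(sample):
    print(prefix, "|", rest)
```

Compiling the pattern once avoids re-parsing it for every line, which may matter for long inputs.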
CodePudding user response:
You can use a regular expression to check if a string starts with a number followed by a period and a space.
import re

text = "1. first line"
regex_rule = r'^(\d+\.)\s'
if re.match(regex_rule, text):
    # The text starts with a number followed by a period and a whitespace character.
    # The capture group makes re.split keep the number; the first item is the
    # empty string before the match, so we discard it.
    _, number, rest = re.split(regex_rule, text, maxsplit=1)
    print(number)  # prints "1."
    print(rest)  # prints "first line"
The maxsplit parameter specifies the maximum number of splits to do. In this case, we set it to 1 so the string is split only once: into the number followed by a period (kept because it sits inside a capture group) and the rest of the string.
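One detail worth checking is what re.split actually returns here: the matched separator is only kept in the result when the pattern contains a capture group. A small sketch comparing the two cases (the variable names are just for illustration):

```python
import re

text = "1. first line"

# Without a capture group the matched prefix is discarded
without_group = re.split(r'^\d+\.\s', text, maxsplit=1)
print(without_group)  # ['', 'first line']

# With a capture group the matched number is kept as its own item
with_group = re.split(r'^(\d+\.)\s', text, maxsplit=1)
print(with_group)  # ['', '1.', 'first line']
```

In both cases the first item is the empty string that precedes the match at the start of the line.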
CodePudding user response:
This returns a list, bullets, that contains the lines starting with a number followed by a . and a space.
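with open(file, "r") as f:
    lines = f.readlines()
# Require ". " to be present: find() returns -1 otherwise, and the slice would then drop only the last character
bullets = [line for line in lines if ". " in line and line[: line.find(". ")].isdigit()]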
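A line without ". " makes find() return -1, which is easy to overlook, so it may help to handle that case explicitly. Below is a rough sketch that does so; numbered_lines is a hypothetical helper, and io.StringIO with made-up content stands in for the opened file.

```python
import io

def numbered_lines(handle):
    """Keep lines whose text before the first '. ' consists only of digits."""
    bullets = []
    for line in handle:
        dot_index = line.find(". ")
        # find() returns -1 when ". " is absent; skip those lines explicitly
        if dot_index > 0 and line[:dot_index].isdigit():
            bullets.append(line.rstrip("\n"))
    return bullets

# io.StringIO stands in for an opened text file (assumed sample content)
fake_file = io.StringIO("1. first line\n42\nplain text. here\n14. bullet points\n")
print(numbered_lines(fake_file))
```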
CodePudding user response:
If you are looking for non-regex answers then you can try:
text = """1. first line
2. second
3. more lines
14. bullet points"""
for line in text.splitlines():
    # line.find(str) will return -1 if not found
    if (dot_index := line.find('. ')) != -1:
        # isnumeric checks if the string contains only digits
        if line[:dot_index].isnumeric():
            print(line)
Output:
1. first line
2. second
3. more lines
14. bullet points
Note: :=, aka the walrus operator, was introduced in Python 3.8.
For regex:
>>> import re
>>> re.findall(r"""^   # Start-of-line anchor (^)
        (\d+)   # Match one or more digits (capture group)
        \.      # Match a literal dot
        \s      # Match a single whitespace character
        (.*)    # Match everything else (capture group)
        $       # Till the end of line
    """,
    text,
    re.MULTILINE | re.VERBOSE
    )
# Output
[('1', 'first line'), ('2', 'second'), ('3', 'more lines'), ('14', 'bullet points')]
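If the number is needed as an integer, named groups with re.finditer might read a little more clearly. This is only a sketch: the group names number and rest, and the extra unnumbered sample line, are my own additions, and a literal space is used instead of \s so the pattern cannot run over a line break.

```python
import re

text = """1. first line
2. second
not numbered
14. bullet points"""

# Named groups: "number" holds the digits, "rest" holds the remainder of the line
pattern = re.compile(r"^(?P<number>\d+)\. (?P<rest>.*)$", re.MULTILINE)

items = [(int(m["number"]), m["rest"]) for m in pattern.finditer(text)]
print(items)  # [(1, 'first line'), (2, 'second'), (14, 'bullet points')]
```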
CodePudding user response:
First of all, read the file and split it into lines:
with open(route) as file:
    # We read the file
    lines = file.read()
# We split the file into lines
lines = lines.split("\n")
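Then, for each line, we check whether the conditions are met (the first character is a digit, the second is a '.' and the third is a space). Note that this only recognises single-digit numbers.
for line in lines:
    # Slices avoid an IndexError on lines shorter than three characters
    if line[:1].isdigit() and line[1:3] == '. ':
        pass  # do something
    else:
        pass  # do something else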
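Since the question mentions numbers of more than one digit, a variant using str.partition could cover those without a regex. The function name split_number_prefix is hypothetical, and this is a sketch rather than a tested drop-in:

```python
def split_number_prefix(line: str):
    """Return (number, rest) if line starts with digits then '. ', else None."""
    head, sep, rest = line.partition(". ")
    # sep is empty when ". " is missing; isdigit() is False for an empty head
    if sep and head.isdigit():
        return head + ".", rest
    return None

print(split_number_prefix("14. bullet points"))  # ('14.', 'bullet points')
print(split_number_prefix("1.hello"))  # None
```

partition always returns three strings, so there is no index to check and no risk of an IndexError on short or empty lines.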