I am searching for sentences containing characters using Python regular expressions. But I can't find the sentence I want. Please help me
regex.py
opfile = open(file.txt, 'r')
contents = opfile.read()
opfile.close()
index = re.findall(r'\[start file\](?:.|\n)*\[end file\]', contents)
item = re.search(r'age.*', str(index))
file.txt(example)
[start file]
name: steve
age: 23
[end file]
result
<re.Match object; span=(94, 738), match='age: >
The age is not printed
CodePudding user response:
There are several issues here:
- The
str(index)
returns the string literal representation of the string list, and it makes it difficult to further process the result (?:.|\n)*
is a very resource consuming construct, use a mere.
with there.S
orre.DOTALL
option- If you plan to find a single match, use
re.search
, notre.findall
.
Here is a possible solution:
match = re.search(r'\[start file].*\[end file]', contents, re.S)
if match:
match2 = re.search(r"\bage:\s*(\d )", match.group())
if match2:
print(match2.group(1))
Output:
23
If you want to get age
in the output, use match2.group()
.
CodePudding user response:
If you want to match the age only once between the start and end file markers, you could use a single pattern with a capture group and in between match all lines that do not start with age: or the start or end marker.
^\[start file](?:\n(?!age:|\[(?:start|end) file]).*)*\nage: (\d )(?:\n(?!\[(?:start|end) file]).*)*\n\[end file]
Example
import re
regex = r"^\[start file](?:\n(?!age:|\[(?:start|end) file]).*)*\nage: (\d )(?:\n(?!\[(?:start|end) file]).*)*\n\[end file]"
s = ("[start file]\n" "name: steve \n" "age: 23\n" "[end file]")
m = re.search(regex, s)
if m:
print(m.group(1))
Output
23
CodePudding user response:
The example input looks like a list of key, value pairs enclosed between some start/end markers. For this use-case, it might be more efficient and readable to write the parsing stage as:
re.search
to locate the documentsplitlines()
to isolate individual recordssplit()
to extract the key and value of each record
Then, in a second step, access the extracted records.
Doing this allows to separate the parsing and exploitation parts and makes the code easier to maintain.
Additionally, a good practice is to wrap access to a file in a "context manager" (the with
statement) to guarantee all resources are correctly cleaned on error.
Here is a full standalone example:
import re
# 1: Load the raw data from disk, in a context manager
with open('/tmp/file.txt') as f:
contents = f.read()
# 2: Parse the raw data
fields = {}
if match := re.search(r'\[start file\]\n(.*)\[end file\]', contents, re.S):
for line in match.group(1).splitlines():
k, v = line.split(':', 1)
fields[k.strip()] = v.strip()
# 3: Actual data exploitation
print(fields['age'])