I have a file, file.txt, that looks like this:
this is first line new line1
this is second line new line2
new( line added ) new line3
this is third line new line4
this is fourth line_1 new line5
From this file I need to check whether certain strings are present and, if they are, perform some action.
This is current code:
import re
with open("file.txt", "r") as f:
    first = False
    second = False
    third = False
    fourth = False
    for line in f:
        for str in line.split():
            pattern = re.compile('this|is|first|line')
            if pattern.match(str):
                first = True
                continue
            pattern = re.compile('this|is|second|line')
            if pattern.match(str):
                second = True
                continue
        if(first == True):
            print("something1")
        else:
            print("not found first")
            exit()
        if(second == True):
            print("something2")
        else:
            print("not found second")
            exit()
but the current output is:
something1
not found second
but the expected output is:
something1
something2
The second line is present in the file, but the else branch is still printed. (Note: the strings must be checked in this order, i.e. if the first string is not present, the program should exit without doing any further checks.)
CodePudding user response:
The word or phrase patterns are regular expressions, just very simple ones. In a regular expression, most characters, including letters and digits, represent themselves. For example, the pattern 1 matches the string '1', and the pattern boy matches the string 'boy'.
There are also a number of reserved characters, called metacharacters, that do not represent themselves in a regular expression. They have a special meaning that is used to build more complex patterns. They include ., *, [, ], ^, $, and \.
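To make the difference between ordinary characters and metacharacters concrete, here is a minimal sketch using Python's re module on one of the sample lines from the question. The variable names are only illustrative, and the parenthesis example relies on the fact that "(" is also special in Python's regex syntax, which is why re.escape() is used for it.

```python
import re

line = "new( line added ) new line3"

# Ordinary characters match themselves.
plain = re.search("line", line)

# "." is a metacharacter: it matches any single character except a newline.
dot = re.search("l.ne", line)

# "(" is special too, so a literal one has to be escaped,
# either by hand ("new\\(") or with re.escape().
literal = re.search(re.escape("new("), line)

print(plain.group())    # line
print(dot.group())      # line
print(literal.group())  # new(
```

Without the escaping, re.compile("new(") would raise re.error because of the unbalanced parenthesis.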
CodePudding user response:
There were multiple issues with your program:
- The flags first and second were never reset to False between lines (I moved them into the body of the outer loop, so now they are).
- You call exit() as soon as a check fails on a line, which aborts the entire program before the remaining lines are read.
- As mentioned previously, the first regex ALSO matches the "this" in the second line. It probably shouldn't if the patterns are meant to be mutually exclusive.
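A quick way to see the overlap is to run the first pattern from the question against the words of the second line. This is only a small sketch; the variable names are made up for the demonstration.

```python
import re

# Pattern taken from the question: "|" means "or", so this matches any
# word that starts with "this", "is", "first" or "line".
first_pattern = re.compile("this|is|first|line")

second_line = "this is second line new line2"

# Words of the second line that the "first" pattern accepts.
matched = [word for word in second_line.split() if first_pattern.match(word)]
print(matched)  # ['this', 'is', 'line', 'line2']
```

Since re.match only anchors at the start of the string, even "line2" is accepted, so the flag for the first line gets set by almost every line in the file.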
Other issues:
- You shouldn't need to split() the line and iterate over its words.
- Calling re.compile inside the loop adds overhead on every iteration. Compile the patterns once, outside the loop, for faster programs.
- You can just use if (pattern.match(line)) directly instead of assigning True/False to other variables.
The bugs fixed:
import re
with open("file.txt", "r") as f:
    for line in f:
        first = False
        second = False
        third = False
        fourth = False
        for str in line.split():
            pattern = re.compile('first')
            if pattern.match(str):
                first = True
                continue
            pattern = re.compile('second')
            if pattern.match(str):
                second = True
                continue
        if(first == True):
            print("something1")
        elif(second == True):
            print("something2")
All issues fixed, code DRY-ed up, etc.:
import re
with open("file.txt", "r") as f:
    firstPattern = re.compile('this is first line')
    secondPattern = re.compile('this is second line')
    for line in f:
        if (firstPattern.match(line)):
            print("something1")
        elif (secondPattern.match(line)):
            print("something2")
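The question also asks for the checks to run in a fixed order and to stop at the first string that is missing. One possible way to get that behaviour, assuming plain substring tests are enough, is sketched below. The helper check_in_order and the CHECKS table are hypothetical names, and the sample text is inlined so the snippet runs on its own; in the real script it would come from open("file.txt").read().

```python
# Each entry: (phrase to look for, message if found, message if missing).
# The phrases and messages are copied from the question.
CHECKS = [
    ("this is first line", "something1", "not found first"),
    ("this is second line", "something2", "not found second"),
]


def check_in_order(text, checks=CHECKS):
    """Check the phrases in order and stop at the first one that is missing.

    Returns (messages, ok), where ok is True only if every phrase was found.
    """
    messages = []
    for phrase, found_msg, missing_msg in checks:
        if phrase not in text:
            messages.append(missing_msg)
            return messages, False
        messages.append(found_msg)
    return messages, True


# Stand-in for the contents of file.txt.
sample = """this is first line new line1
this is second line new line2
new( line added ) new line3
this is third line new line4
this is fourth line_1 new line5
"""

messages, ok = check_in_order(sample)
print("\n".join(messages))
```

With the sample file this prints something1 and then something2. A caller that wants the original exit behaviour can call sys.exit() when ok is False.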