I am reading a regex pattern from a .txt file, passing it as a variable, and using it to search a very large text file. However, passing the variable to the regex search didn't work. My code snippet is:
with open(r"C:\Desktop\list_pattern.txt", "r") as file1:
    for pattern in file1:
        with open(r'C:\Desktop\log.txt', "r") as my_file:
            for lines in my_file:
                k = re.search('{}'.format(pattern), lines)  # I even tried re.search(pattern, lines)
                if k != None:
                    print("k is", k)
For example, the first line in list_pattern.txt is "Battery Low", and it gives 0 matches in log.txt. However, if I replace that line of code with k=re.search('Battery Low', lines), it gives 12 matches. Any idea what may be wrong? I am using Python 3.10.
CodePudding user response:
When you read a file line by line, as with for pattern in file1:, the line break characters remain at the end of each line, so every pattern ends with a newline that the regex then tries to match. You need to use pattern.rstrip() to get rid of the trailing whitespace or, if the patterns can end in meaningful whitespace, it is safer to use pattern.rstrip('\n'). If there is no meaningful whitespace on either end of each pattern, you can use pattern.strip().
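To make the difference between the three calls concrete, here is a small illustration. The sample string is made up for the demo and is not taken from the asker's files:

```python
# Hypothetical pattern line as read from a file:
# leading and trailing spaces plus the line break.
raw = "  Battery Low \n"

stripped_newline = raw.rstrip("\n")  # removes only the trailing line break
stripped_right = raw.rstrip()        # removes all trailing whitespace
stripped_both = raw.strip()          # removes whitespace on both ends

# repr() makes the remaining whitespace visible.
print(repr(stripped_newline))
print(repr(stripped_right))
print(repr(stripped_both))
```

Printing repr(pattern) inside the original loop is also a quick way to confirm that the newline is what breaks the match.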
There seems to be no reason to use str.format; just use the pattern variable directly.
So you need to use
k=re.search(pattern.rstrip('\n'), lines)
# or if there can be no meaningful whitespace at the end of each pattern:
k=re.search(pattern.rstrip(), lines)
# or if there can be no meaningful whitespace on both ends of each pattern:
k=re.search(pattern.strip(), lines)
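Putting it together, here is a minimal self-contained sketch of the failure and the fix. It assumes in-memory io.StringIO objects in place of the real list_pattern.txt and log.txt; the log lines and the helper name count_matches are invented for illustration:

```python
import io
import re

# Stand-ins for the real files; the contents are made up for this demo.
patterns_text = "Battery Low\nDisk Full\n"
log_text = (
    "12:00 Battery Low on unit 3\n"
    "12:01 All good\n"
    "12:02 Battery Low on unit 7\n"
)


def count_matches(strip_newline):
    """Count (pattern, log line) pairs that match, optionally stripping the newline."""
    hits = 0
    for pattern in io.StringIO(patterns_text):
        if strip_newline:
            pattern = pattern.rstrip("\n")  # drop the line break kept by file iteration
        regex = re.compile(pattern)
        for line in io.StringIO(log_text):
            if regex.search(line) is not None:
                hits += 1
    return hits


# Without stripping, each pattern still ends in "\n" and fails to match mid-line text.
print(count_matches(strip_newline=False))
# With stripping, "Battery Low" is found in the two log lines that contain it.
print(count_matches(strip_newline=True))
```

Compiling each pattern once with re.compile before scanning the log is optional, but it avoids re-parsing the pattern for every line of a large file.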
CodePudding user response:
It worked fine for me when I simulated it with other files. The result was like this:
k is <re.Match object; span=(91, 93), match='in'>
k is <re.Match object; span=(3, 5), match='in'>
k is <re.Match object; span=(22, 24), match='in'>
k is <re.Match object; span=(4, 6), match='in'>
k is <re.Match object; span=(20, 22), match='in'>
k is <re.Match object; span=(40, 42), match='in'>
k is <re.Match object; span=(25, 27), match='in'>
k is <re.Match object; span=(30, 32), match='in'>
k is <re.Match object; span=(32, 34), match='in'>
k is <re.Match object; span=(50, 52), match='in'>
k is <re.Match object; span=(10, 12), match='in'>
k is <re.Match object; span=(165, 167), match='in'>
k is <re.Match object; span=(34, 36), match='in'>
k is <re.Match object; span=(26, 28), match='in'>
k is <re.Match object; span=(35, 37), match='in'>
k is <re.Match object; span=(14, 16), match='in'>
k is <re.Match object; span=(46, 48), match='in'>
k is <re.Match object; span=(20, 22), match='in'>
Can you share the text files you are using?