Here is what I had in bash and need to convert to Python (mandatory, no choice at my company), but it's cool to learn new things.
The content of the input test file is made of lines looking like:
2022-08-11 13:53:15 ; INFO ; file=toto ; Upload size = 13 KB ; result = ....
2022-08-11 13:54:55 ; other info ; rate = 5.3 ;
2022-08-11 13:57:02 | not to be kept line
2022-08-11 13:59:15 ; INFO ; file=titi ; Upload size = 3 KB ; result =...
and so on, but the real file will contain other log line formats (for security reasons I cannot copy a real line here), so I use a test file.
Here is the exact command that gives the expected output:
grep -ihE "size|rate|type_[DI][TA][FT]|source|dest" ../data/*.{log,debug} | sort -t " " -k1,6 -k2 > filtre.txt
So first I want to try it without creating the output file.
Here is what I am trying with Python (I'm limited to 2.7 and cannot choose anything else, so please do not ask about it):
import os
import re
import string
import sys
datalogpath = sys.argv[1] # get the path of log files to extract datas
searchpattern = re.compile("size|rate|type_D|type_I|source|dest") # regexp to filter from logs directory
# step 1- equ grep all from
for filename in os.listdir(datalogpath):
    with open(os.path.join(datalogpath, filename)) as in_file:
        for line in in_file:
            found = searchpattern.search(line)
            if found:
                print(found.group(0))
What currently appears for the test file is only
size
size
size
size
size
instead of each full line containing size or any of the other words I'm looking for. The grep command returns all 23 lines (the full content of each),
like
2022-08-11 13:53:15 ; INFO ; file=toto ; Upload size = 13 KB ; result = ....
2022-08-11 13:54:55 ; other info ; rate = 5.3 ;
2022-08-11 13:59:15 ; INFO ; file=titi ; Upload size = 3 KB ; result =...
So, for example, the line
2022-08-11 13:57:02 | not to be kept line
is not displayed in the output.
None of the official documentation chapters fit this use case.
Please help me define the correct regexp in Python format and/or the file reading method if this one is bad.
CodePudding user response:
Change
print(found.group(0))
to
print(line)
You want to display the full line when there is a match, not just the part the regex matched.
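For reference, here is a minimal sketch of how the whole grep pipeline might look once the full line is kept. It is only an illustration under a few assumptions: the helper names matching_lines and grep_directory are made up for this example, re.IGNORECASE stands in for grep -i, the pattern is copied from the grep command (type_[DI][TA][FT] rather than type_D|type_I), only *.log and *.debug files are read (os.listdir would read every file), and a plain lexical sort is used as an approximation of sort -t " " -k1,6 -k2, which may not order lines identically in every locale. The trailing newline is stripped from each line because print() adds its own, which otherwise shows up as a blank line after every match.

```python
import glob
import os
import re
import sys

# Same alternatives as the grep -E pattern; re.IGNORECASE mirrors grep -i.
SEARCH_PATTERN = re.compile(
    r"size|rate|type_[DI][TA][FT]|source|dest", re.IGNORECASE
)


def matching_lines(lines, pattern=SEARCH_PATTERN):
    """Return the full text of every line that contains a match."""
    # rstrip("\n") drops the newline read from the file, so print()
    # does not emit an extra blank line after each match.
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def grep_directory(datalogpath, extensions=("log", "debug")):
    """Collect matching lines from *.log and *.debug files, then sort them."""
    kept = []
    for ext in extensions:
        for path in sorted(glob.glob(os.path.join(datalogpath, "*." + ext))):
            with open(path) as in_file:
                kept.extend(matching_lines(in_file))
    # Plain lexical sort: an approximation of sort -t " " -k1,6 -k2.
    return sorted(kept)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python filtre.py ../data
    for line in grep_directory(sys.argv[1]):
        print(line)
```

Since every log line starts with a timestamp, the lexical sort puts the kept lines in chronological order across files. The code sticks to constructs that exist in Python 2.7 (print with a single parenthesized argument behaves the same in 2.7 and 3).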