Python script: converting a bash grep and sort pipeline to Python

Time:09-09

Here is what I had in bash and need to convert to Python (mandatory at my company, I have no choice, but it's good to learn new things):

The input test file contains lines like these:

2022-08-11 13:53:15 ; INFO ; file=toto ; Upload size = 13 KB ; result = ....
2022-08-11 13:54:55 ; other info ; rate = 5.3  ; 
2022-08-11 13:57:02 | not to be kept line 
2022-08-11 13:59:15 ; INFO ; file=titi ; Upload size = 3 KB ; result =...

and so on. The real file will contain other log line formats (for security reasons I cannot copy a real line here), so I use a test file.

Here is the exact command that gives the expected output:

grep -ihE "size|rate|type_[DI][TA][FT]|source|dest" ../data/*.{log,debug} | sort -t " " -k1,6 -k2 > filtre.txt

So first I want to try it without creating the output file.

Here is what I am trying in Python (I'm limited to 2.7 and cannot choose anything else, so please do not ask about it or mention it):

import os
import re
import string
import sys 

datalogpath = sys.argv[1]       #  get the path of log files to extract datas

searchpattern = re.compile("size|rate|type_D|type_I|source|dest")  # regexp to filter from logs directory

# step 1- equ grep all from 


for filename in os.listdir(datalogpath):
    with open(os.path.join(datalogpath, filename)) as in_file:
        for line in in_file:
            found = searchpattern.search(line)
            if found :
                print(found.group(0))

What currently appears for the test file is only

size
size
size
size
size

instead of each full line containing size or any of the other words I'm looking for. The grep command returns all 23 matching lines (the full content of each),

like

2022-08-11 13:53:15 ; INFO ; file=toto ; Upload size = 13 KB ; result = ....
2022-08-11 13:54:55 ; other info ; rate = 5.3  ; 
2022-08-11 13:59:15 ; INFO ; file=titi ; Upload size = 3 KB ; result =...

so, for example, the line

2022-08-11 13:57:02 | not to be kept line 

is not displayed in the output.

None of the official documentation chapters fit this use case.

Please help me define the correct regexp in Python format and/or the file reading method if this one is bad.

CodePudding user response:

Change

                print(found.group(0))

to

                print(line)

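You want to display the full line when there is a match, not just the text the regex matched. Note that line still ends with its newline character, so print(line) leaves a blank line after each match; print(line.rstrip("\n")) avoids that.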
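For the rest of the grep and sort pipeline, here is a minimal sketch that should run on Python 2.7. It assumes that re.IGNORECASE is an acceptable stand-in for grep -i, that globbing *.log and *.debug is equivalent to the shell's ../data/*.{log,debug}, and that plain string comparison is close enough to sort -t " " -k1,6 -k2 (GNU sort compares according to the locale, so the ordering may differ unless LC_ALL=C is in effect). The names matching_lines, log_files and sort_key are made up for this illustration.

```python
import glob
import os
import re
import sys

# Same alternatives as the grep -E pattern; re.IGNORECASE stands in for -i.
SEARCH_PATTERN = re.compile(r"size|rate|type_[DI][TA][FT]|source|dest",
                            re.IGNORECASE)


def log_files(directory):
    """Return the *.log files, then the *.debug files, of a directory."""
    paths = []
    for extension in ("log", "debug"):
        pattern = os.path.join(directory, "*." + extension)
        paths.extend(sorted(glob.glob(pattern)))
    return paths


def matching_lines(paths, pattern=SEARCH_PATTERN):
    """Yield every full line that matches, without its trailing newline."""
    for path in paths:
        with open(path) as in_file:
            for line in in_file:
                if pattern.search(line):
                    yield line.rstrip("\n")


def sort_key(line):
    """Approximate sort -t " " -k1,6 -k2 (assumption: byte-order comparison)."""
    fields = line.split(" ")
    # First key: fields 1 to 6; second key: field 2 to the end of the line.
    return (" ".join(fields[:6]), " ".join(fields[1:]))


if __name__ == "__main__" and len(sys.argv) > 1:
    for line in sorted(matching_lines(log_files(sys.argv[1])), key=sort_key):
        print(line)
```

If saved as, say, filtre.py (a hypothetical name), running python filtre.py ../data > filtre.txt would write the result to a file, as the original command does. The output keeps the whole line because matching_lines yields line itself and not found.group(0).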
