Read .gz files from a list and print lines

Time:04-22

I have a folder where very large log files are archived every day; more precisely, three log files are created per day.

I'm working on a Python script in which the user enters a date in YYYYMMDD format to locate the three files created on that date, and then enters an IP address. The script should read the contents of the three .gz files and print the lines that contain the IP address.

import re
import os
import glob
import gzip
from datetime import datetime, timedelta

date_entry = raw_input('Give a date in format YEAR, MONTH, DAY \n')
date = datetime.strptime(re.sub("\s+", "", date_entry), "%Y,%m,%d").date()

path = "/applis/tacacs/log/"

list_of_files = [
    file for file in glob.glob(path + '*.gz')
    if date == datetime.fromtimestamp(os.path.getmtime(file)).date()
]

print("Files found: ")
print(list_of_files)
Adresse_IP = raw_input('IP Address \n')

for line in gzip.open(list_of_files):
    if re.search(Adresse_IP, line):
        print line

But I get the following error:

  File "scriptacacs3.py", line 19, in <module>
    for line in gzip.open(list_of_files):
  File "/usr/lib/python2.7/gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "/usr/lib/python2.7/gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: coercing to Unicode: need string or buffer, list found

Apparently gzip.open expects a string. Is there a way to do this differently? Is it possible to read one file at a time?

EDIT: Can the fileinput module be used in this kind of situation? For example, using fileinput.hook_compressed, since the files are in .gz format and in a list.

I should also mention that the files found after entering the date look like this:

['/applis/tacacs/log/tacacs.log.7.gz', '/applis/tacacs/log/tacacs_acct.log.7.gz', '/applis/tacacs/log/tacacs_provisioning.log.4.gz']

I've been looking for a way to do this all morning and I'm still stuck. If someone could give me a clue, I'd appreciate it.

CodePudding user response:

You are passing the whole list of filtered log file names to gzip.open() at once, which is why you get the error: it expects a single file name. Iterate over the list, open the files one at a time, and search each one:

import re
import os
import glob
import gzip
from datetime import datetime, timedelta

date_entry = raw_input('Give a date in format YEAR, MONTH, DAY \n')
date = datetime.strptime(re.sub("\s+", "", date_entry), "%Y,%m,%d").date()

path = "/applis/tacacs/log/"

list_of_files = [
    file for file in glob.glob(path + '*.gz')
    if date == datetime.fromtimestamp(os.path.getmtime(file)).date()
]

print("Files found: ")
print(list_of_files)
Adresse_IP = raw_input('IP Address \n')

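# escape the IP so that "." matches only a literal dot, not any character
pattern = re.compile(re.escape(Adresse_IP))

for fname in list_of_files:  # iterate over the log file names, opening them one by one
    with gzip.open(fname, 'r') as log_file:  # open a single file
        for line in log_file:  # iterate over all lines
            if pattern.search(line):  # search the line for the IP address
                print(line.rstrip('\n'))  # print the matching line without doubling its newline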
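
On the fileinput question from the EDIT: fileinput.hook_compressed should also work here, since it picks the decompressor from the file extension (.gz or .bz2) and fileinput chains the files into one stream. Below is a minimal sketch, assuming Python 3 (the code above is Python 2). The function name matching_lines is my own and not from the original post, and it does a plain substring test instead of a regex search. Note the guard for an empty list: with no file names, fileinput falls back to the command-line arguments or standard input, which is probably not what you want when no log matches the date.

```python
import fileinput


def matching_lines(paths, ip):
    """Yield (filename, line) for each line in the given files that contains ip.

    Hypothetical helper for illustration; not part of the original script.
    """
    if not paths:
        # An empty list would make fileinput read sys.argv[1:] or stdin.
        return
    needle = ip.encode("ascii")  # lines are read as bytes, so compare bytes
    # hook_compressed opens .gz/.bz2 files transparently based on extension.
    # mode="rb" keeps every line as bytes on all Python 3 versions.
    with fileinput.input(files=paths, mode="rb",
                         openhook=fileinput.hook_compressed) as stream:
        for line in stream:
            if needle in line:
                text = line.decode("utf-8", "replace").rstrip("\n")
                yield stream.filename(), text
```

With the variables from the script, usage would look like `for name, line in matching_lines(list_of_files, Adresse_IP): print(name, line)`. Because it is a generator, only one line is held in memory at a time, which matters for very large logs.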