Basically, I have a folder where huge log files are archived every day; more precisely, 3 log files are created per day.
I'm working on a Python script where the user enters a date in YYYYMMDD format to locate the 3 files created on that date, and then enters an IP address. The script should then read the content of those 3 .gz files and print every line that contains the IP address.
import re
import os
import glob
import gzip
from datetime import datetime, timedelta
date_entry = raw_input('Give a date in format YEAR, MONTH, DAY \n')
date = datetime.strptime(re.sub("\s+", "", date_entry), "%Y,%m,%d").date()
path = "/applis/tacacs/log/"
list_of_files = [
    file for file in glob.glob(path + '*.gz')
    if date == datetime.fromtimestamp(os.path.getmtime(file)).date()
]
print("Files found: ")
print(list_of_files)
Adresse_IP = raw_input('IP Address \n')
for line in gzip.open(list_of_files):
    if re.search(Adresse_IP, line):
        print line
But I get the following error:
  File "scriptacacs3.py", line 19, in <module>
    for line in gzip.open(list_of_files):
  File "/usr/lib/python2.7/gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "/usr/lib/python2.7/gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: coercing to Unicode: need string or buffer, list found
Apparently gzip.open() expects a single file name (a string), not a list. Is there a different way to do this? Is it possible to read one file at a time?
EDIT: Can the fileinput module be used in this kind of situation? For example, using fileinput.hook_compressed, since the files are in .gz format and in a list.
Note that the files found after entering the date look like this:
['/applis/tacacs/log/tacacs.log.7.gz', '/applis/tacacs/log/tacacs_acct.log.7.gz', '/applis/tacacs/log/tacacs_provisioning.log.4.gz']
I've been looking for a way to do this all morning and I'm still stuck. If someone could give me a clue, I'd appreciate it.
Answer:
You are passing the whole list of file names to gzip.open() at once, which is why you get the error: gzip.open() accepts a single file name. Iterate over the list, open each file one at a time, and search its lines.
import re
import os
import glob
import gzip
from datetime import datetime
date_entry = raw_input('Give a date in format YEAR, MONTH, DAY \n')
# strip all whitespace so that "2021, 10, 05" parses with "%Y,%m,%d"
date = datetime.strptime(re.sub(r"\s+", "", date_entry), "%Y,%m,%d").date()
path = "/applis/tacacs/log/"
# keep only the .gz files last modified on the requested date
list_of_files = [
    fname for fname in glob.glob(path + '*.gz')
    if date == datetime.fromtimestamp(os.path.getmtime(fname)).date()
]
print("Files found: ")
print(list_of_files)
Adresse_IP = raw_input('IP Address \n').strip()
# re.escape makes the dots in the IP literal instead of "any character"
pattern = re.compile(re.escape(Adresse_IP))
for fname in list_of_files:  # iterate over the file names to open them one by one
    with gzip.open(fname, 'rb') as f:  # open a single file
        for line in f:  # iterate over all lines
            if pattern.search(line):  # search the line
                print(line.rstrip('\n'))  # print the line if it matches
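As for the fileinput idea in the EDIT: fileinput.FileInput with openhook=fileinput.hook_compressed can read a list of .gz files as one continuous stream of lines. Below is a minimal sketch, written to run on both Python 2.7 and 3; the helper name grep_gz_files is hypothetical and not part of the answer above. It opens the files in binary mode and decodes each line, because depending on the Python version hook_compressed may hand back bytes for .gz files. It also matches the IP as a whole address, so that 10.0.0.1 does not also hit 10.0.0.10 or 110.0.0.1.

```python
import fileinput
import re


def grep_gz_files(paths, ip):
    """Hypothetical helper: return (filename, line) pairs for every line in
    `paths` (.gz or plain files) that contains `ip` as a whole address."""
    # fileinput reads standard input when given an empty list,
    # so return early if no file matched the requested date.
    if not paths:
        return []
    # re.escape makes the dots literal; the digit lookarounds stop 10.0.0.1
    # from matching inside 10.0.0.10 or 110.0.0.1.
    pattern = re.compile(r"(?<!\d)" + re.escape(ip) + r"(?!\d)")
    matches = []
    # mode="rb" yields bytes on every Python version; hook_compressed
    # opens .gz files through gzip and other files normally.
    stream = fileinput.FileInput(paths, mode="rb",
                                 openhook=fileinput.hook_compressed)
    try:
        for raw in stream:
            # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
            line = raw.decode("utf-8", "replace")
            if pattern.search(line):
                # stream.filename() is the file the current line came from
                matches.append((stream.filename(), line.rstrip("\n")))
    finally:
        stream.close()
    return matches
```

Usage with the variables from the script above: `for name, line in grep_gz_files(list_of_files, Adresse_IP): print(name + ': ' + line)`. Compared with the explicit loop, the main gain is that stream.filename() tells you which of the 3 files each match came from; the explicit gzip loop is just as valid.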