Python - Trying to extract lines containing a key word from multiple files in a directory

Time:05-15

I am trying to build a script which can look for all files in a certain folder, and pull any lines of text that contain a key word or phrase.

I am very new to Python and don't really understand how to piece together the various suggestions I have seen from others.

import re
from glob import glob

search = []
linenum = 0
pattern = re.compile("Dawg", re.IGNORECASE)  # Compile a case-insensitive regex
path = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in\\Audit_2022.log'
filenames = glob('*.log')
print(f"\n{filenames}")
with open (path, 'rt') as myfile:    
    for line in myfile:
        linenum += 1
        if pattern.search(line) != None:      # If a match is found 
            search.append((linenum, line.rstrip('\n')))
for x in search:                            # Iterate over the list of tuples
    print("\nLine " + str(x[0]) + ": " + x[1])

This does everything exactly how I want it, except that it can only read one file at a time. My issue arises when I try deleting 'Audit_2022.log' from the end of the path = line.

Python says "PermissionError: [Errno 13] Permission denied: 'C:\Users\Username\Downloads\Testdataextraction\Throw it in'". I assume this is because it's looking at a directory and not a file, but how can I get it to read multiple files?

Many thanks in advance!

CodePudding user response:

Assuming you also need to show the filename(s) you could do this:

import re
from glob import glob
import os
p = re.compile('Dawg', re.IGNORECASE)
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
for file in glob(os.path.join(path, '*.log')):
    with open(file) as logfile:
        for i, line in enumerate(map(str.strip, logfile), 1):
            if p.search(line) is not None:
                print(f'File={file}, Line={i}, Data={line}')

CodePudding user response:

You're getting that exception because open() needs the path to a file. If you give it the path to a directory, it has nothing it can open, and on Windows that is reported as a PermissionError. A minimal example could be:

path = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in\\Audit_2022.log'
with open(path, 'rt') as f:
  pass

If the file exists, this should run fine, but if you change it to:

path = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in'
with open(path, 'rt') as f:
  pass

Then this will throw the exception.
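
The behaviour described above can be reproduced without touching the real folder. The sketch below is only an illustration: describe_open_failure is a hypothetical helper name, and a temporary directory stands in for 'Throw it in'. The exception class depends on the platform (PermissionError on Windows, IsADirectoryError on Linux and macOS), so the code catches their common base class OSError.

```python
import os
import tempfile

def describe_open_failure(path):
    """Try to open `path` as a text file.

    Returns the name of the exception class if opening fails,
    or None if the file opened successfully.
    (Hypothetical helper, for illustration only.)
    """
    try:
        with open(path, 'rt'):
            return None
    except OSError as exc:
        # PermissionError and IsADirectoryError are both subclasses of OSError
        return type(exc).__name__

# A temporary directory stands in for the real log folder
with tempfile.TemporaryDirectory() as tmpdir:
    print(os.path.isdir(tmpdir))           # True: it is a directory, not a file
    print(describe_open_failure(tmpdir))   # exception name varies by platform
```

Checking os.path.isdir(path) before calling open() is one way to catch this case early.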

I suspect what you're trying to do is glob through all log files in path and try each one, so something like:

import os
from glob import glob

path = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in'
filenames = glob(os.path.join(path, '*.log'))
print(f"\n{filenames}")
for filename in filenames:
  with open(filename, 'rt') as myfile:
    ...
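
For completeness, here is one way the elided part could be filled in, combining the glob loop with the search logic from the question. It is a sketch under assumptions, not the asker's final script: find_matches and its parameters are hypothetical names, the keyword is treated as literal text (hence re.escape), and errors='replace' is an added choice so that a stray undecodable byte in a log does not stop the scan.

```python
import os
import re
from glob import glob

def find_matches(folder, keyword, file_glob='*.log'):
    """Return (filename, line number, line text) for every matching line.

    Hypothetical helper: scans each file in `folder` that matches
    `file_glob` and does a case-insensitive search for `keyword`.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    matches = []
    # sorted() only makes the output order predictable
    for filename in sorted(glob(os.path.join(folder, file_glob))):
        # errors='replace' is an assumption, not part of the original code
        with open(filename, 'rt', errors='replace') as myfile:
            # enumerate restarts the line count at 1 for each file
            for linenum, line in enumerate(myfile, 1):
                if pattern.search(line):
                    matches.append((filename, linenum, line.rstrip('\n')))
    return matches

if __name__ == '__main__':
    folder = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
    for filename, linenum, text in find_matches(folder, 'Dawg'):
        print(f"{filename}, line {linenum}: {text}")
```

Returning the filename alongside each line number keeps results from different files distinguishable, which a single linenum counter alone would not.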

CodePudding user response:

You can use glob() to get all the .log files in a directory (os.listdir() also works, but it returns every entry and without the folder prefix), then nest your file-opening loop inside a loop over those files:

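import os
import re
from glob import glob

folder = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in'
pattern = re.compile("Dawg", re.IGNORECASE)  # Same case-insensitive regex as in the question
search = []

for file in glob(os.path.join(folder, '*.log')):
    with open(file, 'rt') as myfile:
        linenum = 0                          # Restart the line count for each file
        for line in myfile:
            linenum += 1
            if pattern.search(line):         # search() matches anywhere in the line; match() only at its start
                search.append((linenum, line.rstrip('\n')))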

os.path.join() is used here because it is a safer way to build paths than concatenating strings by hand.
