I have a folder that contains 5 folders, each with around 450-550 text files. Each text file has around 1-12 sentences of varying length, separated by tabs, like this:
i love burgers
i want to eat a burger
etc
I want to write a program that asks the user for a search term, then goes into each folder, opens and reads each text file, and counts how many times the search term appears. It should then move on to the next folder and repeat until it has gone through every folder and every text file.
So the output should be something like this:
input search term: good
the search term appears this many times __ in the following files
file name 001.txt
file name 002.txt
file name 003.txt
Here is some of the code I have so far:
from pathlib import Path
import os
from os.path import isdir, isfile
import nltk
search_word = input("Please enter the word you want to search for: ")
punctuation = "he fold!,:;-_'.?"
location = Path(r'the folder')
os.chdir(location)
print(Path.cwd())
fileslist = os.listdir(Path.cwd())
print(fileslist)
for file in fileslist:
    if isdir(file):
        os.chdir(file)
        print(Path.cwd())
        content = os.listdir(Path.cwd())
        for document in content:
            with open(document,'r') as infile:
                data = []
                for line in infile:
                    data = [line.strip(punctuation)]
                    print(data)
        os.chdir('../')
        print(Path.cwd())
    else:
        os.chdir(location)
I have tried watching some YouTube videos on how to do it, but I haven't been able to figure it out.
CodePudding user response:
If you just want to count the number of occurrences of a word in a set of .txt files, for example, something like this will do it:
from pathlib import Path
word = input('Enter the word you want to search for: ')
path = Path('/some/folder')
counter = {}
for file in path.rglob('*.txt'):
    if file.is_file():
        counter[file] = file.read_text().count(word)
print(
    f'The search term "{word}" appears {sum(counter.values())}',
    'times in the following files:'
)
for file in [_ for _ in counter if counter[_]]:
    print(f'{file}: {counter[file]} times')
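One thing to be aware of: str.count() counts substrings, so searching for "good" would also count "goodness" or "goods", and it is case-sensitive. If whole-word, case-insensitive matching is what you are after, a regular expression with \b word boundaries is one way to get it. This is only a rough sketch; the function names count_whole_word and count_in_tree are made up for illustration, and the UTF-8 encoding is an assumption about your files.

```python
import re
from pathlib import Path


def count_whole_word(text, word):
    """Count case-insensitive whole-word matches of `word` in `text`."""
    # re.escape() keeps characters such as "." in the search term literal
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return len(pattern.findall(text))


def count_in_tree(root, word):
    """Map each .txt file under `root` to its whole-word count (zero counts skipped)."""
    counts = {}
    for file in sorted(Path(root).rglob("*.txt")):
        if file.is_file():
            # assumption: files are UTF-8; undecodable bytes are replaced
            text = file.read_text(encoding="utf-8", errors="replace")
            n = count_whole_word(text, word)
            if n:
                counts[file] = n
    return counts
```

You would call count_in_tree('/some/folder', word) and then print sum(counts.values()) followed by the file names, the same way as in the snippet above. Note that \b only behaves as expected when the search term starts and ends with a letter, digit, or underscore.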
CodePudding user response:
This would be a perfect use case for the walk() function in the os module.
Given a start directory, os.walk() recursively iterates through the directory structure and yields a tuple of (current_directory, directory_names, file_names) for each directory.
You can then iterate through the file names to find the ones that end with '.txt', open each of those files, and use a generator expression to check whether each line contains the search term. Passing the generator to sum() adds up the results, giving the number of matching lines in the file.
import os

STARTDIR = input("directory: ")
SEARCH = input("search term: ")
total = 0
for dirname, dirlist, filelist in os.walk(STARTDIR):
    for filename in filelist:
        if filename.endswith(".txt"):
            # build the full path to use with the open() function
            fullname = os.path.join(dirname, filename)
            # use a generator expression to iterate over the lines of the
            # opened file and sum up the results (True == 1 for sum())
            with open(fullname) as infile:
                count = sum(SEARCH in line for line in infile)
            # if the count is non-zero, print the filename and count
            if count:
                print(f"{fullname} contains {count} lines with {SEARCH}")
                # add to the running total (number of matching lines)
                total += count
print(f"{SEARCH} occurred a total of {total} times")
SAMPLE OUTPUT:
directory: c:\downloads\test
search term: hello
c:\downloads\test\a\aa\info.txt contains 1 lines with hello
c:\downloads\test\a\aa\log.txt contains 1 lines with hello
c:\downloads\test\a\bb\greeting.txt contains 1 lines with hello
c:\downloads\test\b\cc\control.txt contains 3 lines with hello
c:\downloads\test\b\cc\dumb.txt contains 1 lines with hello
c:\downloads\test\b\cc\info.txt contains 4 lines with hello
c:\downloads\test\c\aa\dog.txt contains 2 lines with hello
c:\downloads\test\c\dd\good.txt contains 1 lines with hello
hello occurred a total of 14 times
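Note that the os.walk() code counts lines that contain the search term, not individual occurrences, so a line with the term in it twice still counts once. If you want every occurrence instead, you could swap the membership test for str.count(). A small sketch of the difference (the helper names and the sample lines are made up for illustration):

```python
def lines_containing(lines, term):
    """Number of lines that contain `term` at least once (True counts as 1)."""
    return sum(term in line for line in lines)


def total_occurrences(lines, term):
    """Number of times `term` appears across all lines, as a substring."""
    return sum(line.count(term) for line in lines)


# hypothetical file contents, one string per line
sample = ["hello hello world\n", "say hello\n", "goodbye\n"]
print(lines_containing(sample, "hello"))   # 2
print(total_occurrences(sample, "hello"))  # 3
```

Both functions accept any iterable of strings, so an open file object can be passed in directly in place of the list.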