I have to create a search engine that searches for a specific word inside a directory (folder) containing text files.
For example, assume that we are searching for the word "machine" in a directory called X. I want to scan all the .txt files inside X and in its subdirectories as well.
I am getting "maximum recursion depth exceeded while calling a Python object" with this code:

    import os
    from pathlib import Path

    def getPath (folder) :
        fpath = Path(folder).absolute()
        return fpath

    def isSubdirectory (folder) :
        if folder.endswith(".txt") == False :
            return True
        else :
            return False

    def searchEngine (folder, word) :
        path = getPath(folder)
        occurences = {}
        list = os.listdir (path) #get a list of the folders/files in this path
        #assuming we only have .txt files and subdirectories in our folder :
        for k in list :
            if isSubdirectory(k) == False :
                #break case
                with open (k) as file :
                    lines = file.readlines()
                    for a in lines :
                        if a == word :
                            if str(file) not in occurences :
                                occurences[str(file)] = 1
                            else :
                                occurences[str(file)] += 1
                return occurences
            else :
                return searchEngine (k, word)

CodePudding user response:
A couple of points:

- I couldn't reproduce the recursion error when running your code. But I think you have a problem here: `list = os.listdir(path)` gives you only relative file/path names, but the code that follows requires absolute ones (for example the `open`) once you're outside your `cwd`.
- I think the `return` statement is misplaced: it returns after the first txt file?
- Python offers ready-made solutions for walking through paths recursively: `os.walk()`, `glob.glob()` and `Path.rglob()`. Why don't you use them?
- `Path.absolute()` was undocumented for a long time, so I wouldn't rely on it. You could use `Path.resolve()` instead?
- You do nothing with the returned `occurences` in the recursion step: I think you should update the main dictionary after retrieving it?
- Don't use `list` as a variable name: you're overriding access to the built-in `list()`.
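
To illustrate the first point, here is a minimal sketch showing that `os.listdir()` returns bare names, which only become usable from any working directory once they are joined with the directory they came from. The temporary directory layout and the helper name `list_full_paths` are made up for this demonstration.

```python
import os
import tempfile


def list_full_paths(folder):
    """Return the entries of `folder`, each joined with `folder` itself.

    os.listdir() yields bare names such as 'a.txt'; without the join,
    open('a.txt') would be resolved against the current working directory.
    """
    return [os.path.join(folder, name) for name in sorted(os.listdir(folder))]


# Hypothetical layout, created only for this demonstration.
with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "sub")
    os.mkdir(sub)
    with open(os.path.join(sub, "a.txt"), "w") as stream:
        stream.write("machine\n")

    bare_names = os.listdir(sub)        # names only, no directory part
    full_paths = list_full_paths(sub)   # names joined with their directory

    # The joined path can be opened regardless of the working directory.
    full_exists = os.path.exists(full_paths[0])
```

The same join is what the recursive call in the question is missing: passing the bare name `k` on means the next `os.listdir()` no longer looks inside the parent folder.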
Here's a suggestion with `Path.rglob()`:

    from pathlib import Path

    def searchEngine(folder, word):
        occurences = {}
        # rglob() walks folder and all of its subdirectories
        for file in Path(folder).rglob('*.txt'):
            key = str(file)
            with file.open('rt') as stream:
                for line in stream:
                    count = line.count(word)
                    if count:
                        if key not in occurences:
                            occurences[key] = count
                        else:
                            # add to the running total for this file
                            occurences[key] += count
        return occurences

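
For comparison, the `os.walk()` route mentioned in the list above could look roughly like this. It is only a sketch: the function name `search_with_walk` is hypothetical, and it assumes the same substring counting via `str.count()` as the suggestion above.

```python
import os


def search_with_walk(folder, word):
    """Count occurrences of `word` in every .txt file below `folder`."""
    occurrences = {}
    # os.walk() yields (directory, subdirectory names, file names) for the
    # whole tree, so no explicit recursion is needed.
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            # `name` is a bare file name, so join it with its directory.
            path = os.path.join(dirpath, name)
            with open(path, "rt") as stream:
                count = sum(line.count(word) for line in stream)
            if count:
                occurrences[path] = count
    return occurrences
```

A call such as `search_with_walk("X", "machine")` would return a dictionary mapping each matching file path to its count; files without a match are left out.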
If you want to implement the recursion for yourself, then you could do something like:

    from pathlib import Path

    def searchEngine(folder, word) :
        base = Path(folder)
        occurences = {}
        if base.is_dir():
            # recurse into every entry and merge its results
            for path in base.iterdir():
                occurences.update(searchEngine(path, word))
        elif base.suffix == '.txt':
            with base.open('rt') as stream:
                key = str(base)
                for line in stream:
                    count = line.count(word)
                    if count:
                        if key not in occurences:
                            occurences[key] = count
                        else:
                            # add to the running total for this file
                            occurences[key] += count
        return occurences

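
One caveat about counting with `line.count(word)`: it counts substrings, so a search for "machine" also counts "machines". If only whole-word matches are wanted (an assumption, the question doesn't say), a regular expression with word boundaries is one option. The helper name `count_whole_word` is hypothetical.

```python
import re


def count_whole_word(text, word):
    """Count occurrences of `word` in `text` as a whole word only."""
    # re.escape() protects any regex metacharacters in `word`;
    # \b matches at a boundary between word and non-word characters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return len(pattern.findall(text))


line = "A machine beats machines; the machine wins."
substring_count = line.count("machine")               # 3, includes "machines"
whole_word_count = count_whole_word(line, "machine")  # 2
```

Swapping `line.count(word)` for `count_whole_word(line, word)` in either function would switch the search to whole words.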