Maximum recursion depth with file management

I have to create a search engine that searches for a specific word inside a directory (folder) that contains text files.

For example, assume that we are searching for the word "machine" in a directory called X. I want to scan all the .txt files inside X and inside its subdirectories as well.

I am getting the error "maximum recursion depth exceeded while calling a Python object".

import os
from pathlib import Path

def getPath (folder) :

    fpath = Path(folder).absolute()
    return fpath

def isSubdirectory (folder) :

    if folder.endswith(".txt") == False :
        return True
    else :
        return False
 
def searchEngine (folder, word) :
    
    path = getPath(folder)
    occurences = {}
    list = os.listdir (path)     #get a list of the folders/files in this path

    #assuming we only have .txt files and subdirectories in our folder :

    for k in list :

        if isSubdirectory(k) == False :
            #break case
            with open (k) as file :                  
                lines = file.readlines()

                for a in lines :

                    if a == word :
                        if str(file) not in occurences :
                            occurences[str(file)] = 1
                        else :
                            occurences[str(file)] += 1
            return occurences
                
        else :

            return searchEngine (k, word)

CodePudding user response:

A couple of points:

I couldn't reproduce the recursion error when running your code, but I think you have a problem here: list = os.listdir(path) returns bare file and directory names, not full paths. The code that follows (for example open(k) and the recursive call searchEngine(k, word)) resolves those names against your current working directory, which stops working as soon as you are outside it.
I think the return statement is misplaced: it sits inside the for loop, so the function returns after the first .txt file.
Python offers ready-made tools for walking directory trees recursively: os.walk(), glob.glob() with recursive=True, and Path.rglob(). Why not use one of them?
Path.absolute() was undocumented in older Python versions, so I wouldn't rely on it. You could use Path.resolve() instead.
You do nothing with the occurences returned by the recursive call: you should merge them into the main dictionary.
Don't use list as a variable name: it shadows the built-in list().
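
As an illustration of the os.walk() option mentioned in the points above, here is a minimal sketch. The function name search_with_walk is hypothetical, and the sketch assumes every .txt file can be read with the default text encoding. It also shows how joining each bare file name with its directory avoids the relative-path problem.

```python
import os

def search_with_walk(folder, word):
    """Count occurrences of `word` in every .txt file under `folder`."""
    occurrences = {}
    # os.walk visits `folder` and all of its subdirectories for us
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            # os.walk yields bare file names, so join them with dirpath
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rt") as stream:
                count = sum(line.count(word) for line in stream)
            if count:
                occurrences[full_path] = count
    return occurrences
```

The result maps each matching file path to the number of times the word appears in it; files without a match are left out.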

Here's a suggestion with Path.rglob():

from pathlib import Path

def searchEngine(folder, word):
    occurences = {}
    for file in Path(folder).rglob('*.txt'):
        key = str(file)
        with file.open('rt') as stream:
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count
    return occurences
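
Note that line.count(word) counts substrings, so searching for "machine" also counts "machines". If only whole words should match, one possible variant is sketched below. The name search_whole_words is hypothetical, and the sketch assumes the search word starts and ends with a letter, digit or underscore so that the \b word boundaries behave as expected. Matching stays case-sensitive.

```python
import re
from pathlib import Path

def search_whole_words(folder, word):
    """Count whole-word occurrences of `word` in all .txt files under `folder`."""
    # re.escape keeps any special characters in `word` literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    occurrences = {}
    for file in Path(folder).rglob("*.txt"):
        with file.open("rt") as stream:
            count = sum(len(pattern.findall(line)) for line in stream)
        if count:
            occurrences[str(file)] = count
    return occurrences
```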

If you want to implement the recursion yourself, you could do something like:

def searchEngine(folder, word) : 
    base = Path(folder)
    occurences = {}
    if base.is_dir():
        for path in base.iterdir():
            occurences.update(searchEngine(path, word))
    elif base.suffix == '.txt':
        with base.open('rt') as stream:
            key = str(base)
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count
    return occurences