How to retrieve file names from subfolders

Time:11-22

I have the following folder structure:

root
│   file001.docx
│   file002.docx    
│
└───folder1
   │   file003.docx
   │   file004.docx
   │
   └───subfolder1
       │   file005.docx
       │   file006.docx
       |____subfolder2
            |
            |_file007.docx
   

I want to write a program in which the user enters a root directory and a keyword, and every file containing that keyword is listed. For example, if I enter "hello there!", file007.docx should show up (assume the text "hello there!" is contained in file007.docx), telling the user that the typed words appear in that Word document.

To approach this, I first built a list of all the Word documents inside the folder and its subfolders with this code:

def find_doc():
    variable= input('What is your directory?') #asking for root directory
    os.chdir(variable)
    files = []
    for dirpath, dirnames, filenames in os.walk(variable):
        for filename in [f for f in filenames if f.endswith(".docx")]:
            files.append(filename)  
    return files

Next, this is the second piece of code, which searches the contents of each Word document:

all_files= find_doc() # just calling the first function I just made

while True: 
    keyword= input('Input your word or type in Terminate to exit: ')
    for i in range(len(all_files)): 
        text = docx2txt.process(all_files[i]) 
        if keyword.lower() in text.lower():  #to make it case insensitive
            print ((all_files[i]))    
    if keyword== ('Terminate') or keyword== ('terminate'):
        break

In theory, if I enter the word "hello" at the prompt input('Input your word or type in Terminate to exit: '), I should get file007.docx back, since all_files = find_doc() returns

['file001.docx',
'file002.docx',
'file003.docx',
'file004.docx',
'file005.docx',
'file006.docx',
'file007.docx',]

os.walk() finds all of them because it walks the subfolders recursively.

However, it raises an error: FileNotFoundError: [Errno 2] No such file or directory:

Where did I go wrong? Thanks!

CodePudding user response:

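os.walk() yields bare file names in filenames, without the folder they live in, so docx2txt.process() looks for every file in the current working directory and fails with FileNotFoundError for anything inside a subfolder. I think you want to modify your function into something like this, so that it stores each file name together with its path: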

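import os

def find_doc():
    variable = input('What is your directory?')  # ask for the root directory
    # os.chdir() is not needed once full paths are stored, and it would
    # break os.walk() if the user typed a relative path
    files = []
    for dirpath, dirnames, filenames in os.walk(variable):
        for filename in filenames:
            if filename.endswith(".docx"):
                # keep the folder together with the name so the file can be opened later
                files.append(os.path.join(dirpath, filename))
    return files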
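
To see why the bare names fail, here is a small sketch that builds a throwaway folder tree and checks whether each name can be found directly under the root, which is what happens after os.chdir(variable) in the question. The helper names list_docx and demo are made up for this illustration, and the files are empty placeholders rather than real Word documents.

```python
import os
import tempfile

def list_docx(root):
    """Return (bare_name, full_path) pairs for every .docx file under root."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".docx"):
                pairs.append((filename, os.path.join(dirpath, filename)))
    return pairs

def demo():
    """Report, per file, whether the bare name and the full path can be found."""
    with tempfile.TemporaryDirectory() as root:
        # root/file001.docx and root/folder1/file003.docx, both empty
        os.makedirs(os.path.join(root, "folder1"))
        for rel in ("file001.docx", os.path.join("folder1", "file003.docx")):
            open(os.path.join(root, rel), "w").close()

        rows = []
        for name, full in sorted(list_docx(root)):
            # Looking up the bare name directly under root mimics opening it
            # after os.chdir(root), as the question's code does.
            found_by_name = os.path.exists(os.path.join(root, name))
            rows.append((name, found_by_name, os.path.exists(full)))
        return rows

for row in demo():
    print(row)
# ('file001.docx', True, True)
# ('file003.docx', False, True)
```

The file in the root is found either way, but the one in folder1 is only found through its joined path, which matches the FileNotFoundError in the question.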

You should also change your while loop so that the exit check runs before the for loop; otherwise typing Terminate still searches every document for that word before the program exits.

import docx2txt

while True:
    keyword = input('Input your word or type in Terminate to exit: ')
    if keyword.lower() == 'terminate':
        break
    for path in all_files:  # all_files = find_doc(), as in the question
        text = docx2txt.process(path)
        if keyword.lower() in text.lower():  # case-insensitive match
            print(path)
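
If it helps, the two pieces can also be split into functions that take their inputs as arguments, which makes them easy to try without typing at a prompt. This is only a sketch: find_docs, search_docs, read_plain_text and demo are names invented here, and the demo uses plain-text files with a .docx extension as stand-ins, so the text extractor is passed in as a parameter instead of calling docx2txt directly.

```python
import os
import tempfile

def find_docs(root, extension=".docx"):
    """Return the full path of every file under root that ends with extension."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(extension):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)

def search_docs(paths, keyword, extract_text):
    """Return the paths whose extracted text contains keyword, ignoring case."""
    needle = keyword.lower()
    return [path for path in paths if needle in extract_text(path).lower()]

def read_plain_text(path):
    """Stand-in extractor for the demo; real .docx files need docx2txt.process."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()

def demo():
    """Build a tiny tree of fake documents and search it for 'hello'."""
    with tempfile.TemporaryDirectory() as root:
        nested = os.path.join(root, "folder1", "subfolder1")
        os.makedirs(nested)
        samples = {
            os.path.join(root, "file001.docx"): "nothing to see",
            os.path.join(nested, "file005.docx"): "Hello there!",
        }
        for path, text in samples.items():
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text)
        hits = search_docs(find_docs(root), "hello", read_plain_text)
        # Report paths relative to the root so the result is easy to read
        return [os.path.relpath(hit, root) for hit in hits]

print(demo())
```

With docx2txt installed and real Word documents, the call would be search_docs(find_docs(root), keyword, docx2txt.process).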