How to select folders from a directory based on a python list of the folder names?

Time:02-10

I have a list of folder names - "df_train_pos_list"

I want to iterate through a directory and select folders with those names, and add them to another list - "train_images"

So far what I have tried doesn't work:

train_images = []
train_labels = []

for i in df_train_pos_list:
    for currentpath, folders, files in os.walk('D:\Arm C Deep Learning\SH_OCTAPUS\Train'):
        for file in files:
            if i in currentpath:
                train_images.append('D:\Arm C Deep Learning\SH_OCTAPUS\Train' + file)
                train_labels.append(1)
            else:
                train_images.append('D:\Arm C Deep Learning\SH_OCTAPUS\Train' + file)
                train_labels.append(0)
train_labels = np.asarray(train_labels, dtype=np.int64)
print(train_labels)
np.unique(train_labels, return_counts='TRUE')

CodePudding user response:

I'm not sure whether you want to add the folder paths or the individual files inside each folder to your list. The snippet below adds the paths of the matching folders to train_images. I would need more details about how you want the labels assigned before adding that part.

import os

df_train_pos_list = []  # replace with your list of folder names
train_images = []
train_labels = []
root = r'D:\Arm C Deep Learning\SH_OCTAPUS\Train'  # raw string keeps the backslashes literal
for f in os.listdir(root):
    if f in df_train_pos_list:
        train_images.append(os.path.join(root, f))  # add the full path of the matching folder
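
If the goal is instead to collect the individual files and label each one by the folder it sits in, a minimal sketch could look like the following. It assumes a file should get label 1 when the folder directly containing it is named in df_train_pos_list, and 0 otherwise; that rule is a guess at the intent, and collect_labelled_files is a hypothetical helper name. Walking the directory once, rather than once per name in the list, keeps each file from being appended several times.

```python
import os


def collect_labelled_files(root, positive_names):
    """Return (paths, labels) with one entry per file found under root.

    Assumption: a file is labelled 1 when the folder that directly
    contains it is named in positive_names, otherwise 0.
    """
    positive = set(positive_names)  # set gives fast membership checks
    paths, labels = [], []
    # Walk the tree a single time so every file is visited exactly once
    for current_path, _folders, files in os.walk(root):
        folder_name = os.path.basename(current_path)
        label = 1 if folder_name in positive else 0
        for file_name in sorted(files):
            # os.path.join inserts the path separator between folder and file
            paths.append(os.path.join(current_path, file_name))
            labels.append(label)
    return paths, labels
```

It would be called as train_images, train_labels = collect_labelled_files(root, df_train_pos_list), after which train_labels can be passed to np.asarray as in the question.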

CodePudding user response:

From what I understood, you are trying to generate two lists: one containing all the paths directly inside "D:\Arm C Deep Learning\SH_OCTAPUS\Train" and one containing 0s and 1s depending on whether each path's folder name is in df_train_pos_list.

This should do the trick:

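from pathlib import Path

df_train_pos_list = []  # replace with your list of folder names
train_images = []
train_labels = []
df_train_pos_set = set(df_train_pos_list)

# Raw string so the backslashes in the Windows path stay literal
for path in Path(r"D:\Arm C Deep Learning\SH_OCTAPUS\Train").glob("*"):
    train_images.append(path)
    # Compare the folder name (a str), not the Path object, against the set of names
    train_labels.append(1 if path.name in df_train_pos_set else 0)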

A couple of things to note:

  • pathlib is generally the more readable way to work with file system paths in modern Python.
  • I'm creating a set from your df_train_pos_list to speed up the membership test. Building the set takes O(N) time once, but each lookup is then O(1) on average, whereas checking against a list takes O(N) per lookup.
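
One detail worth checking with the set lookup: if df_train_pos_list holds plain folder names as strings, a Path object will never match an entry, because a Path does not compare equal to a str. The short sketch below illustrates this; the folder names are hypothetical stand-ins for the real list.

```python
from pathlib import Path

# Hypothetical folder names standing in for df_train_pos_list
positive_names = {"case_001", "case_007"}

folder = Path("Train") / "case_001"

path_match = folder in positive_names       # Path compared with str entries
name_match = folder.name in positive_names  # str compared with str entries

print(path_match)  # False
print(name_match)  # True
```

Comparing path.name (the final path component, as a string) keeps the lookup O(1) on average while matching the way the names are stored.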