I am trying to process the subfolders of a raw dataset and save the output images to a destination folder.
For example, the raw images in video_0001 should be saved in the destination directory under a folder with the same name, video_0001.
I tried the following code, but it does not scale because the dataset contains over 200 folders.
Here's what I was able to come up with:
directory = "C:\\Users\\dataset\\distdir"
save_directory1 = "C:\\Users\\Desktop\\dataset\\distdir\\save_img\\Folder1"
save_directory2 = "C:\\Users\\Desktop\\dataset\\distdir\\save_img\\Folder2"
height = 512
width = 512
for root, dirs, files in os.walk(directory):
    for folder_name in dirs:
        cv2.imwrite(os.path.join(save_directory1, file), img)
        cv2.imwrite(os.path.join(save_directory2, file), img)
    for file in files:
        img = cv2.imread(os.path.join(root,file))
        print(img)
        img = cv2.resize(img, (height, width))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
        print(img.shape)
But the result is that Folder1 and Folder2 each contain only the last image, and this approach is not feasible because I have over 200 folders.
Any thoughts would be appreciated.
CodePudding user response:
You need to combine the two for loops. Otherwise, you'll overwrite the variables img and file again and again and will only save the last processed img.
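To see why only the last image survives, here is a minimal sketch without OpenCV. The file and folder names are hypothetical stand-ins, and `name.upper()` merely stands in for the resize and grayscale steps:

```python
# Hypothetical stand-ins for the real file names and destination folders.
files = ["image_01.jpeg", "image_02.jpeg", "image_03.jpeg"]
folders = ["Folder1", "Folder2"]

# Separate loops: `processed` is overwritten on every pass,
# so only the last file is left by the time the saving loop runs.
for name in files:
    processed = name.upper()  # stands in for the preprocessing steps
saved_separate = []
for folder in folders:
    saved_separate.append((folder, processed))

# Combined loop: each file is "saved" right after it is processed.
saved_combined = []
for name in files:
    processed = name.upper()
    saved_combined.append(processed)

print(saved_separate)
print(saved_combined)
```

With separate loops every folder receives the same, last-processed file; with the combined loop each file is handled before the variable is reused.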
Try:
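import os
import cv2

directory = "C:\\Users\\dataset\\distdir"
save_directory = "C:\\Users\\Desktop\\dataset\\distdir\\save_img"
height = 512
width = 512

for root, dirs, files in os.walk(directory):
    for file in files:
        img = cv2.imread(os.path.join(root, file))
        img = cv2.resize(img, (width, height))  # dsize is (width, height)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
        # Name the destination folder after the source folder being walked
        folder = os.path.basename(root)
        out_dir = os.path.join(save_directory, folder)
        os.makedirs(out_dir, exist_ok=True)  # cv2.imwrite does not create folders
        cv2.imwrite(os.path.join(out_dir, file), img)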
CodePudding user response:
You can define your preprocessing logic first.
- Walk through the origin folder to get each folder name
- Create a destination folder named after the origin folder
- Loop over all files inside that origin folder
- Apply the image preprocessing steps
- Save the image to the destination folder you created
For example, suppose this is the directory tree of your origin folder:
-dataset
- -folder_0001
- - -image_01.jpeg
- - -image_02.jpeg
- -folder_0002
- - -image_01.jpeg
- - -image_02.jpeg
...
- -folder_0200
- - -image_01.jpeg
- - -image_02.jpeg
import os
import cv2

input_dir = 'dataset'
output_dir = 'output'
height = 512
width = 512

for root, dirs, files in os.walk(input_dir):
    for file in files:
        # Create the output directory, named after the folder that contains the file
        output_dir_path = os.path.join(output_dir, os.path.basename(root))
        os.makedirs(output_dir_path, exist_ok=True)

        # Preprocess the image
        img = cv2.imread(os.path.join(root, file))
        if img is None:
            # cv2.imread returns None for files it cannot decode; skip them
            continue
        img = cv2.resize(img, (width, height))  # dsize is (width, height)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)

        # Save it to the output directory
        cv2.imwrite(os.path.join(output_dir_path, file), img)
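With this solution you are able to handle all 200 subfolders, because each destination folder name is derived from its source folder instead of being hard-coded.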
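Before running this over the whole dataset, it may help to check the source-to-destination mapping without reading or writing any images. The following is only a sketch using the standard library: the helper name `plan_outputs` and the throwaway folder names are hypothetical, and it applies the same rule as above (destination folder = name of the folder that contains the file).

```python
import os
import tempfile

def plan_outputs(input_dir, output_dir):
    """Return sorted (source, destination) path pairs; no images are touched."""
    pairs = []
    for root, dirs, files in os.walk(input_dir):
        for file in files:
            # Destination folder is named after the folder that holds the file
            folder = os.path.basename(root)
            pairs.append((os.path.join(root, file),
                          os.path.join(output_dir, folder, file)))
    return sorted(pairs)

# Build a tiny throwaway tree to try it on (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    input_dir = os.path.join(tmp, "dataset")
    for folder in ("folder_0001", "folder_0002"):
        os.makedirs(os.path.join(input_dir, folder))
        for name in ("image_01.jpeg", "image_02.jpeg"):
            open(os.path.join(input_dir, folder, name), "w").close()
    pairs = plan_outputs(input_dir, os.path.join(tmp, "output"))
    # Keep only the destination paths, relative to the temporary root
    destinations = [os.path.relpath(dst, tmp) for _, dst in pairs]

for dst in destinations:
    print(dst)
```

If the printed paths look right, the same loop structure can be used with the OpenCV calls in place. Note that two source folders with the same name at different depths would map to the same destination folder under this rule.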