How to divide a large image dataset into groups of pictures and save them inside subfolders using py

I have an image dataset that looks like this: Dataset

The timestep of each image is 15 minutes (as you can see, the timestamp is in the filename).

Now I would like to group those images into 3-hour sequences and save each sequence in its own subfolder, so that every subfolder contains 12 images (= 3 hours). The result would ideally look like this: Sequences

I have tried using os.walk to loop over the folder where the image dataset is saved, and then I created a dataframe using pandas because I thought it would make the files easier to handle, but I think I am totally off target here.

CodePudding user response：

Quoting the question: "The timestep of each image is 15 minutes (as you can see, the timestamp is in the filename). Now I would like to group those images in 3hrs long sequences and save those sequences inside subfolders that would contain respectively 12 images(=3hrs)"

I suggest using the built-in datetime library to get the desired result. For each file you have:

1. Get the substring that holds the timestamp.
2. Parse it into a datetime.datetime instance using datetime.datetime.strptime.
3. Convert that instance into seconds since the epoch using the .timestamp method.
4. Integer-divide (//) the number of seconds by 10800 (the number of seconds in 3 hours).
5. Convert the resulting value to str and use it as the target subfolder name.
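
The steps above could be sketched roughly as follows. The filename layout is not shown in the question, so TIMESTAMP_RE and TIMESTAMP_FORMAT are hypothetical (they assume names such as img_20210101_0015.png) and would need adjusting to the real files. The helper names bucket_name and group_images are also made up for illustration. The timestamps are treated as UTC so that the 3-hour windows do not depend on the local timezone of the machine, which is an assumption as well.

```python
import re
import shutil
from datetime import datetime, timezone
from pathlib import Path

WINDOW_SECONDS = 3 * 60 * 60  # 10800 seconds in 3 hours

# Hypothetical filename layout, e.g. "img_20210101_0015.png"; adjust both to your files.
TIMESTAMP_RE = re.compile(r"(\d{8}_\d{4})")
TIMESTAMP_FORMAT = "%Y%m%d_%H%M"


def bucket_name(filename):
    """Return the 3-hour window id of a filename as str, or None if no timestamp is found."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), TIMESTAMP_FORMAT)
    # Assumption: interpret the timestamp as UTC before converting to epoch seconds.
    seconds = int(stamp.replace(tzinfo=timezone.utc).timestamp())
    return str(seconds // WINDOW_SECONDS)


def group_images(dataset_path, output_location):
    """Copy every timestamped file into a subfolder named after its 3-hour window."""
    # sorted() builds the full list first, so files copied later are not walked again.
    for source in sorted(Path(dataset_path).rglob("*")):
        if not source.is_file():
            continue
        name = bucket_name(source.name)
        if name is None:
            continue  # skip files without a recognisable timestamp
        target_dir = Path(output_location) / name
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(source, target_dir / source.name)
```

Calling group_images("your data set", "location where you want to save them") would then create one subfolder per 3-hour window. Because the folder is derived from the timestamp itself, a missing image only leaves one folder with fewer than 12 files and does not shift the following groups.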

CodePudding user response：

Since you said you need exactly 12 files per folder, and assuming the timestep is the same for all of them, the following code can help you:

import os
import shutil

output_location = "location where you want to save them"  # better not to be in the same location as the dataset
dataset_path = "your data set"

# Full path of every file in the dataset, sorted by path
# (chronological as long as the timestamp in the filename sorts alphabetically)
files = sorted(os.path.join(path, file) for path, subdirs, names in os.walk(dataset_path) for file in names)

nr_of_files = 0
folder_name = ""
for index in range(len(files)):
    # Always copy into the current folder under the file's own name
    file_name = os.path.basename(files[index])
    if nr_of_files == 0:
        # First file of a group: create a folder named after it (without the extension)
        folder_name = os.path.join(output_location, os.path.splitext(file_name)[0])
        os.makedirs(folder_name, exist_ok=True)
        shutil.copy(files[index], os.path.join(folder_name, file_name))
        nr_of_files += 1
    elif nr_of_files == 11:
        # 12th file of a group: copy it and reset the counter
        shutil.copy(files[index], os.path.join(folder_name, file_name))
        nr_of_files = 0
    else:
        shutil.copy(files[index], os.path.join(folder_name, file_name))
        nr_of_files += 1

Explaining the code:
files holds every file found under dataset_path. You set dataset_path, and files will contain the full path to each file.

The for loop iterates over the entire length of files.

nr_of_files counts the files in groups of 12. If it is 0, the code creates a folder named after files[index] in the location you set as output and copies the file into it.

If it is 11 (counting starts at 0, so nr_of_files == 11 means the 12th file), it copies the file and sets nr_of_files back to 0 so that the next file creates another folder.

The final else simply copies the file and increments nr_of_files.
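
The same counting idea can also be written by slicing the sorted file list in steps of 12, which avoids the manual counter. This is only a sketch of that variant, not the code above: the helper names chunk and copy_in_groups are made up for illustration, and it assumes that sorting the paths puts the images in chronological order.

```python
import shutil
from pathlib import Path

GROUP_SIZE = 12  # 12 images x 15 minutes = 3 hours


def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[start:start + size] for start in range(0, len(items), size)]


def copy_in_groups(dataset_path, output_location, size=GROUP_SIZE):
    """Copy files into folders of `size` files, each named after its first file."""
    # Assumption: sorted paths are in chronological order.
    files = sorted(p for p in Path(dataset_path).rglob("*") if p.is_file())
    for group in chunk(files, size):
        target_dir = Path(output_location) / group[0].stem
        target_dir.mkdir(parents=True, exist_ok=True)
        for source in group:
            shutil.copy(source, target_dir / source.name)
```

Like the loop above, this groups purely by position, so a missing image in the dataset shifts every later group by one file. If gaps are possible, deriving the folder from the timestamp is the safer choice.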