I'm designing a tool that should only pick up EXR image files from the input folder that follow one of two naming conventions: u#_v#.exr or u#_v#_#.exr (where # denotes whole numbers or positive non-zero integers). All other files should be ignored. My working code is given below. Is there a better or more efficient way to do this?
import argparse
import glob
import os

import numpy as np


def main():
    # Add and read command line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_folder', type=str, help='Directory where input images are located')
    parser.add_argument('--output_folder', type=str, help='Directory where output image should be written')
    args = parser.parse_args()
    # Change directory to input folder and check all filenames belonging to our convention
    os.chdir(args.input_folder)
    all_files = check_combinatons_of_numeric_characters('u_v.exr', 'u_v_.exr')
    print(all_files)

def check_combinatons_of_numeric_characters(convention1, convention2):
    # Combinations for first convention which was supplied
    split_convention1 = convention1.split('_')
    convention1_combinations_alpha = np.array([])
    convention1_combination1 = glob.glob(split_convention1[0] + '[0-9]_'
                                         + split_convention1[1].split('.')[0] + '[0-9].'
                                         + split_convention1[1].split('.')[1]
                                         )
    convention1_combination2 = glob.glob(split_convention1[0] + '[0-9][0-9]_'
                                         + split_convention1[1].split('.')[0] + '[0-9][0-9].'
                                         + split_convention1[1].split('.')[1]
                                         )
    convention1_combination3 = glob.glob(split_convention1[0] + '[0-9][0-9]_'
                                         + split_convention1[1].split('.')[0] + '[0-9].'
                                         + split_convention1[1].split('.')[1]
                                         )
    convention1_combination4 = glob.glob(split_convention1[0] + '[0-9]_'
                                         + split_convention1[1].split('.')[0] + '[0-9][0-9].'
                                         + split_convention1[1].split('.')[1]
                                         )
    convention1_combinations_alpha = np.concatenate((convention1_combination1,
                                                     convention1_combination2,
                                                     convention1_combination3,
                                                     convention1_combination4),
                                                    )
    # Combinations for second convention supplied
    split_convention2 = convention2.split('_')
    convention2_combinations_alpha = np.array([])
    convention2_combination1 = glob.glob(split_convention2[0] + '[0-9]_'
                                         + split_convention2[1] + '[0-9]_[0-9]'
                                         + split_convention2[2]
                                         )
    convention2_combination2 = glob.glob(split_convention2[0] + '[0-9][0-9]_'
                                         + split_convention2[1] + '[0-9]_[0-9]'
                                         + split_convention2[2]
                                         )
    convention2_combination3 = glob.glob(split_convention2[0] + '[0-9]_'
                                         + split_convention2[1] + '[0-9]_[0-9][0-9]'
                                         + split_convention2[2]
                                         )
    convention2_combination4 = glob.glob(split_convention2[0] + '[0-9][0-9]_'
                                         + split_convention2[1] + '[0-9][0-9]_[0-9]'
                                         + split_convention2[2]
                                         )
    convention2_combination5 = glob.glob(split_convention2[0] + '[0-9]_'
                                         + split_convention2[1] + '[0-9][0-9]_[0-9][0-9]'
                                         + split_convention2[2]
                                         )
    convention2_combination6 = glob.glob(split_convention2[0] + '[0-9][0-9]_'
                                         + split_convention2[1] + '[0-9]_[0-9][0-9]'
                                         + split_convention2[2]
                                         )
    convention2_combinations_alpha = np.concatenate((convention2_combination1,
                                                     convention2_combination2,
                                                     convention2_combination3,
                                                     convention2_combination4,
                                                     convention2_combination5,
                                                     convention2_combination6),
                                                    )
    list_of_files = np.concatenate((convention1_combinations_alpha, convention2_combinations_alpha))
    return list_of_files

if __name__ == '__main__':
    main()
CodePudding user response:
I would simply match all *.exr files and then skip the ones which don't follow the pattern.
import glob
import re
list_of_files = [file for file in glob.glob('*.exr')
                 if re.match(r'^u\d{1,2}_v\d{1,2}\.exr$', file)]
If you strictly need to exclude zeros as well, use (?!0)\d{1,2} instead of \d{1,2} in both places; this also rejects numbers with a leading zero, such as 01. If you want to permit leading zeros but reject a lone 0 (a zero followed by a non-digit), use (?!0\D)\d{1,2} instead.
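To see how the three variants differ, here is a minimal sketch. The sample file names are made up for illustration, and `VARIANTS` and `accepts` are hypothetical names, not part of the code above.

```python
import re

# The three variants of the numeric part discussed above.
VARIANTS = {
    "any digits": r"\d{1,2}",
    "no leading zero": r"(?!0)\d{1,2}",
    "no lone zero": r"(?!0\D)\d{1,2}",
}


def accepts(number_pattern, filename):
    """Return True if filename matches u<number>_v<number>.exr."""
    regex = rf"^u{number_pattern}_v{number_pattern}\.exr$"
    return re.match(regex, filename) is not None


# Hypothetical file names, chosen to exercise the zero handling.
samples = ["u1_v2.exr", "u0_v2.exr", "u01_v2.exr", "u10_v20.exr"]

for label, number_pattern in VARIANTS.items():
    matched = [name for name in samples if accepts(number_pattern, name)]
    print(f"{label}: {matched}")
```

With these samples, the first variant accepts every name, the second drops both u0_v2.exr and u01_v2.exr, and the third drops only u0_v2.exr.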
In some more detail, \d matches a digit, {1,2} means one or two occurrences of the previous expression, and \D matches a character which is not a digit. (?!something) is a negative lookahead which prevents a match if the text at this point matches the regular expression something. A \. matches a literal dot, ^ matches the beginning of the file name, and $ the end; most other characters simply match themselves. For a more detailed exposition, review the documentation for the Python re module and/or the beginner resources on the Stack Overflow regex tag info page.
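Putting these pieces together, the following is a minimal sketch that accepts both naming conventions with a single expression. It assumes the second convention is u#_v#_#.exr, as the glob patterns in the question suggest, and that each number has one or two digits with no leading zero (written here as [1-9]\d?, which is equivalent to (?!0)\d{1,2}). `EXR_NAME`, `matching_exr_files`, and the group names u, v, and w are hypothetical names chosen for illustration.

```python
import re
from pathlib import Path

# u<n>_v<n>.exr or u<n>_v<n>_<n>.exr, where <n> is 1-99 without a leading zero.
# The third number is optional, which covers both conventions at once.
EXR_NAME = re.compile(
    r"u(?P<u>[1-9]\d?)_v(?P<v>[1-9]\d?)(?:_(?P<w>[1-9]\d?))?\.exr"
)


def matching_exr_files(folder):
    """Return the sorted names of files in folder that follow the convention."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and EXR_NAME.fullmatch(path.name)
    )


# The named groups give access to the indices without further parsing.
match = EXR_NAME.fullmatch("u3_v12_7.exr")
if match:
    print(match.group("u"), match.group("v"), match.group("w"))
```

Using fullmatch makes the ^ and $ anchors unnecessary, and listing the folder through pathlib avoids having to change the working directory first. A call such as matching_exr_files(args.input_folder) could then stand in for the glob-based function in the question.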
Convert the resulting list to a data frame at your leisure.