Read files in a folder in their original order in Python

Time:09-23

I have some files in a folder that are named from 1 to 15. When I use the following code to read all the files in the folder, it reads the files named 10 to 15 first and then 1 to 9. I need to read the files in their original order. What should I do to read them in order from 1 to 15?

os.chdir(dir_main) #change directory to downloads folder
files_path = [os.path.abspath(x) for x in os.listdir()]
fnames_main = [x for x in files_path if x.endswith(".mkv")]

My complete code is:

import os
import subprocess
from glob import glob
from pathlib import Path

def split_vidoe(dir_main,dir_res):
    os.chdir(dir_main) #change directory to downloads folder
    files_path = [os.path.abspath(x) for x in os.listdir()]
    # fnames_main = [x for x in files_path if x.endswith(".mkv")]
    fnames_main = sorted(list(Path(dir_main).iterdir()),
                     key=lambda path: int(path.stem))
    print(len(fnames_main)) 
       
    for i in range(len(fnames_main)):  
        pname1 = (fnames_main[i])
        pname2 = dir_res + str(i + 1) + '_'
        
        subprocess.call(['ffmpeg.exe','-i', pname1,'-c:v','libx264','-pix_fmt' ,'yuv420p','-b:v', '8000K', '-bufsize', '8000K' ,'-minrate', '8000K','-maxrate', '8000K', '-x264opts','keyint=60:min-keyint=60','-preset', 'veryfast', '-profile:v', 'high', '-f', 'hls', '-hls_time', '2', '-hls_list_size','0', pname2])
    return
        
current_path=os.path.abspath(os.getcwd())
current_dir = Path.cwd()
all_sub_dir_paths = glob(str(current_dir) + '/*/') # returns list of sub directory paths
all_sub_dir_names = [Path(sub_dir).name for sub_dir in all_sub_dir_paths]
for i in range(len(all_sub_dir_names)):
    dir_main=current_path + '\\' + all_sub_dir_names[i]
    dir_res=current_path + '\\' + all_sub_dir_names[i] + '\\' + 'gop_split' + '\\'
    split_vidoe(dir_main,dir_res)

CodePudding user response:

To expand on @cards' answer: if you are using Python 3.4 or later, you can use the pathlib module from the standard library.

from pathlib import Path

fnames_main = sorted(list(Path(dir_main).iterdir()),
                     key=lambda path: int(path.stem))

sorted's key keyword argument takes a function that is called on each element to compute the value used for comparison. In this case, we use the .stem attribute of each Path object to get the filename without the ".mkv" suffix, and convert it to int so the list is sorted numerically.
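
Note that iterdir() yields every entry in the folder, so int(path.stem) raises ValueError as soon as the folder contains something whose name is not a plain number (an output subfolder, for example). Below is a minimal sketch of a more defensive variant, assuming the videos are named 1.mkv to 15.mkv. The helper name numbered_mkvs and the throwaway demo folder are made up for this illustration.

```python
import tempfile
from pathlib import Path


def numbered_mkvs(folder):
    """Return the .mkv files in `folder` whose stem is a number, sorted numerically.

    Hypothetical helper for illustration; it is not part of pathlib.
    """
    candidates = [
        p for p in Path(folder).glob("*.mkv")
        if p.is_file() and p.stem.isdecimal()  # skip folders and non-numeric names
    ]
    return sorted(candidates, key=lambda p: int(p.stem))


# Demo on a temporary folder so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    for n in range(1, 16):
        (Path(tmp) / f"{n}.mkv").touch()
    (Path(tmp) / "gop_split").mkdir()    # a subfolder like this breaks int(path.stem)
    (Path(tmp) / "notes.mkv").touch()    # non-numeric name, ignored by the filter
    ordered = [p.name for p in numbered_mkvs(tmp)]

print(ordered)
```

Filtering before sorting keeps the key function simple; whether silently skipping non-numeric names is the right behaviour depends on what else lives in the folder.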

CodePudding user response:

The problem is that string ordering is lexicographic, so you need to extract the numeric part of the name and convert it to a regular int so you can order by its numeric value. That can be done manually if it is just a matter of removing the extension, or with the re module if extracting the numeric part of the name is more complicated.

Here is a sample with the re module:

>>> import re
>>> names=sorted(f"video{n}.mkv" for n in range(1,16)) #sample list of names with numbers
>>> names
['video1.mkv', 'video10.mkv', 'video11.mkv', 'video12.mkv', 'video13.mkv', 'video14.mkv', 'video15.mkv', 'video2.mkv', 'video3.mkv', 'video4.mkv', 'video5.mkv', 'video6.mkv', 'video7.mkv', 'video8.mkv', 'video9.mkv']
>>> sorted(names,key=lambda x:int(re.search("([0-9]+)",x).group()))
['video1.mkv', 'video2.mkv', 'video3.mkv', 'video4.mkv', 'video5.mkv', 'video6.mkv', 'video7.mkv', 'video8.mkv', 'video9.mkv', 'video10.mkv', 'video11.mkv', 'video12.mkv', 'video13.mkv', 'video14.mkv', 'video15.mkv']
>>> 

One I personally use is this variation on the same idea:

>>> def sort_key_num(name_file:str, str_key=str.lower, nzero:int=3) -> str:
        return str_key( re.sub("([0-9]+)",lambda x: x.group(0).zfill(nzero),name_file) )
>>> sort_key_num("video1.mkv")
'video001.mkv'
>>> sort_key_num("video.mkv")
'video.mkv'
>>> sort_key_num("video 1 - 2.mkv")
'video 001 - 002.mkv'
>>> 
>>> sorted(names, key=sort_key_num)
['video1.mkv', 'video2.mkv', 'video3.mkv', 'video4.mkv', 'video5.mkv', 'video6.mkv', 'video7.mkv', 'video8.mkv', 'video9.mkv', 'video10.mkv', 'video11.mkv', 'video12.mkv', 'video13.mkv', 'video14.mkv', 'video15.mkv']
>>> 

The sort_key_num function pads every number in the given string (if any) with leading zeros up to nzero digits, so that lexicographic order matches the numeric order of the values (and it also applies lower to the string for good measure).

>>> test=["E:\\UGC\\Animation\\9.mkv", "E:\\UGC\\Animation\\1.mkv", "E:\\UGC\\Animation\\10.mkv"]
>>> sorted(test)
['E:\\UGC\\Animation\\1.mkv', 'E:\\UGC\\Animation\\10.mkv', 'E:\\UGC\\Animation\\9.mkv']
>>> sorted(test, key=sort_key_num)
['E:\\UGC\\Animation\\1.mkv', 'E:\\UGC\\Animation\\9.mkv', 'E:\\UGC\\Animation\\10.mkv']
>>>  
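
One caveat with zero-padding: nzero has to be at least as large as the longest number, otherwise the order breaks again (with nzero=3, "video1000.mkv" sorts before "video999.mkv"). Here is a sketch of a variation that avoids a fixed width by splitting the name into text and number chunks. The name natural_key and the sample names are chosen here for illustration only.

```python
import re


def natural_key(name):
    """Split a name into alternating text and integer chunks for sorting.

    re.split with a capturing group keeps the matched digits, and they
    always land at the odd positions of the result, so every key has the
    same str/int layout and two keys can be compared safely.
    """
    parts = re.split(r"([0-9]+)", name.lower())
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


sample = ["video999.mkv", "video1000.mkv", "video2.mkv"]
ordered = sorted(sample, key=natural_key)
print(ordered)
```

Because the numbers are compared as int rather than as padded strings, the result does not depend on how many digits the largest number has.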

CodePudding user response:

Strings are ordered lexicographically, so you need to convert the name to an integer before sorting. To strip the file extension you can use os.path.splitext.

  • with sorted: returns a new list
fnames_main = sorted([x for x in files_path if x.endswith(".mkv")], key=lambda path: int(os.path.splitext(os.path.basename(path))[0]))
  • with list.sort: sorts the list in place and returns None
fnames_main = [x for x in files_path if x.endswith(".mkv")]
fnames_main.sort(key=lambda path: int(os.path.splitext(os.path.basename(path))[0]))

Remark: to avoid problems with path separators (\) or dots in directory names, call basename first and only then split off the file extension with splitext.
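
To make the remark concrete, here is a small sketch that applies the two calls step by step to a made-up list of paths. The folder name videos, the file names, and the helper file_number are assumptions for the example.

```python
import os


def file_number(path):
    """Return the numeric file name of `path`, e.g. 10 for <folder>/10.mkv."""
    name = os.path.basename(path)        # drop the directory part -> "10.mkv"
    stem, _ext = os.path.splitext(name)  # split off the extension -> ("10", ".mkv")
    return int(stem)


# Hypothetical input, deliberately out of order and with one non-video file.
files_path = [os.path.join("videos", f"{n}.mkv") for n in (10, 2, 1, 15, 9)]
files_path.append(os.path.join("videos", "notes.txt"))

fnames_main = sorted((p for p in files_path if p.endswith(".mkv")), key=file_number)
numbers = [file_number(p) for p in fnames_main]
print(numbers)
```

The .mkv filter runs before the key function is applied, so file_number never sees notes.txt and int() is only called on numeric names.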
