Sort List of File Names with Different Amount of Characters

Time:07-12

I'm running into a problem where I need a sorted list of files whose names follow the format xxx_00000, xxx_00001, and so on. The issue is that once there are more than 100,000 files, the names become xxx_100000 and up, while all the earlier names keep their five-digit padding. This means that when I do os.listdir(directory) and sort the result, I get xxx_10000 next to xxx_100000 (i.e. xxx_10000 is at index 10,000 and xxx_100000 is at index 10,001). Any ideas on how to sort these so that they appear in the correct order? I've tried:

sorted(paths)

sorted(paths, key=lambda x: x[x.rfind('_') + 1:-4])

and

def sorted_helper(x):
    x = str(00000) + x[x.rfind('_') + 1:-4]
    return x[-7:]

sorted(paths, key=sorted_helper)

CodePudding user response:

You presumably want to sort it as an int, not as a string. Try:

sorted(paths, key=lambda filename: int(filename.split("_")[1]))
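If the names carry an extension (the -4 slice in the question suggests something like .png) or contain more than one underscore, split("_")[1] may raise an error or pick the wrong piece. Here is a minimal sketch, assuming the number is always the last underscore-separated field before the extension. The helper name numeric_suffix and the sample file names are made up for illustration:

```python
import os

def numeric_suffix(filename):
    """Return the integer after the last underscore, ignoring any extension."""
    stem, _ext = os.path.splitext(filename)  # 'xxx_00010.png' -> 'xxx_00010'
    return int(stem.rsplit("_", 1)[1])       # 'xxx_00010' -> 10

# Hypothetical sample names; in practice this would be os.listdir(directory).
paths = ["xxx_100000.png", "xxx_10000.png", "xxx_00002.png", "xxx_10001.png"]
sorted_paths = sorted(paths, key=numeric_suffix)
print(sorted_paths)
```

Using rsplit with a limit of 1 means a prefix that itself contains underscores should not affect which field is converted.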

CodePudding user response:

You can use natsorted from the third-party natsort package (pip install natsort):

from natsort import natsorted

natsorted(paths)

CodePudding user response:

The natsort library can be helpful in such cases.

For example:

from natsort import natsorted

natsorted(paths)

Output:

['xxx_00001',
 'xxx_00002',
 'xxx_00100',
 'xxx_01000',
 'xxx_10000',
 'xxx_100001',
 'xxx_100010',
 'xxx_110000']
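If installing natsort is not an option, a similar ordering can be approximated with the standard library alone. This is only a rough sketch, not how natsort itself works: the helper name natural_key is made up, and it assumes that treating each run of digits as an integer is enough for these names.

```python
import re

def natural_key(name):
    """Split a name into text and digit chunks so digit runs compare as integers."""
    # With a capturing group, re.split alternates text, digits, text, ...
    # so odd positions always hold the digit runs.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# Hypothetical sample names for illustration.
paths = ["xxx_100001", "xxx_10000", "xxx_00100", "xxx_00001", "xxx_110000", "xxx_01000"]
ordered = sorted(paths, key=natural_key)
print(ordered)
```

Because text and integer chunks always sit at the same positions in every key, Python never has to compare a string with an int while sorting.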