How do you do natural ordering for strings with negative numbers and characters?

Time:10-22

I have a list of file names that I need to open. However, the file names contain a hyphen "-", which neither os nor Windows 10 treats as a minus sign when sorting. For example, the list is read in as

['-001.000ps.csv',
 '-100.000ps.csv',
 '0000.000ps.csv',
 '0001.000ps.csv',
 '0002.000ps.csv',
 '0003.000ps.csv',
 '0003.500ps.csv',
]

Here -1 precedes -100. The positions of -1 and -100 need to be swapped, and I need to preserve the leading zeros and the "ps.csv" suffix because that is how the files are named.

I tried some solutions I found on Stack Exchange, but most of them dealt with extracting positive numbers and sorting on those. With the natsort package, -1 and -100 end up at the bottom of the list.

Converting these strings to ints or floats fails, presumably because the "ps.csv" suffix is part of each string.
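A minimal sketch of why the conversion fails and how to get around it, assuming every name ends in the literal suffix "ps.csv" (the helper name `numeric_value` is hypothetical): `float()` rejects the whole file name, but the part before the suffix converts cleanly and can serve as a sort key.

```python
names = ["-001.000ps.csv", "-100.000ps.csv", "0000.000ps.csv", "0003.500ps.csv"]

SUFFIX = "ps.csv"  # assumption: every file name ends with this literal suffix

# float() only accepts a numeric string, so the full file name is rejected.
try:
    float(names[0])
except ValueError as exc:
    print("ValueError:", exc)


def numeric_value(name):
    """Hypothetical helper: the signed number written before the suffix."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected file name: {name!r}")
    return float(name[: -len(SUFFIX)])  # "-001.000" -> -1.0


print(sorted(names, key=numeric_value))
```

Because the key only reads the name, the original strings (leading zeros and suffix included) are returned unchanged.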

I copy-pasted the solution from the blog post referenced here and the same issue occurs. I feel like I'm missing something obvious: why are the negative numbers not handled?
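The order shown above is what plain string comparison produces. Python compares the names character by character: "-001..." and "-100..." tie on the leading "-", and the next character decides, with "0" sorting before "1". All negative names also land before the non-negative ones because "-" (code point 45) is smaller than "0" (code point 48). A short check:

```python
names = ["0001.000ps.csv", "-100.000ps.csv", "0000.000ps.csv", "-001.000ps.csv"]

# Lexicographic (character-by-character) order, not numeric order.
lexical = sorted(names)
print(lexical)

# Code points of the characters that decide the comparisons above.
print(ord("-"), ord("0"), ord("1"))
```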

Answer:

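You can sort the list on the numeric value of the first 6 characters of each name (for example "-001.0"), which works here because every name has the same fixed-width layout: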

a = ['-001.000ps.csv',
     '-100.000ps.csv',
     '0000.000ps.csv',
     '0001.000ps.csv',
     '0002.000ps.csv',
     '0003.000ps.csv',
     '0003.500ps.csv',
]

a.sort(key=lambda x: float(x[:6]))

Result:

['-100.000ps.csv',
 '-001.000ps.csv',
 '0000.000ps.csv',
 '0001.000ps.csv',
 '0002.000ps.csv',
 '0003.000ps.csv',
 '0003.500ps.csv'
]
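The 6-character slice keeps only one decimal place. A sketch with hypothetical names that differ only in the second decimal place shows the effect; slicing the full 8-character numeric part instead (assuming every name follows the same "SDDD.DDD" layout) keeps them in numeric order:

```python
# Hypothetical names that differ only after the first decimal digit.
names = ["0003.550ps.csv", "0003.505ps.csv", "0003.500ps.csv"]

# x[:6] is "0003.5" for all three, so the keys tie and the stable
# sort leaves the input order unchanged.
six = sorted(names, key=lambda x: float(x[:6]))

# x[:8] keeps the whole numeric part, e.g. "0003.550" (fixed width assumed).
eight = sorted(names, key=lambda x: float(x[:8]))

print(six)
print(eight)
```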

Answer:

natsort can be used to sort signed numbers like so:

import natsort

fnames = [
    '-001.000ps.csv',
    '-100.000ps.csv',
    '0000.000ps.csv',
    '0001.000ps.csv',
    '0002.000ps.csv',
    '0003.000ps.csv',
    '0003.500ps.csv',
]

natsort.realsorted(fnames)
natsort.natsorted(fnames, alg=natsort.ns.REAL)

Both of these produce the same output:

['-100.000ps.csv',
 '-001.000ps.csv',
 '0000.000ps.csv',
 '0001.000ps.csv',
 '0002.000ps.csv',
 '0003.000ps.csv',
 '0003.500ps.csv']

Answer:

Try:

lst = [
    "-001.000ps.csv",
    "-100.000ps.csv",
    "0000.000ps.csv",
    "0001.000ps.csv",
    "0002.000ps.csv",
    "0003.000ps.csv",
    "0003.500ps.csv",
]

import re

pat = re.compile(r"^(-?\d+\.?\d*)(.*)")

out = sorted(
    lst, key=lambda x: (float((v := pat.search(x)).group(1)), v.group(2))
)
print(out)

Prints:

['-100.000ps.csv', '-001.000ps.csv', '0000.000ps.csv', '0001.000ps.csv', '0002.000ps.csv', '0003.000ps.csv', '0003.500ps.csv']
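The assignment expression (`:=`) in the key above requires Python 3.8 or later. An equivalent sketch with a named key function also makes the behaviour on a name without a numeric prefix explicit; the name `numeric_key` and the choice to raise `ValueError` are illustrative assumptions, not part of the answer:

```python
import re

# Optional minus sign, digits, optional decimal part; group 2 is the rest.
PATTERN = re.compile(r"^(-?\d+\.?\d*)(.*)")


def numeric_key(name):
    """Sort key: (signed numeric prefix as float, remaining text)."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"no numeric prefix in {name!r}")
    return float(match.group(1)), match.group(2)


lst = ["0003.500ps.csv", "-001.000ps.csv", "0000.000ps.csv", "-100.000ps.csv"]
print(sorted(lst, key=numeric_key))
```

Using the remaining text as a secondary key only matters when two names share the same number, such as a hypothetical "0001.000ps.txt" next to "0001.000ps.csv".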