Sorting a space delimited list with uneven spaces

Time:11-05

I have a space-delimited list in which the first column (the name) contains an uneven number of spaces. I want to reverse sort it by the first number that appears after the name. I need to do this using bash commands.

Example:

Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US

Would turn into:


Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US

I've tried sort -nr to see what happens, and it does reverse sort the list, but by its alphabetical order rather than by the number. I want the sort to be based on that numeric value for every line.

The trick is that I must keep it space delimited. What's the best way to do this using bash?

Answer:

I must keep it space delimited

You mean, the result has to be space delimited again, right? During processing, you can transform the input however you like.

Assuming you know a character that never otherwise appears in your file, use sed to surround the value you want to sort by with that character, then sort by that value, then remove the extra delimiters again. (This process is basically a Schwartzian transform.)

Here we use the bell character \a to delimit the sort key. It is very unlikely to appear in a text file.

sed -E 's/ ([0-9]+\.[0-9]+) / \a\1\a /' | sort -t $'\a' -k2,2n | tr -d \\a
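For reuse, the same decorate-sort-undecorate pipeline can be wrapped in a small bash function. This is a sketch, not part of the original answer: the name sort_by_first_decimal and its -r option (for the reverse order mentioned in the question) are hypothetical. The BEL character is embedded with bash's $'\a' quoting, so sed receives a literal BEL and does not itself need to understand the \a escape.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort lines by the first "digits.digits" number,
# keeping the original spacing. Pass -r for descending order.
sort_by_first_decimal() {
    local bel=$'\a'   # BEL: assumed never to occur in the input
    local order=n
    if [[ ${1-} == -r ]]; then
        order=nr
        shift
    fi
    # 1. wrap the first " 12.3 " in BEL characters (decorate)
    # 2. sort numerically on the BEL-delimited second field
    #    (LC_ALL=C so "." is always the decimal point)
    # 3. delete the BEL characters again (undecorate)
    sed -E "s/ ([0-9]+\.[0-9]+) / ${bel}\1${bel} /" "$@" |
        LC_ALL=C sort -t "$bel" -k2,2"$order" |
        tr -d "$bel"
}
```

Usage: sort_by_first_decimal file.txt for ascending order, or sort_by_first_decimal -r file.txt for descending order; with no file argument it reads standard input.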

Answer:

Here's a short Ruby program:

ruby -e '
    puts IO.readlines(ARGV.shift, chomp: true)
        .map {|line|
            fields = line.split
            # rejoin everything except the last 8 fields as the name
            [fields[0..(fields.size - 9)].join(" ")] + fields[-8 .. -1]
        }
        .sort_by {|row| row[1].to_f}
        .map {|row| row.join(" ")}
        .join("\n")
' file
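The Ruby program relies on every line ending in exactly eight data fields, so the sort key is always the eighth field from the end. The same idea can be sketched with standard shell tools by prefixing each line with that field, sorting on it, and cutting it off again. Like the Ruby code, this assumes exactly eight trailing fields; it additionally assumes the input contains no tab characters. The function name sort_by_field_from_end is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decorate-sort-undecorate on the 8th field from the end.
# Assumes every line ends with exactly 8 data fields and contains no tabs.
sort_by_field_from_end() {
    # Prefix each line with its sort key and a tab; $0 keeps the line unchanged.
    awk '{ print $(NF-7) "\t" $0 }' "$@" |
        LC_ALL=C sort -t $'\t' -k1,1n |   # numeric sort on the key only
        cut -f2-                          # drop the key column again
}
```

Usage: sort_by_field_from_end file.txt. Because the key is located by counting from the end of the line, names containing digits (such as "Delta 88") cannot be mistaken for the key.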

Answer:

I would use GNU AWK for this. Let file.txt contain

Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US

then

awk 'BEGIN{FPAT="[0-9]*[.][0-9]*";PROCINFO["sorted_in"]="@ind_num_asc"}{arr[$1]=$0}END{for(i in arr){print arr[i]}}' file.txt

output

Oldsmobile Omega 11.0 8 350.0 180.0 3664. 11.0 73 US
Oldsmobile Delta 88 Royale 12.0 8 350.0 160.0 4456. 13.5 72 US
Pontiac Firebird 19.0 6 250.0 100.0 3282. 15.0 71 US
AMC Gremlin 20.0 6 232.0 100.0 2914. 16.0 75 US
AMC Gremlin 21.0 6 199.0 90.00 2648. 15.0 70 US
Pontiac Lemans V6 21.5 6 231.0 115.0 3245. 15.4 79 US
Pontiac J2000 SE Hatchback 31.0 4 112.0 85.00 2575. 16.2 82 US

Explanation: I tell GNU AWK that a field is zero or more digits, followed by a literal dot ([.]), followed by zero or more digits (note: I assume that the first number always contains a dot and that the name column never does), and that array traversal should use @ind_num_asc (treat indices as numbers, ascending), which is one of gawk's predefined array scanning orders. For each line I store a pair in the array arr, with the first number ($1) as the key and the whole line ($0) as the value. After reading all lines I print the values of arr in the order given by the selected traversal. Note that lines with the same first number share a key, so only the last such line is kept.

(tested in gawk 4.2.1)
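FPAT and PROCINFO["sorted_in"] are gawk extensions, and keying an array by the number keeps only one line per distinct number. A hedged alternative for other awk implementations is to prefix each line with the number found by the POSIX match() function, sort on that prefix, and strip it again; this keeps every line, including lines whose numbers are equal. The function name sort_by_first_decimal_posix is hypothetical, and the sketch assumes the input contains no tab characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper for non-gawk awks: decorate each line with the first
# "digits.digits" number found by POSIX match(), sort, then undecorate.
# Lines sharing the same number are all kept. Assumes no tabs in the input.
sort_by_first_decimal_posix() {
    awk '{
        # match() sets RSTART/RLENGTH to the leftmost match (returns 0 if none)
        key = match($0, /[0-9]+\.[0-9]*/) ? substr($0, RSTART, RLENGTH) : 0
        print key "\t" $0
    }' "$@" |
        LC_ALL=C sort -t $'\t' -k1,1n |
        cut -f2-
}
```

Usage: sort_by_first_decimal_posix file.txt. Add r to the sort key (-k1,1nr) for descending order.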
