One-liner to sort by trailing number suffix

Time:03-28

I have a series of lines with the following syntax:

headword   apple:11   banana:9   cherry:101   donut:1   egg tart:86 
(the large spaces are all tabs)

The desired output has columns 2 onward sorted numerically, in descending order, by the number after the colon, e.g.:

headword   cherry:101   egg tart:86   apple:11   banana:9   donut:1 

I often use Ruby one-liners:

# alphabetize within a line, delimited by pipes "|"
ruby -pe '$_=$_.strip.split("|").sort().join("|") + "\n"'

# case insensitive with no dupes:
ruby -pe '$_=$_.strip.split("|").sort_by{|x| x.downcase }.uniq.join("|") + "\n"'

# keep the first term:
ruby -pe '$_=$_.split(":")[0].strip + ":" + $_.split(":")[1].strip.split("|").sort.join("|") + "\n"'

But I can't quite wrap my brain around a simple, clean way to sort by the trailing number, i.e. the ":NN". I'm sure this can be done in a few characters. How? I'm also happy with an awk solution, but Ruby is often cleaner for more complex processing.

CodePudding user response:

Assume a is the result of splitting each line on \t characters:

irb(main):009:0> "#{a[0]}\t#{a[1..].sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }.join("\t")}"
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"

Each line is split on tabs. This gives us an array:

["headword", "apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]

We can leave the first element alone. We then sort the remaining elements by splitting each one on ":" into a key/value pair and comparing the values as integers. Comparing b to a (rather than a to b) gives descending order.

ruby -pe 'a=$_.chomp.split("\t");$_="#{a[0]}\t#{a.drop(1).sort{|x,y|y.split(":")[1].to_i<=>x.split(":")[1].to_i}.join("\t")}\n"'
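The steps described above can also be written as a small method, which is easier to test than a one-liner. This is only a sketch: the method name sort_columns_desc is made up for illustration, and it assumes every column after the first has the form name:NN.

```ruby
# Hypothetical helper (name is illustrative): sorts the tab-separated
# columns after the first by the integer following the colon, descending.
def sort_columns_desc(line)
  # Drop the trailing newline so it does not end up mid-line after sorting.
  head, *rest = line.chomp.split("\t")
  # Comparing b to a (instead of a to b) yields descending order.
  sorted = rest.sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }
  [head, *sorted].join("\t")
end

puts sort_columns_desc("headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86\n")
```

With the sample line from the question, this prints the columns in the order cherry:101, egg tart:86, apple:11, banana:9, donut:1.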

CodePudding user response:

> str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
=> "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
> str.split("\t")[1..-1].sort_by { |x| x.split(':')[-1].to_i }.reverse.prepend(str.split("\t")[0]).join("\t")
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"

CodePudding user response:

str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
hw, *arr = str.split("\t")
hw
  #=> "headword"
arr
  #=> ["apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
[hw, *arr.sort_by { |s| -s[/(?<=:)\d+/].to_i }].join("\t")
  #=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
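If an entry's name can itself contain a colon followed by digits (say "12:30 show:5"), a lookbehind on ":" picks up the first such number rather than the trailing one. A possible variation, assuming the number is always the last thing in the entry, anchors the match to the end of the string instead. The method name below is hypothetical.

```ruby
# Hypothetical helper (name is illustrative): sorts columns after the
# first by the digits at the very end of each entry, descending.
def sort_by_trailing_number(line)
  # strip removes the trailing newline and any stray trailing whitespace.
  hw, *arr = line.strip.split("\t")
  # \d+\z matches only digits at the end of the entry, so colons or
  # digits inside the name are ignored. Entries with no trailing
  # number yield nil, which to_i turns into 0.
  [hw, *arr.sort_by { |s| -s[/\d+\z/].to_i }].join("\t")
end

puts sort_by_trailing_number("headword\t12:30 show:5\tapple:11\tbanana:9")
```

Here "12:30 show:5" is ranked by 5, not 30, so it lands after apple:11 and banana:9.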