I have a series of lines with the following syntax:
headword apple:11 banana:9 cherry:101 donut:1 egg tart:86
(the fields are separated by tabs, so "egg tart" is a single field)
The desired output has columns 2 onward sorted numerically, in descending order, by the number after the colon, e.g.
headword cherry:101 egg tart:86 apple:11 banana:9 donut:1
I often use Ruby one-liners:
//alphabetize within a line, delimited by pipes "|"
ruby -pe '$_=$_.strip.split("|").sort().join("|") + "\n"'
//case insensitive with no dupes:
ruby -pe '$_=$_.strip.split("|").sort_by{|x| x.downcase }.uniq.join("|") + "\n"'
//keep the first term:
ruby -pe '$_=$_.split(":")[0].strip + ":" + $_.split(":")[1].strip.split("|").sort.join("|") + "\n"'
But I can't quite wrap my brain around a simple, clean way to sort by the trailing number, i.e. the ":NN". I'm sure this can be done in a few characters. How? I'm also happy with an awk solution, but Ruby is often cleaner for more complex processing.
CodePudding user response:
Assuming the array a is the result of splitting each line on tab (\t) characters:
irb(main):009:0> "#{a[0]}\t#{a[1..].sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }.join("\t")}"
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
Each line is split on tabs. This gives us an array:
["headword", "apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
We can leave the first element alone. We can then sort the remaining elements by splitting each one on the colon into a key/value pair and comparing the values as integers. If we compare b to a, rather than a to b, we get descending order.
ruby -ne 'a=$_.chomp.split("\t");puts "#{a[0]}\t#{a[1..].sort{|a,b|b.split(":")[1].to_i<=>a.split(":")[1].to_i}.join("\t")}"'
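Spelled out as a method, the same idea might look like the sketch below. The name sort_fields_desc is made up for illustration, and the code assumes tab-separated fields where every field after the first ends in ":NN". It also chomps the line first, since otherwise the trailing newline stays attached to the last field and travels with it when the fields are reordered.

```ruby
# Hypothetical helper (the name is illustrative, not from the answer above).
# Assumes tab-separated fields, each field after the first ending in ":<integer>".
def sort_fields_desc(line)
  # chomp so the newline does not stay glued to the last field
  head, *rest = line.chomp.split("\t")
  # Comparing b to a (instead of a to b) yields descending order.
  sorted = rest.sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }
  [head, *sorted].join("\t")
end

puts sort_fields_desc("headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86\n")
```

With the sample line from the question this prints the headword followed by cherry:101, egg tart:86, apple:11, banana:9 and donut:1, separated by tabs.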
CodePudding user response:
> str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
=> "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
> str.split("\t")[1..-1].sort_by { |x| x.split(':')[-1].to_i }.reverse.prepend(str.split("\t")[0]).join("\t")
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
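To apply the sort_by/reverse approach to many lines rather than a single string, something along these lines could work. It is a sketch only: sort_each_line is a hypothetical name, the second input line is invented sample data, and a StringIO stands in for a real file or standard input.

```ruby
require "stringio"

# Hypothetical helper: applies the sort_by + reverse approach to every line of an IO.
# Returns an array of rewritten lines (without trailing newlines).
def sort_each_line(io)
  io.each_line.map do |line|
    head, *rest = line.chomp.split("\t")
    # sort_by extracts the number once per field; reverse turns ascending into descending
    [head, *rest.sort_by { |x| x.split(":")[-1].to_i }.reverse].join("\t")
  end
end

# The second line is made-up sample data, not from the question.
input = StringIO.new("headword\tapple:11\tbanana:9\tcherry:101\nother\tx:2\ty:30\n")
puts sort_each_line(input)
```

In a script, ARGF or $stdin could be passed in place of the StringIO, since both respond to each_line.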
CodePudding user response:
str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
hw, *arr = str.split("\t")
hw
#=> "headword"
arr
#=> ["apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
[hw, *arr.sort_by { |s| -s[/(?<=:)\d+/].to_i }].join("\t")
#=>"headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
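The answers extract the number in slightly different ways, and these only disagree if a label itself contains a colon. The question's data has no such field, so the example below uses an invented one purely to show the difference. The last line, with the regex anchored to the end of the string, is an extra variant and not taken from any of the answers.

```ruby
# Hypothetical field whose label contains a colon (not present in the question's data).
field = "10:30 show:7"

second_piece = field.split(":")[1].to_i    # second piece after splitting: "30 show" -> 30
last_piece   = field.split(":")[-1].to_i   # last piece: "7" -> 7
first_match  = field[/(?<=:)\d+/].to_i     # first digits preceded by a colon: "30" -> 30
anchored     = field[/(?<=:)\d+\z/].to_i   # digits after a colon at the very end: "7" -> 7

p [second_piece, last_piece, first_match, anchored]   # => [30, 7, 30, 7]
```

For fields like "egg tart:86" all four expressions return the same number, so the choice only matters if labels with colons can occur.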