How can i reorder a list by the earliest date in the second column when the first column is the same-CodePudding

I have lots of lists like the following, the first column is an ID number, the second column is a score and the third column is date of birth in DDMMYYYY format.

111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018

When there are multiple lines with the same score, i would like to reorder them with the newest date at the top, the result for the example would be:

111 100 01012011
333 90 01012013
222 90 01012001
555 80 01012014
444 80 01012015
666 70 01012016
777 60 01012017
888 50 01012018

As you can see the lines with same scores have been re-arranged with the newest date at the top.

I have first tried to select the earliest date, i can do this with the following:

 sort -k1.5 -k1.1,1.2 -k1.3,1.4 | tail -n 1

But i am unsure of how i can acheive the outcome. How can i acheive the result?

CodePudding user response：

The issue with the current sort attempt is that you're trying to parse the 1st field (-k1.) when what you want is to parse the 3rd field (-k3.).

Setup, adding a few entries with dates other than DDMM == 0101:

$ cat raw.dat
111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
aaa 35 01082022
bbb 35 23012022
ccc 35 12112022
ddd 35 10122022

One approach, assuming the first sort is for the 2nd field (score) in descending numeric order:

$ sort -t' ' -k2,2nr -k3.5,3.8nr -k3.3,3.4nr -k3.1,3.2nr raw.dat
111 100 01012011
333 90 01012013
222 90 01012001
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
ddd 35 10122022
ccc 35 12112022
aaa 35 01082022
bbb 35 23012022

Where:

-t ' ' - define delimiter as space (overrides default which is the non-blank to blank transition, which would cause leading space(s) to be counted as part of the field)
-k2,2nr - 1st sort (score), start with field 2, end with field 2, sort as numeric and in reverse (aka descending) order
-k3.5,3.8nr - 2nd sort (YYYY), start with field 3 and 5th character, end with field 3 and 8th character, sort as numeric and in reverse order (-k3.3,3.4nr and -k3.1,3.2nr perform reverse numeric sorts on MM and DD, respectively)
NOTE: OP's expected output shows ID=555 (YYYY=2014) listed before ID=444 (YYYY=2015); I'm assuming this is a typo and that ID=444 should be listed first

CodePudding user response：

Using sort

sort -k2 -rn input_file
111 100 01012011
333 90 01012013
222 90 01012001
555 80 01012014
444 80 01012015
666 70 01012016
777 60 01012017
888 50 01012018

CodePudding user response：

Using gawk:

gawk '{split($3, a, ""); print $1,$2,a[5]a[6]a[7]a[8]a[3]a[4]a[1]a[2]}' score-file |
sort -t ' ' -rnk 2,2 -k 3,3 |
gawk '{split($3, a, ""); print $1,$2,a[7]a[8]a[5]a[6]a[1]a[2]a[3]a[4]}'

Split the date in to characters to arrange in largest - smallest units (ID SCORE YYYYMMDD).
This is a format that can be sorted by score then date, with sort.
Specifying multiple -k flags to sort sorts by those fields, in the order the flags are given (ie sort by field 2, then by field 3).
-n = numeric sort, -r = reverse order (high - low), -t = field separator (space).
The second awk rearranges to original format.

If you can get/use input format of ID SCORE YYYYMMDD you can trivially sort it with the same sort command (without gawk).

Edit: If you want scores high - low, and dates low - high (old - new), ie. different orders, you can use sort -t ' ' -k 2,2rn -k 3,3n in place of the sort command above. sort can specify different flags for each key.