I have lots of lists like the following, the first column is an ID number, the second column is a score and the third column is date of birth in DDMMYYYY format.
111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
When there are multiple lines with the same score, i would like to reorder them with the newest date at the top, the result for the example would be:
111 100 01012011
333 90 01012013
222 90 01012001
555 80 01012014
444 80 01012015
666 70 01012016
777 60 01012017
888 50 01012018
As you can see the lines with same scores have been re-arranged with the newest date at the top.
I have first tried to select the earliest date, i can do this with the following:
sort -k1.5 -k1.1,1.2 -k1.3,1.4 | tail -n 1
But i am unsure of how i can acheive the outcome. How can i acheive the result?
CodePudding user response:
The issue with the current sort
attempt is that you're trying to parse the 1st field (-k1.
) when what you want is to parse the 3rd field (-k3.
).
Setup, adding a few entries with dates other than DDMM == 0101
:
$ cat raw.dat
111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
aaa 35 01082022
bbb 35 23012022
ccc 35 12112022
ddd 35 10122022
One approach, assuming the first sort is for the 2nd field (score
) in descending numeric order:
$ sort -t' ' -k2,2nr -k3.5,3.8nr -k3.3,3.4nr -k3.1,3.2nr raw.dat
111 100 01012011
333 90 01012013
222 90 01012001
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
ddd 35 10122022
ccc 35 12112022
aaa 35 01082022
bbb 35 23012022
Where:
-t ' '
- define delimiter as space (overrides default which is the non-blank to blank transition, which would cause leading space(s) to be counted as part of the field)-k2,2nr
- 1st sort (score
), start with field2
, end with field2
, sort asn
umeric and inr
everse (aka descending) order-k3.5,3.8nr
- 2nd sort (YYYY
), start with field3
and5
th character, end with field3
and8
th character, sort asn
umeric and inr
everse order (-k3.3,3.4nr
and-k3.1,3.2nr
performr
eversen
umeric sorts onMM
andDD
, respectively)- NOTE: OP's expected output shows
ID=555
(YYYY=2014
) listed beforeID=444
(YYYY=2015
); I'm assuming this is a typo and thatID=444
should be listed first
CodePudding user response:
Using sort
sort -k2 -rn input_file
111 100 01012011
333 90 01012013
222 90 01012001
555 80 01012014
444 80 01012015
666 70 01012016
777 60 01012017
888 50 01012018
CodePudding user response:
Using gawk:
gawk '{split($3, a, ""); print $1,$2,a[5]a[6]a[7]a[8]a[3]a[4]a[1]a[2]}' score-file |
sort -t ' ' -rnk 2,2 -k 3,3 |
gawk '{split($3, a, ""); print $1,$2,a[7]a[8]a[5]a[6]a[1]a[2]a[3]a[4]}'
- Split the date in to characters to arrange in largest - smallest units (
ID SCORE YYYYMMDD
). - This is a format that can be sorted by score then date, with
sort
. - Specifying multiple
-k
flags tosort
sorts by those fields, in the order the flags are given (ie sort by field 2, then by field 3). -n
= numeric sort,-r
= reverse order (high - low),-t
= field separator (space).- The second awk rearranges to original format.
If you can get/use input format of ID SCORE YYYYMMDD
you can trivially sort it with the same sort command (without gawk).
Edit: If you want scores high - low, and dates low - high (old - new), ie. different orders, you can use sort -t ' ' -k 2,2rn -k 3,3n
in place of the sort command above. sort
can specify different flags for each key.