awk - print the last group from a sorted file

I want to print the last group from the file below, using the first field as the key.

Input:

62,2010-06-19,27.40
62,2010-06-20,35.40
62,2010-06-21,8.50
63,2010-06-19,56.40
63,2010-06-20,23.76
63,2010-06-21,12.50
63,2010-06-22,87.12
64,2010-06-19,87.40
64,2010-06-20,32.40
64,2010-06-21,21.50
64,2010-06-22,73.40

Required Output:

64,2010-06-19,87.40
64,2010-06-20,32.40
64,2010-06-21,21.50
64,2010-06-22,73.40

I tried with

awk -F, ' { p=NR==1?$1:p; a[NR]=$0 }  p!=$1 { delete a; p=$1 } END { for(i in a) print a[i] }  '

but it is missing one line.

CodePudding user response：

The most efficient (and brief) way would be:

$ tac file | awk -F',' '(NR>1) && ($1!=p){exit} {print; p=$1}' | tac
64,2010-06-19,87.40
64,2010-06-20,32.40
64,2010-06-21,21.50
64,2010-06-22,73.40

or if you don't have tac:

$ awk -F',' '$1!=p{rec=""; p=$1} {rec=rec $0 ORS} END{printf "%s", rec}' file
64,2010-06-19,87.40
64,2010-06-20,32.40
64,2010-06-21,21.50
64,2010-06-22,73.40

or if you prefer to store the last group's records in an array rather than a string for some reason:

$ awk -F',' '$1!=p{n=0; p=$1} {rec[++n]=$0} END{for (i=1; i<=n; i++) print rec[i]}' file
64,2010-06-19,87.40
64,2010-06-20,32.40
64,2010-06-21,21.50
64,2010-06-22,73.40

FYI the for(i in a) loop in your script visits the array in an unspecified order, so the output order isn't guaranteed to match the input order (it would only do so by coincidence).

Also, regarding p=NR==1?$1:p: ternary expressions are easier to read when you enclose them in parentheses, and leaving the parentheses off can cause syntax errors in some awks in some contexts, so just always parenthesise them, e.g. p=(NR==1?$1:p).
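
As for why your script loses a line: the unconditional block stores the current line in a[NR] first, and only afterwards does the p!=$1 block run delete a, so the first line of each new group is wiped along with the previous group. Below is a minimal sketch of one way to repair it, keeping your array approach. It assumes an awk that accepts delete on a whole array (gawk, mawk and BWK awk do); the temporary file and the variable names are just placeholders for illustration.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
62,2010-06-19,27.40
62,2010-06-20,35.40
62,2010-06-21,8.50
63,2010-06-19,56.40
63,2010-06-20,23.76
63,2010-06-21,12.50
63,2010-06-22,87.12
64,2010-06-19,87.40
64,2010-06-20,32.40
64,2010-06-21,21.50
64,2010-06-22,73.40
EOF

# Reset the buffer BEFORE storing the current line, so the first line of a
# new group is kept, and walk the array by index to preserve input order.
last_group=$(awk -F, '
    $1 != p { delete a; n = 0; p = $1 }   # key changed: start a fresh group
            { a[++n] = $0 }               # buffer the current line
    END     { for (i = 1; i <= n; i++) print a[i] }
' "$input")

printf '%s\n' "$last_group"
rm -f "$input"
```

The only real changes from your version are the order of the two blocks and the indexed loop in END.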

CodePudding user response：

An alternative tac + awk + tac solution that doesn't use arrays:

tac file | awk -F, 'p && $1 != p{exit} {p = $1} 1' | tac

64,2010-06-19,87.40
64,2010-06-20,32.40
64,2010-06-21,21.50
64,2010-06-22,73.40
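
One caveat with using p itself as the "have I seen a line yet" flag: the test p && ... is false whenever the key is empty or numerically zero, so if the last group's key were 0 the script would never exit and the whole file would be printed. That doesn't affect the sample data, but here is a sketch of the same idea with NR > 1 as the guard instead. It assumes tac is available, and the data is made up purely to illustrate the edge case.

```shell
# Made-up data whose last group has the key 0.
data='7,2010-06-19,1.00
0,2010-06-19,2.00
0,2010-06-20,3.00'

# NR > 1 replaces "p &&", so a key of 0 (or an empty key) still ends the scan
# as soon as the first field changes.
result=$(printf '%s\n' "$data" | tac |
    awk -F, 'NR > 1 && $1 != p { exit } { p = $1 } 1' | tac)

printf '%s\n' "$result"
```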