I have an unsorted CSV file which contains 6 fields in total. There are duplicates in field1. I need to keep the last occurrence of each field1 value and delete the other duplicate records with that same field1 value. I have tried awk -F',' '!seen[$1]++' but this keeps the first occurrence and deletes the later ones.
Can anyone help me with other options?
Sample data:
17710813,24759,
17722388,47281,,,,
17722388,1999084,0246,car,28-Jul-11,
17722388,1159769,11301,earn,16-Jun-16,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
Expected output:
17710813,24759,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
CodePudding user response:
With your shown samples, please try the following awk code. It also preserves the order in which each $1 value first occurs in the output.
awk -F',' '
!arr1[$1]++{            ##When $1 is seen for the first time...
  arr2[++count]=$1      ##...remember its order of appearance.
}
{
  arr3[$1]=$0           ##Always overwrite, so the last record per $1 wins.
}
END{
  for(i=1;i<=count;i++){
    print arr3[arr2[i]] ##Print the last record per key, in first-seen order.
  }
}
' Input_file
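For instance, saving the question's sample data as Input_file and running the script should reproduce the expected output:
17710813,24759,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08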
OR, if the order of $1 values as per Input_file doesn't matter to you, then try the following code.
awk -F',' '{arr[$1]=$0} END{for(i in arr){print arr[i]}}' Input_file
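Note that for(i in arr) visits keys in an unspecified order. If you instead want the output sorted numerically by $1 and have GNU awk available, a minimal sketch (assuming gawk, whose PROCINFO["sorted_in"] setting controls array traversal order) is:
awk -F',' '{arr[$1]=$0} END{PROCINFO["sorted_in"]="@ind_num_asc"; for(i in arr){print arr[i]}}' Input_file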
CodePudding user response:
As you have tagged your question unix, I presume that commands other than awk are allowed, and thus you might use tac as follows. Let file.txt content be
17710813,24759,
17722388,47281,,,,
17722388,1999084,0246,car,28-Jul-11,
17722388,1159769,11301,earn,16-Jun-16,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
then
tac file.txt | awk -F',' '!seen[$1]++' | tac
gives the output
17710813,24759,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
Explanation: tac concatenates and prints files in reverse, therefore the last line of the file will appear first, and so on. Akin to GNU AWK, it will read the listed file(s), as in tac file.txt, and when not given any file it will consume standard input, as seen in the last part of the code above. The awk part is copied verbatim from your question, with the !seen[$1]++ logic keeping the first occurrence it sees, which, thanks to the surrounding reversals, is the last occurrence in the original file.
(tested in gawk 4.2.1)
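For a quick illustration of tac's two input modes (a small sketch; demo.txt is just a hypothetical scratch file, and this assumes GNU coreutils' tac):
printf 'a\nb\nc\n' > demo.txt
tac demo.txt                 # reads the listed file and prints c, b, a
printf '1\n2\n3\n' | tac     # given no file, reads standard input and prints 3, 2, 1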