I have an unsorted CSV file which contains 6 fields in total. There are duplicates in field1. I need to keep the last occurrence of each field1 value and delete the other duplicate records with that same field1 value. I have tried awk -F',' '!seen[$1]++' but this keeps the first occurrence and deletes the later ones.
Can anyone help me with other options?
Sample data:
17710813,24759,
17722388,47281,,,,
17722388,1999084,0246,car,28-Jul-11,
17722388,1159769,11301,earn,16-Jun-16,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
Expected output:
17710813,24759,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
CodePudding user response:
With your shown samples, please try the following awk code. It also preserves the order in which each $1 value first occurs in the output.
awk -F',' '
!arr1[$1]++{            ##When $1 is seen for the first time...
  arr2[++count]=$1      ##...remember its order of appearance.
}
{
  arr3[$1]=$0           ##Always overwrite, so the last record per $1 wins.
}
END{
  for(i=1;i<=count;i++){
    print arr3[arr2[i]] ##Print the last record per key, in first-seen order.
  }
}
' Input_file
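For instance, saving the question's sample data as Input_file and running the script should reproduce the expected output:
17710813,24759,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08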
OR, if the order of $1 values as per Input_file doesn't matter to you, then try the following code.
awk -F',' '{arr[$1]=$0} END{for(i in arr){print arr[i]}}' Input_file
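Note that for(i in arr) visits keys in an unspecified order. If you instead want the output sorted numerically by $1 and have GNU awk available, a minimal sketch (assuming gawk, whose PROCINFO["sorted_in"] setting controls array traversal order) is:
awk -F',' '{arr[$1]=$0} END{PROCINFO["sorted_in"]="@ind_num_asc"; for(i in arr){print arr[i]}}' Input_file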
CodePudding user response:
As you have tagged your question unix, I presume that commands other than awk are allowed, and thus you might use tac as follows. Let file.txt content be
17710813,24759,
17722388,47281,,,,
17722388,1999084,0246,car,28-Jul-11,
17722388,1159769,11301,earn,16-Jun-16,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
then
tac file.txt | awk -F',' '!seen[$1]++' | tac
gives the output
17710813,24759,
17722388,136787,35451,dress,,15-Jun-16
17732315,242393,,light,28-Aug-05,21-Jul-08
Explanation: tac concatenates and prints files in reverse, therefore the last line of the file will appear first, and so on. Akin to GNU AWK, it will read the listed file(s), as in tac file.txt, and when not given any file it will consume standard input, as seen in the last part of the code above. The awk part is copied verbatim from your question, with the !seen[$1]++ logic keeping the first occurrence it sees, which, thanks to the surrounding reversals, is the last occurrence in the original file.
(tested in gawk 4.2.1)
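For a quick illustration of tac's two input modes (a small sketch; demo.txt is just a hypothetical scratch file, and this assumes GNU coreutils' tac):
printf 'a\nb\nc\n' > demo.txt
tac demo.txt                 # reads the listed file and prints c, b, a
printf '1\n2\n3\n' | tac     # given no file, reads standard input and prints 3, 2, 1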