Changing previous duplicate line in awk

Time:10-28

I want to make all duplicate names in a .csv file unique, but by the time I find a duplicate I can no longer reach the earlier line, because it has already been printed. I tried saving all lines in an array and printing them in the END section, but it doesn't work, and I don't understand how to access a specific field in this array (two-dimensional arrays aren't supported in awk?).

sample input

...,9,phone,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone,...

desired output

...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...

My attempt ($2 - id field, $3 - name field)

BEGIN {
    FS = ","
    OFS = ","
    marker = 777
}
{
    if (names[$3] == marker) {
        $3 = $3 $2
        #Attempt to change previous duplicate
        results[nameLines[$3]] = $3 id[$3]
    }
    names[$3] = marker
    id[$3] = $2
    nameLines[$3] = NR
    results[NR] = $0
}
END {
    #it prints some numbers, not saved lines
    for (result in results)
        print result
}

CodePudding user response:

Here is a single-pass awk solution that stores all records in a buffer:

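awk -F, '
{
   rec[NR] = $0                # buffer every record by line number
   ++fq[$3]                    # count how often each name occurs
}
END {
   for (i=1; i<=NR; ++i) {
      n = split(rec[i], a, /,/)
      if (fq[a[3]] > 1)        # name is duplicated: append the id
         a[3] = a[3] a[2]
      for (k=1; k<=n; ++k)
         printf "%s", a[k] (k < n ? FS : ORS)
   }
}' file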

...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
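
As a side note on why the END loop in the question printed numbers: `for (result in results)` walks the array's indices (here the line numbers), in an unspecified order, not the stored lines. A minimal sketch of printing the buffered lines in input order follows; the function name `print_buffered` and the sample rows are made up for illustration.

```shell
# print_buffered is a hypothetical helper name used only for this sketch.
print_buffered() {
  awk -F, '
  { results[NR] = $0 }            # buffer every record under its line number
  END {
      # "for (k in results) print k" would print the keys 1, 2, 3, ...
      # in unspecified order; index the array to get the stored lines.
      for (i = 1; i <= NR; i++)
          print results[i]
  }'
}

# Made-up sample rows standing in for the real file.
printf '%s\n' 'a,9,phone,x' 'b,43,book,y' 'c,43,phone,z' | print_buffered
```

A counted loop from 1 to NR is used instead of `for (k in results)` because the traversal order of `in` is not guaranteed to match the input order.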

CodePudding user response:

This can be done easily by passing Input_file to awk twice, so there is no need to create two-dimensional arrays. Written and tested in GNU awk with your shown samples.

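awk '
BEGIN{FS=OFS=","}
FNR==NR{            ##First pass: count how often each name occurs.
  arr1[$3]++
  next
}
{                   ##Second pass: append the id to duplicated names.
  $3=(arr1[$3]>1?$3 $2:$3)
}
1
' Input_file  Input_file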

Output will be as follows:

...,9,phone9,...
...,43,book,...
...,27,apple,...
...,85,hook,...
...,43,phone43,...
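
On the two-dimensional array question: POSIX awk has no true nested arrays, but a comma-separated subscript such as `cell[NR, i]` is joined into one string key using `SUBSEP`, which is usually enough to address a single field of a buffered line. The following is only a sketch of that idea, assuming the id is in field 2 and the name in field 3 as in the question; the names `dedupe_names`, `cell`, `nf` and `count` and the sample rows are made up for illustration.

```shell
# dedupe_names is a hypothetical helper name used only for this sketch.
dedupe_names() {
  awk -F, -v OFS=, '
  {
      nf[NR] = NF                   # remember the field count of each row
      count[$3]++                   # how often each name occurs
      for (i = 1; i <= NF; i++)
          cell[NR, i] = $i          # key is NR SUBSEP i, a plain string
  }
  END {
      for (r = 1; r <= NR; r++) {
          if (count[cell[r, 3]] > 1)            # duplicated name
              cell[r, 3] = cell[r, 3] cell[r, 2]
          line = cell[r, 1]
          for (i = 2; i <= nf[r]; i++)          # rebuild the row
              line = line OFS cell[r, i]
          print line
      }
  }'
}

# Made-up sample rows standing in for the real file.
printf '%s\n' 'a,9,phone,x' 'b,43,book,y' 'c,43,phone,z' | dedupe_names
```

Unlike the two-pass version, this reads its input once, so it also works on a pipe, at the cost of holding every field in memory.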