Creating a unique array in awk: can this snippet be elaborated?

Time:03-09

Thanks to @EdMorton, I can remove duplicates from an array in awk this way:

BEGIN {
    # create an array 
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}

What I don't understand, though, is this ( !seen[array[i]]++ ) condition with an increment:

  1. I do understand that we collect the unique values as indices of the seen array;
  2. So, we check whether our temp array seen already has an index array[i] (and add the value to unique if it hasn't);
  3. But the increment after the index is the thing I still can't get :) (despite the detailed explanation provided by Ed).

So, my question is the following: can we somehow rewrite this conditional in a more elaborate way? Maybe that would really help me finalise my understanding of it :)

CodePudding user response:

Hope this is clearer but idk - best I can say is it's more elaborate as requested!

$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        count[val] = count[val] + 1

        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }

        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}

$ awk -f tst.awk
a
b
c
d
e
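
To tie the long form back to the original one-liner, here is a small trace of what `!seen[val]++` does on each pass. It is only a sketch: the shell wrapper `trace_seen` and the variables `before` and `is_new` are names made up for this illustration, and the input is shortened to "a b a". The post-increment hands the old count to `!` and only then adds 1, so the test is true exactly when the old count was 0.

```shell
trace_seen() {
  awk 'BEGIN {
    n = split("a b a", array)
    for (i = 1; i <= n; i++) {
      val = array[i]
      before = seen[val] + 0      # count before the test (unset counts as 0)
      is_new = !seen[val]++       # negate the old count, then add 1 to it
      # columns: value, count before, count after, result of the test
      print val, before, seen[val], is_new
    }
  }'
}
trace_seen
# a 0 1 1
# b 0 1 1
# a 1 2 0
```

The last column is 1 only the first time a value appears, which is when `unique[++j] = val` would run. If the counter itself is not needed, the same test can probably be spelled with the `in` operator instead, e.g. `if ( !(val in seen) ) { seen[val]; unique[++j] = val }`.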

CodePudding user response:

Another approach is to put array's values into a new associative array as keys. That will enforce uniqueness:

BEGIN {
  # it's helpful to use the return value from `split`
  n = split("a b c d e a b", array)

  # use the element value as a key.
  # It doesn't really matter what the right-hand side of the assignment is.
  for (i = 1; i <= n; i++) uniq[array[i]] = i

  # now, it's easy to iterate over the unique keys
  for (elem in uniq) print elem
}

outputs in no guaranteed order:

a
b
c
d
e

If you're using GNU awk, set PROCINFO["sorted_in"] to control the order in which `for (elem in uniq)` visits the keys.
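
As a rough sketch of that GNU awk feature, the loop below sets `PROCINFO["sorted_in"]` to `"@ind_str_asc"`, one of gawk's predefined orderings (indices compared as strings, ascending). The wrapper name `sorted_unique` and the shuffled input string are made up for the example. This assumes `awk` is GNU awk; other awks treat `PROCINFO` as an ordinary array, so the traversal order stays unspecified there.

```shell
sorted_unique() {
  awk 'BEGIN {
    n = split("e a d b c a b", array)

    # element values become keys, so duplicates collapse
    for (i = 1; i <= n; i++) uniq[array[i]] = i

    # GNU awk only: visit the keys in ascending string order
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (elem in uniq) print elem
  }'
}
sorted_unique
```

With GNU awk this should print a to e in alphabetical order. On an awk without this feature, piping the output through `sort` is a portable fallback.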
