jq - remove duplicates from arrays

Time:12-05

I want to remove the duplicates from each array in this JSON:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "one",
    "two",
    "two",
    "three",
    "three",
    "four",
    "four"
  ],
  "xyz": [
    "one",
    "one",
    "two",
    "two",
    "four"
  ]
}

This is the output I expect after removing the duplicates:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "two",
    "three",
    "four"
  ],
  "xyz": [
    "one",
    "two",
    "four"
  ]
}

I tried map, uniq, and group_by with jq, but nothing helped.

CodePudding user response:

unique can remove duplicates, but it automatically sorts the arrays, which may or may not be what you want.

jq '.[] |= unique'
{
  "abc": [
    "five"
  ],
  "pqr": [
    "four",
    "one",
    "three",
    "two"
  ],
  "xyz": [
    "four",
    "one",
    "two"
  ]
}

You can restore the original ordering by rebuilding the array from the sorted index positions of its unique items:

jq '.[] |= [.[[index(unique[])] | sort[]]]'
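
The stages of that expression can be inspected one at a time. This is only a sketch, assuming jq 1.5 or later is on the PATH; the variable names (arr, step1 to step4) are made up for illustration, and the sample array is the "pqr" array from the question.

```shell
arr='["one","one","two","two","three","three","four","four"]'

# 1. unique de-duplicates, but also sorts the items
step1=$(echo "$arr" | jq -c 'unique')
# 2. position of the first occurrence of each unique item
step2=$(echo "$arr" | jq -c '[index(unique[])]')
# 3. sorting those positions restores the order of first occurrence
step3=$(echo "$arr" | jq -c '[index(unique[])] | sort')
# 4. look each position up in the original array
step4=$(echo "$arr" | jq -c '[.[[index(unique[])] | sort[]]]')

printf '%s\n' "$step1" "$step2" "$step3" "$step4"
```

For this input the four lines printed are ["four","one","three","two"], then [6,0,4,2], then [0,2,4,6], then ["one","two","three","four"].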

Or circumvent any sorting behaviour by writing your own straightforward de-duplication function:

jq '.[] |= reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)'
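
One detail that may look suspicious in the reduce approach: index($i) returns 0 when the item sits at the first position of the accumulator, yet the test still works, because in jq only false and null count as false. A small sketch to check this on a single array, assuming jq is installed (the variable names arr, dedup and zero are illustrative only):

```shell
arr='["one","one","two","two","four"]'

# append $i to the accumulator only if it is not already in it
dedup=$(echo "$arr" | jq -c 'reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)')
echo "$dedup"

# 0 is truthy in jq, so a match at position 0 still counts as "already seen"
zero=$(jq -n -r '0 | if . then "truthy" else "falsy" end')
echo "$zero"
```

This should print ["one","two","four"] followed by truthy.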

In my tests, the latter performed best. Both approaches produce:

{
  "abc": [
    "five"
  ],
  "pqr": [
    "one",
    "two",
    "three",
    "four"
  ],
  "xyz": [
    "one",
    "two",
    "four"
  ]
}

CodePudding user response:

Here is a sort-free alternative for obtaining the distinct items in an array (or stream) while retaining the order of first occurrence.

It uses a filter that is a tiny bit more complex than it would otherwise be, for the sake of complete genericity:

# generate a stream of the distinct items in `stream`
# in order of first occurrence, without sorting
def uniques(stream):
  foreach stream as $s ({};
     ($s|type) as $t
     | (if $t == "string" then $s else ($s|tostring) end) as $y
     | if .[$t][$y] then .emit = false else .emit = true | (.item = $s) | (.[$t][$y] = true) end;
     if .emit then .item else empty end );

Now it's just a matter of applying this filter to your JSON. One possibility would be:

 map_values([uniques(.[])])
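
As a complete command, this could look like the sketch below. It assumes jq 1.5 or later (for the three-argument foreach). The shell variable names prog and out and the "mixed" sample array are illustrative additions, not part of the question. The filter definition is the one above, with only the long line wrapped. The "mixed" array shows why the filter keys on type as well as on value: the number 1 and the string "1" are kept as distinct items.

```shell
# the jq program contains no apostrophes, so single quotes are safe here
prog='
def uniques(stream):
  foreach stream as $s ({};
     ($s|type) as $t
     | (if $t == "string" then $s else ($s|tostring) end) as $y
     | if .[$t][$y] then .emit = false
       else .emit = true | (.item = $s) | (.[$t][$y] = true) end;
     if .emit then .item else empty end );
map_values([uniques(.[])])
'

out=$(echo '{"xyz":["one","one","two","two","four"],"mixed":[1,"1",1,null,"1",null]}' | jq -c "$prog")
echo "$out"
```

For this input the result is {"xyz":["one","two","four"],"mixed":[1,"1",null]}.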