I want to remove the duplicates from each array in this JSON:
{
"abc": [
"five"
],
"pqr": [
"one",
"one",
"two",
"two",
"three",
"three",
"four",
"four"
],
"xyz": [
"one",
"one",
"two",
"two",
"four"
]
}
The output I am expecting after removing the duplicates:
{
"abc": [
"five"
],
"pqr": [
"one",
"two",
"three",
"four"
],
"xyz": [
"one",
"two",
"four"
]
}
I tried map, uniq, and group_by with jq, but nothing helped.
CodePudding user response:
unique can remove duplicates, but it automatically sorts the arrays, which may or may not be what you want.
jq '.[] |= unique'
{
"abc": [
"five"
],
"pqr": [
"four",
"one",
"three",
"two"
],
"xyz": [
"four",
"one",
"two"
]
}
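If the sorted order is acceptable for your data, that one-liner is all you need. A complete invocation might look like this, where the file name input.json is an assumption:
jq '.[] |= unique' input.json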
You can retrieve the original ordering by recreating the array based on sorting the index positions of all of its unique items:
jq '.[] |= [.[[index(unique[])] | sort[]]]'
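To make that dense filter easier to follow, here is the same idea spelled out for a single array, as a sketch with intermediate variable bindings of my own ($arr and $u are not part of the original filter):
jq '. as $arr                  # e.g. ["one","one","two","two","three","three","four","four"]
  | ($arr | unique) as $u      # sorted unique items:            ["four","one","three","two"]
  | [$arr | index($u[])]       # first-occurrence index of each:  [6,0,4,2]
  | sort                       # original order of appearance:    [0,2,4,6]
  | map($arr[.])               # indices back to values:          ["one","two","three","four"]'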
Or circumvent any sorting behaviour by writing your own straightforward de-duplication function:
jq '.[] |= reduce .[] as $i ([]; . + if index($i) then [] else [$i] end)'
In my tests, the latter performed best, with both producing:
{
"abc": [
"five"
],
"pqr": [
"one",
"two",
"three",
"four"
],
"xyz": [
"one",
"two",
"four"
]
}
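If you want to reuse the reduce-based approach, it can also be wrapped in a named function. A minimal sketch, where the function name dedupe is my own choice:
jq 'def dedupe: reduce .[] as $i ([]; if index($i) then . else . + [$i] end);
    .[] |= dedupe'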
CodePudding user response:
Here is a sort-free alternative for obtaining the distinct items in an array (or stream) while retaining the order of first occurrence.
It uses a filter that is a tiny bit more complex than it would otherwise be, for the sake of complete genericity: the lookup state keys items by type as well as by string representation, so that, for example, the number 1 and the string "1" are treated as distinct.
# generate a stream of the distinct items in `stream`
# in order of first occurrence, without sorting
def uniques(stream):
  foreach stream as $s ({};
      ($s|type) as $t
    | (if $t == "string" then $s else ($s|tostring) end) as $y
    | if .[$t][$y] then .emit = false
      else .emit = true | (.item = $s) | (.[$t][$y] = true)
      end;
    if .emit then .item else empty end);
Now it's just a matter of applying this filter to your JSON. One possibility would be:
map_values([uniques(.[])])
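For repeated use, the definition plus that last line can live in their own file and be passed to jq with -f/--from-file. A sketch, where the file names uniques.jq and input.json are assumptions:
# uniques.jq holds the def above followed by the line:
#   map_values([uniques(.[])])
jq -f uniques.jq input.json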