This question is looking for a reusable function that can be imported into other jq programs. The question "group and key by property" addresses a specific problem; this one aims to be as general as possible, since this is a recurring problem.
Given the following (sample) input, how can I create a new object which groups together each person by their country? I know about group_by, but it returns an array of arrays.
[
{
"name": "anna",
"country": "germany"
},
{
"name": "lisa",
"country": "germany"
},
{
"name": "john",
"country": "usa"
}
]
Running group_by(.country) produces:
[
[
{
"name": "anna",
"country": "germany"
},
{
"name": "lisa",
"country": "germany"
}
],
[
{
"name": "john",
"country": "usa"
}
]
]
but this structure makes subsequent processing difficult. Instead, I'd prefer to transform the document into the following structure:
{
"germany": [
{
"name": "anna",
"country": "germany"
},
{
"name": "lisa",
"country": "germany"
}
],
"usa": [
{
"name": "john",
"country": "usa"
}
]
}
This would make other tasks such as counting persons per country a lot easier.
How can I do it? If possible, the answer should not rely on the exact shape of the sample input but apply to arbitrary inputs in general.
Answer:
Here's a variant using reduce instead of group_by:
reduce .[] as $m ({}; .[$m.country] += [$m])
Or as a defined function:
def grp(f): reduce .[] as $m ({}; .[$m|f] += [$m]);
grp(.country)
produces:
{
"germany": [
{
"name": "anna",
"country": "germany"
},
{
"name": "lisa",
"country": "germany"
}
],
"usa": [
{
"name": "john",
"country": "usa"
}
]
}
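A self-contained version of the reduce approach, runnable with `jq -n` (the sample input is embedded as a literal purely for illustration). The update has to append with `+=`: a plain `=` would replace the array on every iteration, so each country would keep only its last person.

```jq
# Sample input from the question, embedded so the program runs with `jq -n`.
[
  {"name": "anna", "country": "germany"},
  {"name": "lisa", "country": "germany"},
  {"name": "john", "country": "usa"}
]
# Single pass: start from {} and append each element to the array stored
# under its key. `null + [$m]` is `[$m]`, so a new key needs no special case.
| reduce .[] as $m ({}; .[$m.country] += [$m])
```

Unlike group_by, this does not need to sort the input first.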
Answer:
Shorter alternative using group_by, map() and add:
group_by(.country) | map({ (.[0].country): . }) | add
Produces:
{
"germany": [
{
"name": "anna",
"country": "germany"
},
{
"name": "lisa",
"country": "germany"
}
],
"usa": [
{
"name": "john",
"country": "usa"
}
]
}
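One edge case worth a sketch: for an empty input array there are no groups, so `add` receives an empty array and returns null rather than an object. Appending `// {}` substitutes an empty object (the empty input below is hypothetical).

```jq
# Hypothetical empty input: group_by and map both yield [],
# and `add` on [] is null, so `// {}` falls back to an empty object.
[]
| group_by(.country)
| map({ (.[0].country): . })
| add // {}
```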
Answer:
It is possible to define a reusable function that groups an array by a criterion and uses that criterion as the key. This only works for string keys (but one can always append |tostring to the criterion).
def group(f):
group_by(f) | map({key:first|f, value:.}) | from_entries;
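The |tostring workaround can be seen with a numeric criterion. The input below is hypothetical (objects carrying a numeric field `n`); converting the criterion to a string means every key reaching from_entries is already a string.

```jq
def group(f):
  group_by(f) | map({key: first|f, value: .}) | from_entries;

# Hypothetical input whose grouping criterion is a number.
[{"n": 1}, {"n": 2}, {"n": 1}]
# Turn each criterion value into a string before it becomes a key.
| group(.n | tostring)
```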
Transforming to the expected output is then simply:
group(.country)
Additional tasks such as counting persons per country then become trivial:
group(.country) | map_values(length)
produces:
{
"germany": 2,
"usa": 1
}
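If only the counts are needed, they can also be accumulated directly without building the groups. This is an alternative to the helper, not part of it; the sample input is embedded so it runs with `jq -n`.

```jq
[
  {"name": "anna", "country": "germany"},
  {"name": "lisa", "country": "germany"},
  {"name": "john", "country": "usa"}
]
# Increment a counter per key; `null + 1` is 1, so missing keys start from zero.
| reduce .[] as $p ({}; .[$p.country] += 1)
```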
Chaining other transformations is straightforward too with this helper function. Need a list of names per country?
group(.country) | map_values(map(.name))
voilà
{
"germany": [
"anna",
"lisa"
],
"usa": [
"john"
]
}
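Further steps chain on in the same way. As an illustrative extension (the join step is not from the answer), each country's names can be collapsed into one string:

```jq
def group(f):
  group_by(f) | map({key: first|f, value: .}) | from_entries;

[
  {"name": "anna", "country": "germany"},
  {"name": "lisa", "country": "germany"},
  {"name": "john", "country": "usa"}
]
# Group, collect the names per country, then join each list with ", ".
| group(.country)
| map_values(map(.name) | join(", "))
```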
To group by arbitrary values rather than only strings, the function needs a second parameter that converts each group's value into a string key. The original function can then be redefined to delegate to the more general one:
def group(f; tokey):
group_by(f) | map({key: first|f|tokey, value: .}) | from_entries;
def group(f): group(f; .);
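A sketch of the two-argument form with a compound criterion. The input (with a hypothetical `city` field) is grouped by an array of two fields, and tojson turns each array into a string key. Parameter names `f` and `tokey` are illustrative.

```jq
# f = grouping criterion, tokey = converts a criterion value to a string key.
def group(f; tokey):
  group_by(f) | map({key: (first|f|tokey), value: .}) | from_entries;
def group(f): group(f; .);

# Hypothetical input with a second field to group on.
[
  {"name": "anna", "country": "germany", "city": "berlin"},
  {"name": "lisa", "country": "germany", "city": "munich"},
  {"name": "max",  "country": "germany", "city": "berlin"}
]
# Keys become compact JSON strings such as ["germany","berlin"].
| group([.country, .city]; tojson)
| map_values(map(.name))
```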
Take care that the string representation maps one-to-one onto the group value: if two distinct groups produce the same key, from_entries keeps only the last of them, and the items of the other groups are lost.
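A small sketch of such a collision, using a deliberately lossy key function (taking only the first letter) on hypothetical input. Comparing the number of groups with the number of resulting keys is one simple way to detect the loss; the field names `distinct_groups` and `distinct_keys` are made up for this example.

```jq
def group(f; tokey):
  group_by(f) | map({key: (first|f|tokey), value: .}) | from_entries;

# "germany" and "greece" both map to the key "g", so one group is dropped.
[{"country": "germany"}, {"country": "greece"}]
| {
    distinct_groups: (group_by(.country) | length),
    distinct_keys:   (group(.country; .[0:1]) | length)
  }
```

If the two counts differ, the key function is not one-to-one for this input.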