Function to group_by but return an object with group as key

Time:09-29

This question is looking for a reusable function that can be imported into other jq programs. The existing question "group and key by property" addresses one specific problem; this question aims to be as general as possible, since this is a recurring problem.

Given the following (sample) input, how can I create a new object which groups together each person by their country? I know about group_by, but it returns an array of arrays.

[
  {
    "name": "anna",
    "country": "germany"
  },
  {
    "name": "lisa",
    "country": "germany"
  },
  {
    "name": "john",
    "country": "usa"
  }
]

Running group_by(.country) produces:

[
  [
    {
      "name": "anna",
      "country": "germany"
    },
    {
      "name": "lisa",
      "country": "germany"
    }
  ],
  [
    {
      "name": "john",
      "country": "usa"
    }
  ]
]

but this structure makes subsequent processing difficult. Instead, I'd prefer to transform the document into the following structure:

{
  "germany": [
    {
      "name": "anna",
      "country": "germany"
    },
    {
      "name": "lisa",
      "country": "germany"
    }
  ],
  "usa": [
    {
      "name": "john",
      "country": "usa"
    }
  ]
}

This would make other tasks such as counting persons per country a lot easier.

How can I do it? If possible, the answer should not rely on the exact format of the sample input, but be applicable in the general case for arbitrary inputs.

CodePudding user response:

Here's a variant using reduce instead of group_by:

reduce .[] as $m ({}; .[$m.country] += [$m])


Or as a defined function:

def grp(f): reduce .[] as $m ({}; .[$m|f] += [$m]);

grp(.country)

Produces:

{
  "germany": [
    {
      "name": "anna",
      "country": "germany"
    },
    {
      "name": "lisa",
      "country": "germany"
    }
  ],
  "usa": [
    {
      "name": "john",
      "country": "usa"
    }
  ]
}
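
The reduce depends on how jq's += behaves on a key that does not exist yet: the missing slot reads as null, and null + [x] is [x], so the first element of each group creates the array and later ones are appended. A minimal sketch of that step in isolation, with the two names from the sample typed in by hand (run with jq -n -c, since there is no input):

```jq
# First += : .germany is null, and null + ["anna"] gives ["anna"].
# Second += : .germany is an array, so + concatenates.
{} | .["germany"] += ["anna"] | .["germany"] += ["lisa"]
# → {"germany":["anna","lisa"]}
```

As with the other answers, the value of f has to be a string, because it is used as an object key.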

CodePudding user response:

Shorter alternative using group_by, map() and add:

group_by(.country) | map({ (.[0].country): . }) | add

Produces:

{
  "germany": [
    {
      "name": "anna",
      "country": "germany"
    },
    {
      "name": "lisa",
      "country": "germany"
    }
  ],
  "usa": [
    {
      "name": "john",
      "country": "usa"
    }
  ]
}
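
The same pipeline can be wrapped in a function so it does not hard-code .country. This is only a sketch: group_obj is a made-up name, not a jq builtin, and the // {} fallback is an addition to cover empty input, where add would otherwise return null.

```jq
# group_obj is a hypothetical name, not a jq builtin.
# f must produce a string for every element, since it becomes an object key.
def group_obj(f):
  group_by(f)               # array of arrays, one per distinct value of f
  | map({(first | f): .})   # one single-key object per group
  | add // {};              # merge them; add yields null for [], so fall back to {}

group_obj(.country)
```

On the sample input this should give the same object as shown above.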


CodePudding user response:

It is possible to define a reusable function which groups an array by a criterion and uses the value of that criterion as the key. Obviously, this will only work for string keys (but one can always add |tostring).

def group(f):
  group_by(f) | map({key:first|f, value:.}) | from_entries;

Transforming to the expected output is then simply:

group(.country)

Additional tasks such as counting persons per country then become trivial:

group(.country) | map_values(length)

produces:

{
  "germany": 2,
  "usa": 1
}

Chaining other transformations is straightforward too with this helper function. Need a list of names per country?

group(.country) | map_values(map(.name))

voilà

{
  "germany": [
    "anna",
    "lisa"
  ],
  "usa": [
    "john"
  ]
}
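
The |tostring remark matters as soon as the grouping value is not a string, for example a number. A small sketch, assuming a hypothetical age field that is not part of the question's sample (run with jq -n -c, since the input is inlined):

```jq
def group(f):
  group_by(f) | map({key: first|f, value: .}) | from_entries;

# Hypothetical input; the age field is not part of the question's sample.
[{"name":"anna","age":30},{"name":"lisa","age":30},{"name":"john","age":41}]
| group(.age | tostring)      # numbers become the string keys "30" and "41"
| map_values(map(.name))
# → {"30":["anna","lisa"],"41":["john"]}
```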

To group by arbitrary values (objects, arrays, numbers), the function needs a second parameter which converts the group value into a string key. The original function can then be redefined to delegate to the more general function:

def group(group;key):
  group_by(group) | map({key:first|group|key, value:.}) | from_entries;
def group(group): group(group;.);

Special attention needs to be paid that the string key maps 1:1 to the group: if two different groups produce the same key, from_entries keeps only one entry for that key and the items of the other group are lost.
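
A sketch of the two-parameter form, grouping by a composite value. The city field and the "country/city" key format are assumptions made up for this illustration (run with jq -n -c, since the input is inlined):

```jq
def group(group; key):
  group_by(group) | map({key: first|group|key, value: .}) | from_entries;

# Hypothetical input with a city field, which the question's sample lacks.
[{"name":"anna","country":"germany","city":"berlin"},
 {"name":"lisa","country":"germany","city":"bonn"},
 {"name":"john","country":"usa","city":"boston"}]
| group({country, city}; "\(.country)/\(.city)")   # group by object, key by string
| map_values(map(.name))
# → {"germany/berlin":["anna"],"germany/bonn":["lisa"],"usa/boston":["john"]}
```

Passing .country as the key here would break the 1:1 mapping: both German groups would get the key "germany", and only one of them would survive from_entries.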
