Home > Back-end >  jq - How to Left Join and merge fields from two input JSON files
jq - How to Left Join and merge fields from two input JSON files

Time:06-14

I have two JSON files that I want to merge together, keeping all records from the first file and merging in the fields from the second file based on a matching key field value.

I'm using the jq command-line tool on Linux to do this. The latest version of jq that I can use is: jq-1.5-1-a5b5cbe

Here are my input files, and the JSON I'm trying to generate:

Input File #1: input1.json

[
  {
    "id": "item1",
    "join_id": "123",
    "field1": "val1",
    "field2": "val2"
  },
  {
    "id": "item2",
    "join_id": "123",   // Same join_id as above
    "field1": "val1",
    "field2": "val2"
  },
  {
    "id": "item3",
    "join_id": "456",
    "field1": "val3",
    "field2": "val4"
  },
  {
    "id": "item4",
    "join_id": "789",   // Not found in lookup JSON
    "field1": "val5",
    "field2": "val6"
  }
]

Input File #2: input2.json

[
  {
    "joinId": "123",
    "joinField1": "AAAAAA",
    "joinField2": "BBBBBB"
  },
  {
    "joinId": "456",
    "joinField1": "CCCCCC",
    "joinField2": "DDDDDD"
  }
]

Desired Output:

[
  {
    "id": "item1",
    "join_id": "123",
    "field1": "val1",
    "field2": "val2",
    "joinId": "123",
    "joinField1": "AAAAAA",
    "joinField2": "BBBBBB"
  },
  {
    "id": "item2",
    "join_id": "123",
    "field1": "val1",
    "field2": "val2",
    "joinId": "123",
    "joinField1": "AAAAAA",
    "joinField2": "BBBBBB"
  },
  {
    "id": "item3",
    "join_id": "456",
    "field1": "val3",
    "field2": "val4",
    "joinId": "456",
    "joinField1": "CCCCCC",
    "joinField2": "DDDDDD"
  },
  {
    "id": "item4",
    "join_id": "789",
    "field1": "val5",
    "field2": "val6"
  }
]

It's also OK if the joinId, joinField1, joinField2 fields get added to the "item4" record, as long as they have null or empty string values.

Any help with this would be greatly appreciated.

CodePudding user response:

With jq 1.6 you would just employ INDEX and JOIN:

jq '[JOIN(INDEX(input[]; .joinId); .[]; .join_id; add)]' input1.json input2.json

Demo

For a jq 1.5 solution you could just grab the definitions of those two builtins from the Github Repository and re-implement them within your code:

jq '
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row|idx_expr|tostring] = $row);
  def JOIN($idx; stream; idx_expr; join_expr):
    stream | [., $idx[idx_expr]] | join_expr;
  [JOIN(INDEX(input[]; .joinId); .[]; .join_id; add)]
' input1.json input2.json

Demo

But then, you could also multiply out those two definitions to directly fit your use case:

jq '
  (reduce input[] as $row ({}; .[$row.joinId] = $row)) as $idx
  | map(.   $idx[.join_id])
' input1.json input2.json

Demo

Output:

[
  {
    "id": "item1",
    "join_id": "123",
    "field1": "val1",
    "field2": "val2",
    "joinId": "123",
    "joinField1": "AAAAAA",
    "joinField2": "BBBBBB"
  },
  {
    "id": "item2",
    "join_id": "123",
    "field1": "val1",
    "field2": "val2",
    "joinId": "123",
    "joinField1": "AAAAAA",
    "joinField2": "BBBBBB"
  },
  {
    "id": "item3",
    "join_id": "456",
    "field1": "val3",
    "field2": "val4",
    "joinId": "456",
    "joinField1": "CCCCCC",
    "joinField2": "DDDDDD"
  },
  {
    "id": "item4",
    "join_id": "789",
    "field1": "val5",
    "field2": "val6"
  }
]
  • Related