I have two JSON files that I want to merge together, keeping all records from the first file and merging in the fields from the second file based on a matching key field value.
I'm using the jq
command-line tool on Linux to do this. The latest version of jq
that I can use is: jq-1.5-1-a5b5cbe
Here are my input files, and the JSON I'm trying to generate:
Input File #1:
input1.json
[
{
"id": "item1",
"join_id": "123",
"field1": "val1",
"field2": "val2"
},
{
"id": "item2",
"join_id": "123", // Same join_id as above
"field1": "val1",
"field2": "val2"
},
{
"id": "item3",
"join_id": "456",
"field1": "val3",
"field2": "val4"
},
{
"id": "item4",
"join_id": "789", // Not found in lookup JSON
"field1": "val5",
"field2": "val6"
}
]
Input File #2:
input2.json
[
{
"joinId": "123",
"joinField1": "AAAAAA",
"joinField2": "BBBBBB"
},
{
"joinId": "456",
"joinField1": "CCCCCC",
"joinField2": "DDDDDD"
}
]
Desired Output:
[
{
"id": "item1",
"join_id": "123",
"field1": "val1",
"field2": "val2",
"joinId": "123",
"joinField1": "AAAAAA",
"joinField2": "BBBBBB"
},
{
"id": "item2",
"join_id": "123",
"field1": "val1",
"field2": "val2",
"joinId": "123",
"joinField1": "AAAAAA",
"joinField2": "BBBBBB"
},
{
"id": "item3",
"join_id": "456",
"field1": "val3",
"field2": "val4",
"joinId": "456",
"joinField1": "CCCCCC",
"joinField2": "DDDDDD"
},
{
"id": "item4",
"join_id": "789",
"field1": "val5",
"field2": "val6"
}
]
It's also OK if the joinId
, joinField1
, joinField2
fields get added to the "item4" record, as long as they have null
or empty string values.
Any help with this would be greatly appreciated.
CodePudding user response:
With jq 1.6 you would just employ INDEX
and JOIN
:
jq '[JOIN(INDEX(input[]; .joinId); .[]; .join_id; add)]' input1.json input2.json
For a jq 1.5 solution you could just grab the definitions of those two builtins from the Github Repository and re-implement them within your code:
jq '
def INDEX(stream; idx_expr):
reduce stream as $row ({}; .[$row|idx_expr|tostring] = $row);
def JOIN($idx; stream; idx_expr; join_expr):
stream | [., $idx[idx_expr]] | join_expr;
[JOIN(INDEX(input[]; .joinId); .[]; .join_id; add)]
' input1.json input2.json
But then, you could also multiply out those two definitions to directly fit your use case:
jq '
(reduce input[] as $row ({}; .[$row.joinId] = $row)) as $idx
| map(. $idx[.join_id])
' input1.json input2.json
Output:
[
{
"id": "item1",
"join_id": "123",
"field1": "val1",
"field2": "val2",
"joinId": "123",
"joinField1": "AAAAAA",
"joinField2": "BBBBBB"
},
{
"id": "item2",
"join_id": "123",
"field1": "val1",
"field2": "val2",
"joinId": "123",
"joinField1": "AAAAAA",
"joinField2": "BBBBBB"
},
{
"id": "item3",
"join_id": "456",
"field1": "val3",
"field2": "val4",
"joinId": "456",
"joinField1": "CCCCCC",
"joinField2": "DDDDDD"
},
{
"id": "item4",
"join_id": "789",
"field1": "val5",
"field2": "val6"
}
]