Home > Enterprise >  Merge two JSON files using jq in bash
Merge two JSON files using jq in bash

Time:12-31

I'm hoping someone could help, I'm trying to merge two json files. Here is my bash script:

script_directory="/home/joey/scripts/scripts/delete"
region_file="US.en.json"
cnmts_file="cnmts.json"

wget https://github.com/blawar/titledb/raw/master/$region_file -O $script_directory/$region_file
wget https://github.com/blawar/titledb/raw/master/$cnmts_file -O $script_directory/$cnmts_file

#This is here just to simplify the json files
cat $script_directory/$region_file | jq '.[] | {id: .id}' > $script_directory/region_file_id.txt
cat $script_directory/$cnmts_file | jq '.[] | .[] | {titleId: .titleId, otherApplicationId: .otherApplicationId}' > $script_directory/cnmts_titleId_otherApplicationId.txt

Essentially, I'm given two files:

region_file_id.txt:

{
  "id": "01007EF00011E000"
}
{
  "id": "0100225000FEE000"
}
{
  "id": "0100BCE000598000"
}
{
  "id": "0100B42001DB4000"
}
{
  "id": "01008A9001DC2000"
}

and cnmts_titleId_otherApplicationId.txt:

{
  "titleId": "0100000000010000",
  "otherApplicationId": "0100000000010800"
}
{
  "titleId": "0100000000010800",
  "otherApplicationId": "0100000000010000"
}
{
  "titleId": "010000000e5ee000",
  "otherApplicationId": "010000000e5ee800"
}
{
  "titleId": "010000000eef0000",
  "otherApplicationId": "010000000eef0800"
}
{
  "titleId": "010000000eef0800",
  "otherApplicationId": "010000000eef0000"
}
{
  "titleId": "0100000011d90000",
  "otherApplicationId": "0100000011d90800"
}
{
  "titleId": "0100000011d90800",
  "otherApplicationId": "0100000011d90000"
}
{
  "titleId": "0100000011d90800",
  "otherApplicationId": "0100000011d90000"
}

Please note, this is only a snippet of the files, feel free to run the bash script to get a more accurate file.

All the "id" in the 'region_file_id' equal to a "titleId" somewhere in 'cnmts_titleId_otherApplicationId' (the reverse is not true though as it included id from different regions). I'm trying to grab the "otherApplicationId" values for each "id" in 'region_file_id' by cross referencing them and creating a json like: (repeated for every 'id' in region_file_id)

{
 "id": "111000"
 "titleId": "111000" (this one is optional as it is a duplicate from 'id')
 "otherApplicationId": 111800"
}

I've tried searching and tried different snippets:

jq -s '.[0] * .[1]' $script_directory/region_file_id.txt cnmts_titleId_otherApplicationId.txt (only returned 1 object for some reason)
jq -s '{ .[0] as $u | .[1] | select(.id == $u.titleId) |= $u }' $script_directory/region_file_id.txt cnmts_titleId_otherApplicationId.txt

Update:

As peak pointed out:

jq -n --slurpfile ids region_file_id.txt  '
  INDEX(inputs; .titleId | ascii_upcase) as $dict
  | $ids[].id as $id
  | {$id}   $dict[$id]
' cnmts_titleId_otherApplicationId.txt > merged.txt

This seems to work until I hit "null" values where my file doesn't include the correct id, which is another problem all together!

CodePudding user response:

All the "id" in the 'region_file_id' equal to a "titleId" somewhere in 'cnmts_titleId_otherApplicationId'

If that really is the case, then you could proceed as follows:

< cnmts_titleId_otherApplicationId.txt jq -n --slurpfile ids region_file_id.txt  '
  INDEX(inputs; .titleId) as $dict
  | $ids[].id as $id
  | {$id}   $dict[$id]
' 

  • Related