Home > database >  With JQ update JSON array with values from another one, namely JOIN
With JQ update JSON array with values from another one, namely JOIN

Time:03-07

Given two files 1.json and 2.json. They are both arrays of objects. Need to update field ping_latency to 1.json from 2.json

1.json

[
    {
      "domain": "ca944.nordvpn.com",
      "name": "Canada #944",
      "ip_address": "172.83.40.219"
    },
    {
      "domain": "pl128.nordvpn.com",
      "name": "Poland #128",
      "ip_address": "194.99.105.100"
    },
    {
      "domain": "dk151.nordvpn.com",
      "name": "Denmark #151",
      "ip_address": "82.102.20.236"
    },
    {
      "domain": "be148.nordvpn.com",
      "name": "Belgium #148",
      "ip_address": "82.102.19.137",
      "ping_latency": 334
    }
]

2.json

[
    {
      "domain": "ca944.nordvpn.com",
      "name": "Canada #944",
      "ip_address": "172.83.40.219",
      "ping_latency": 123
    },
    {
      "domain": "pl27.nordvpn.com",
      "name": "Poland #27",
      "ip_address": "194.99.105.27",
      "ping_latency": "REMOVED"
    },
    {
      "domain": "dk151.nordvpn.com",
      "name": "Denmark #151",
      "ip_address": "82.102.20.236",
      "ping_latency": 13
    },
    {
      "domain": "be148.nordvpn.com",
      "name": "Belgium #148",
      "ip_address": "82.102.19.137",
      "ping_latency": 67
    }
]

Object with mark "REMOVED" should not appear in result. Because it is not in 1.json.

PS I do not work for NordVPN - this is just an example.


I tried to merge arrays with operator or *. But it always adds "REMOVED" domain.

jq -s 'map(INDEX(.domain)) | add | [.[]]' {1,2}.json

and

jq -s '(.[0]|INDEX(.domain)) as $x | (.[1]|INDEX(.domain)) as $y | $x *$y' {1,2}.json

Both adds "REMOVED" node from 2.json.

[
  {
    "domain": "ca944.nordvpn.com",
    "name": "Canada #944",
    "ip_address": "172.83.40.219",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "name": "Poland #128",
    "ip_address": "194.99.105.100"
  },
  {
    "domain": "dk151.nordvpn.com",
    "name": "Denmark #151",
    "ip_address": "82.102.20.236",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "name": "Belgium #148",
    "ip_address": "82.102.19.137",
    "ping_latency": 67
  },
  {
    "domain": "pl27.nordvpn.com",
    "name": "Poland #27",
    "ip_address": "194.99.105.27",
    "ping_latency": "REMOVED"
  }
]

How to manage it?


Update. After some mental fight I found a way and managed to do that in JQ

jq 'INDEX(.domain) as $u | 
     reduce ($full[][] | {domain,ip_address,name}) as $i (
     []; .   [ $i | .ping_latency=( $u[$i.domain].ping_latency//98767 )]
    )' --slurpfile full 1.json <2.json

But comparing to operator * my approach is about 100 times slower and takes up to 2 second on Intel Core i7-11xxx with array length of 5474 objects

[
  {
    "domain": "ca944.nordvpn.com",
    "ip_address": "172.83.40.219",
    "name": "Canada #944",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "ip_address": "194.99.105.100",
    "name": "Poland #128",
    "ping_latency": 98767
  },
  {
    "domain": "dk151.nordvpn.com",
    "ip_address": "82.102.20.236",
    "name": "Denmark #151",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "ip_address": "82.102.19.137",
    "name": "Belgium #148",
    "ping_latency": 67
  }
]

May be you know a quick better way?

CodePudding user response:

Assuming .domain is unique to the updating objects in 2.json (change if this holds true for another key instead; even spannig over multiple keys is possible using an array, e.g. [.name, .ip_address]), you could use JOIN based on the unique INDEX to match corresponding pairs of objects.

Furthermore, to combine a match, you could simply add up the pair's members, as nothing from 2.json can effectively overwrite anything (else) in 1.json, assuming your sample is representative to this regard. If this ish't the case, use a more fine-grained combination method instead, e.g. first (last | {ping_latency} | select(.[]) // {}) or similar.

Lastly, backed by your description

Object with mark "REMOVED" should not appear in result. Because it is not in 1.json.

it is further assumed that there is generally no object in 2.json which should newly be added to 1.json. As JOIN behaves exactly that way, checking for "REMOVED" is believed to not be necessary.

Having the first file as input while reading the second file into a variable using --argfile:

jq --argfile a 2.json '[JOIN(INDEX($a[]; .domain); .[]; .domain; add)]' 1.json

Or, equivalently, reading in both files into one array using --slurp:

jq -s '[JOIN(INDEX(last[]; .domain); first[]; .domain; add)]' 1.json 2.json
[
  {
    "domain": "ca944.nordvpn.com",
    "name": "Canada #944",
    "ip_address": "172.83.40.219",
    "ping_latency": 123
  },
  {
    "domain": "pl128.nordvpn.com",
    "name": "Poland #128",
    "ip_address": "194.99.105.100"
  },
  {
    "domain": "dk151.nordvpn.com",
    "name": "Denmark #151",
    "ip_address": "82.102.20.236",
    "ping_latency": 13
  },
  {
    "domain": "be148.nordvpn.com",
    "name": "Belgium #148",
    "ip_address": "82.102.19.137",
    "ping_latency": 67
  }
]

Demo

  • Related