Extracting values using jq streaming

Time:10-11

I am trying to extract the values from a top-level JSON object using streaming with jq. For the sake of illustration, this is what the data look like (the actual data are rather large, hence needing to use streaming):

{
        "empty": null,
        "name": "John Smith",
        "sex": "male",
        "age": 51,
        "hobbies": [
          "running",
          "kayaking",
          "camping",
          "foraging"
         ]
}

Without streaming it's easy to get what I need:

$ jq ".name" sample.json 
"John Smith"
$ jq ".age" sample.json 
51
$ jq ".hobbies" sample.json 
[
  "running",
  "kayaking",
  "camping",
  "foraging"
]

When I use streaming I can get the value for the "hobbies" key:

$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "hobbies")))' <sample.json
["running","kayaking","camping","foraging"]

But using the analogous command for the "name" or "age" keys gives an empty result:

$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "name")))' <sample.json
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "age")))' <sample.json

I suspect that this is because the value is a scalar. But I'm not sure that this is the reason and, even if I were, I'm not sure how to use that information.

I discovered the debug filter, which seems to shed some light on the situation.

$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "hobbies") | debug))' <sample.json
["DEBUG:",[["hobbies",0],"running"]]
["DEBUG:",[["hobbies",1],"kayaking"]]
["DEBUG:",[["hobbies",2],"camping"]]
["DEBUG:",[["hobbies",3],"foraging"]]
["DEBUG:",[["hobbies",3]]]
["running","kayaking","camping","foraging"]
["DEBUG:",[["hobbies"]]]
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "name") | debug))' <sample.json
["DEBUG:",[["name"],"John Smith"]]
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "age") | debug))' <sample.json
["DEBUG:",[["age"],51]]

So it looks like these values are being selected, but they are just not making it through to the output.

Any suggestions would be appreciated! Thank you.

CodePudding user response:

You need to understand how 1 | truncate_stream() works before combining it with other filter expressions. When truncate_stream() is given a non-zero integer as its input, it strips that many leading components from the path of every event in the stream.

For example, suppose the unfiltered stream consists of the following [path, value] events:

jq -cn --stream 'inputs' json
[["empty"],null]
[["name"],"John Smith"]
[["sex"],"male"]
[["age"],51]
[["hobbies",0],"running"]
[["hobbies",1],"kayaking"]
[["hobbies",2],"camping"]
[["hobbies",3],"foraging"]
[["hobbies",3]]
[["hobbies"]]

Truncating with 1 removes the first component of each path. Events whose path has only that one component (so nothing would be left after removal) are discarded from the output entirely:

jq -cn --stream '1|truncate_stream(inputs)' json
[[0],"running"]
[[1],"kayaking"]
[[2],"camping"]
[[3],"foraging"]
[[3]]
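
To see the drop rule in isolation, here is a minimal sketch that feeds two hand-written events straight to truncate_stream, with no input file involved. The events are copied from the stream listing above; nothing else is assumed beyond a jq build that ships the streaming builtins.

```shell
# The path ["hobbies",0] has two components, so one is left after truncating by 1.
jq -cn '[1 | truncate_stream([["hobbies",0],"running"])]'
# prints [[[0],"running"]]

# The path ["age"] has a single component, so the whole event is dropped.
jq -cn '[1 | truncate_stream([["age"],51])]'
# prints []
```

Wrapping each call in [...] just collects whatever events survive, which makes the empty result for age visible.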

Your original attempt worked because the select expression kept only the events under hobbies, and truncation then removed the leading hobbies key from each path, leaving just the events that describe the array's elements.

But the same doesn't work for age, because its path has only a single component. Removing the ["age"] entry would leave an empty path, i.e. [[],51], and truncate_stream drops such events instead of emitting them.

jq -cn --stream 'inputs|select(.[0][0] == "age")' json
[["age"],51]

If a depth is applied to the above expression, i.e. 1|truncate_stream(...), the age event is removed completely, so fromstream receives nothing and has no value to reconstruct.

So for top-level scalars, skip truncate_stream altogether and take the value (index 1 of the event) directly:

jq -cn --stream 'inputs|select(.[0][0] == "age")[1]' json
51
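
If you would rather have one filter that handles both scalar and array values, the following is a rough sketch rather than a definitive recipe. It strips the leading key by hand instead of calling truncate_stream, so a scalar's event becomes [[],value], which fromstream accepts as a complete top-level value. The extract_key helper name and the recreated sample.json are assumptions made for illustration.

```shell
# Recreate the sample document from the question (illustrative data only).
cat > sample.json <<'EOF'
{
  "empty": null,
  "name": "John Smith",
  "sex": "male",
  "age": 51,
  "hobbies": ["running", "kayaking", "camping", "foraging"]
}
EOF

# extract_key KEY FILE (hypothetical helper): print the value stored under
# a top-level KEY, whether that value is a scalar or a container.
extract_key() {
  jq -cn --stream --arg key "$1" '
    fromstream(
      inputs
      | select(.[0][0] == $key)                     # events under the wanted key
      | select(length == 2 or (.[0] | length) > 1)  # skip the root closing event [[key]]
      | .[0] |= .[1:]                               # strip the leading key from the path
    )' "$2"
}

extract_key age sample.json      # prints 51
extract_key hobbies sample.json  # prints ["running","kayaking","camping","foraging"]
```

The second select matters only when the wanted key is the last one in the object: the closing event for the root then carries that key as a one-component path and has to be filtered out before the path is stripped.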