I am trying to extract the values from a top-level JSON object using jq in streaming mode (--stream). For the sake of illustration, this is what the data look like (the actual data are rather large, hence the need for streaming):
{
  "empty": null,
  "name": "John Smith",
  "sex": "male",
  "age": 51,
  "hobbies": [
    "running",
    "kayaking",
    "camping",
    "foraging"
  ]
}
Without streaming it's easy to get what I need:
$ jq ".name" sample.json
"John Smith"
$ jq ".age" sample.json
51
$ jq ".hobbies" sample.json
[
  "running",
  "kayaking",
  "camping",
  "foraging"
]
When I use streaming, I can get the value for the "hobbies" key:
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "hobbies")))' <sample.json
["running","kayaking","camping","foraging"]
But the analogous command for the "name" or "age" keys gives an empty result:
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "name")))' <sample.json
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "age")))' <sample.json
I suspect that this is because the value is a scalar, but I'm not sure that this is the reason and, even if it is, I'm not sure how to use that information.
I discovered the debug filter, which seems to shed some light on the situation:
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "hobbies") | debug))' <sample.json
["DEBUG:",[["hobbies",0],"running"]]
["DEBUG:",[["hobbies",1],"kayaking"]]
["DEBUG:",[["hobbies",2],"camping"]]
["DEBUG:",[["hobbies",3],"foraging"]]
["DEBUG:",[["hobbies",3]]]
["running","kayaking","camping","foraging"]
["DEBUG:",[["hobbies"]]]
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "name") | debug))' <sample.json
["DEBUG:",[["name"],"John Smith"]]
$ jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "age") | debug))' <sample.json
["DEBUG:",[["age"],51]]
So it looks like these values are being selected, but they are just not making it through to the output.
Any suggestions would be appreciated! Thank you.
Answer:
You need to understand how 1 | truncate_stream(...) works before applying any further filters to its output. truncate_stream takes its depth from its input (here the integer 1): it removes that many leading elements from the path of every streamed event, and it discards entirely any event whose path is not longer than that depth.
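To see both effects in isolation, the sketch below feeds truncate_stream a few hand-written events with a depth of 1. The events are illustrative only, not taken from the sample file:

```shell
#!/bin/sh
# The depth (1) is the input to truncate_stream.
# [[0],1] and [[1]] have one-element paths, so they are dropped;
# [[1,0],2] and [[1,0]] are kept with their leading path element removed.
jq -nc '[1 | truncate_stream([[0],1], [[1,0],2], [[1,0]], [[1]])]'
# prints [[[0],2],[[0]]]
```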
For example, your input produces the following [path, value] events (closing events such as [["hobbies",3]] carry only a path):
jq -cn --stream 'inputs' json
[["empty"],null]
[["name"],"John Smith"]
[["sex"],"male"]
[["age"],51]
[["hobbies",0],"running"]
[["hobbies",1],"kayaking"]
[["hobbies",2],"camping"]
[["hobbies",3],"foraging"]
[["hobbies",3]]
[["hobbies"]]
Truncating with 1 removes the first element of each path. Events whose path has only one element, such as [["name"],"John Smith"] or [["age"],51], are discarded from the output completely, as is the final closing event [["hobbies"]]:
jq -cn --stream '1|truncate_stream(inputs)' json
[[0],"running"]
[[1],"kayaking"]
[[2],"camping"]
[[3],"foraging"]
[[3]]
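Applied to the age event from the question, the same truncation leaves nothing at all. This is a sketch that assumes the sample document is saved as sample.json (the file name is an assumption; the commands in this answer call it json). Wrapping the result in [...] makes the empty output visible:

```shell
#!/bin/sh
# Recreate the question's sample document (file name assumed).
cat > sample.json <<'EOF'
{"empty": null, "name": "John Smith", "sex": "male", "age": 51,
 "hobbies": ["running", "kayaking", "camping", "foraging"]}
EOF

# Collect everything 1|truncate_stream emits for "age" into an array:
# the single event [["age"],51] has a one-element path, so it is dropped.
jq -cn --stream '[1 | truncate_stream(inputs | select(.[0][0] == "age"))]' sample.json
# prints []
```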
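Your original attempt worked for hobbies because the select expression kept only the events under hobbies, and truncation removed the parent key hobbies from their paths, leaving events that describe a bare array. The closing event [["hobbies",3]] becomes [[3]], which tells fromstream that the array is complete, so it emits the reassembled list.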
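The role of that closing event can be checked with hand-written events shaped like the truncated hobbies events (the values are illustrative). fromstream emits the array only when it sees a closing event whose path has a single element:

```shell
#!/bin/sh
# [[1]] is a closing event with a one-element path: it completes the top-level array.
jq -nc 'fromstream([[0],"running"], [[1],"kayaking"], [[1]])'
# prints ["running","kayaking"]

# Without that closing event, fromstream never emits anything.
jq -nc '[fromstream([[0],"running"], [[1],"kayaking"])]'
# prints []
```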
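The same doesn't work for age, because its only event has a one-element path, so there is nothing left to trim. Removing the "age" entry would leave [[],51], with an empty path and only the value, but truncate_stream drops such events instead of emitting them. Without truncation, the age event looks like this: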
jq -cn --stream 'inputs|select(.[0][0] == "age")' json
[["age"],51]
If a depth is applied to the above expression, i.e. 1|truncate_stream(...), the age event is removed completely, so fromstream receives no events and has nothing to reconstruct.
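The empty-path event itself is not the problem: [[],51] is how jq represents a bare scalar as a stream, and fromstream rebuilds it without trouble. It simply never reaches fromstream, because truncate_stream does not emit it. A quick check using jq's tostream:

```shell
#!/bin/sh
# tostream of a bare scalar yields a single event with an empty path.
jq -nc '51 | tostream'
# prints [[],51]

# fromstream turns that event back into the scalar.
jq -nc 'fromstream([[],51])'
# prints 51
```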
So for simple scalars, just take the value (the second element of the event) directly, without using truncate_stream at all:
jq -cn --stream 'inputs | select(.[0][0] == "age") | .[1]' json
51
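If you would rather keep fromstream, so that one command works for scalars and arrays alike, one alternative (not part of the answer above, and only a sketch) is to strip the top-level key from each path yourself with .[0] |= .[1:] instead of using truncate_stream. A one-element path then becomes [] rather than being dropped, and the leftover top-level closing event (which would become [[]]) is filtered out. The helper name extract and the file name sample.json are assumptions:

```shell
#!/bin/sh
# Recreate the question's sample document (file name assumed).
cat > sample.json <<'EOF'
{"empty": null, "name": "John Smith", "sex": "male", "age": 51,
 "hobbies": ["running", "kayaking", "camping", "foraging"]}
EOF

# Hypothetical helper: print the value of one top-level key using streaming.
extract() {
  jq -cn --stream --arg key "$1" '
    fromstream(
      inputs
      | select(.[0][0] == $key)                     # events under the requested key
      | .[0] |= .[1:]                               # drop the key itself from the path
      | select(length == 2 or (.[0] | length) > 0)  # skip the leftover [[]] closing event
    )' sample.json
}

extract age      # prints 51
extract name     # prints "John Smith"
extract hobbies  # prints ["running","kayaking","camping","foraging"]
```

The extra select keeps the event sequence in the same shape that tostream would produce for the value on its own, which is what fromstream expects.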