I'm trying to extract the value of a JSON object using jq --stream, because the real data can be multiple gigabytes in size.
This is the JSON I'm using for my tests, where I want to extract the value of item:
{
  "other": "content here",
  "item": {
    "A": {
      "B": "C"
    }
  },
  "test": "test"
}
The jq command I'm using:
jq --stream --null-input 'fromstream(inputs | select(.[0][0] == "item"))[]' example.json
However, I don't get any output with this command.
Strangely, when I remove the "test" entry that follows item, the same command seems to work:
{
  "other": "content here",
  "item": {
    "A": {
      "B": "C"
    }
  }
}
The result looks as expected:
❯ jq --stream --null-input 'fromstream(inputs | select(.[0][0] == "item"))[]' example.json
{
  "A": {
    "B": "C"
  }
}
But since I can't control the input JSON, that isn't a solution.
I'm using jq 1.6 on macOS.
Answer:
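You didn't truncate the stream. For an object, fromstream only emits a value once it sees a closing event whose path has length 1, such as [["item"]]. In your first input the top-level object closes with [["test"]], which your select discards, so after filtering the stream down to the parts below .item, fromstream never receives that final back-tracking event. In the second input item is the last key, so the closing event happens to be [["item"]] and the command works by accident.
Either add that event manually at the end (not recommended, since the result would then also include the top-level object), or, much simpler, use 1 | truncate_stream to strip the first level altogether: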
jq --stream --null-input '
  fromstream(1 | truncate_stream(inputs | select(.[0][0] == "item")))
' example.json
{
  "A": {
    "B": "C"
  }
}
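To see why the original command prints nothing, you can look at the raw events. The sketch below inlines the test document with printf instead of reading example.json, and its last command appends the missing closing event [["item"]] by hand (the approach described above as not recommended), purely to confirm the diagnosis:

```shell
# The test document from the question, inlined so no file is needed.
doc='{"other":"content here","item":{"A":{"B":"C"}},"test":"test"}'

# 1) Every streaming event, one per line (-c = compact output).
#    The top-level object closes with [["test"]], not [["item"]].
printf '%s' "$doc" | jq -c --stream .

# 2) Only the events that survive the select() from the question.
#    The last one, [["item","A"]], has a path of length 2, so fromstream
#    never sees the length-1 closing event it is waiting for.
printf '%s' "$doc" | jq -c --stream --null-input 'inputs | select(.[0][0] == "item")'

# 3) Appending the missing closing event by hand makes fromstream emit,
#    but the result still contains the top-level "item" key, hence | .item.
printf '%s' "$doc" | jq -c --stream --null-input \
  'fromstream((inputs | select(.[0][0] == "item")), [["item"]]) | .item'
```

The first command prints [["other"],"content here"], [["item","A","B"],"C"], [["item","A","B"]], [["item","A"]], [["test"],"test"] and [["test"]]; the last one prints {"A":{"B":"C"}}.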
Alternatively, you can use reduce and setpath to build up the result object yourself:
jq --stream --null-input '
  reduce inputs as $in (null;
    if $in | .[0][0] == "item" and has(1) then setpath($in[0];$in[1]) else . end
  )
' example.json
{
  "item": {
    "A": {
      "B": "C"
    }
  }
}
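The reduce version works because setpath, applied to null, creates every missing intermediate object along the path, while has(1) skips the closing events, which carry a path but no value. A small check, with the leaf and closing events of item hard-coded here rather than read with --stream:

```shell
# setpath on null creates the nested objects along the path.
jq -c -n 'null | setpath(["item","A","B"]; "C")'

# reduce replays this for every leaf event (events with a value, i.e. has(1));
# closing events such as [["item","A"]] leave the accumulator unchanged.
jq -c -n '
  reduce ([["item","A","B"],"C"], [["item","A","B"]], [["item","A"]]) as $in
    (null; if $in | has(1) then setpath($in[0]; $in[1]) else . end)'
```

Both commands print {"item":{"A":{"B":"C"}}}.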
To remove the top-level object, either filter for .item at the end or, similarly to truncate_stream, drop the first element of each path with [1:] to strip the first level:
jq --stream --null-input '
  reduce inputs as $in (null;
    if $in | .[0][0] == "item" and has(1) then setpath($in[0][1:];$in[1]) else . end
  )
' example.json
{
  "A": {
    "B": "C"
  }
}
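If the value you need sits deeper than the top level, the same reduce approach can be parameterised. This is a sketch, assuming a hypothetical $path variable passed in with --argjson (it is not part of the original answer); with $path set to ["item"] it behaves like the command above:

```shell
# Hypothetical generalisation: the target path is passed in as $path
# (here ["item","A"]) instead of being hard-coded as "item".
printf '%s' '{"other":"content here","item":{"A":{"B":"C"}},"test":"test"}' |
  jq -c --stream --null-input --argjson path '["item","A"]' '
    ($path | length) as $n
    | reduce inputs as $in (null;
        # keep only leaf events whose path starts with $path,
        # then strip that prefix before setting the value
        if ($in[0][:$n] == $path) and ($in | has(1))
        then setpath($in[0][$n:]; $in[1])
        else . end
      )'
```

This prints {"B":"C"}. Because setpath([]; v) on null simply yields v, it also returns the value when the target is a scalar, e.g. "content here" for --argjson path '["other"]'.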