I would like to know how to use jq to extract patterns from a .json file
echo '{"parts": [{"name":"core","items":"garbage with ITEM1 ITEM2 and more"},{"name":"misc","items":"ITEM3 ITEM4 ITEM5 bla bla"} ]}' | jq '.parts | .[] | .items |=split(" ")'
{
"name": "core",
"items": [
"garbage",
"with",
"ITEM1",
"ITEM2",
"and",
"more"
]
}
{
"name": "misc",
"items": [
"ITEM3",
"ITEM4",
"ITEM5",
"bla",
"bla"
]
}
I thought of splitting the items, but I don't know how to extract each ITEMx.
I want to obtain this output:
{ "core","ITEM1" }
{ "core","ITEM2" }
{ "misc","ITEM3" }
{ "misc","ITEM4" }
{ "misc","ITEM5" }
CodePudding user response:
Your desired output is not valid JSON.
Do you want the words to form an array keyed by the value of the .name field?
jq '.parts[] | {(.name): (.items | split(" "))}'
{
"core": [
"garbage",
"with",
"ITEM1",
"ITEM2",
"and",
"more"
]
}
{
"misc": [
"ITEM3",
"ITEM4",
"ITEM5",
"bla",
"bla"
]
}
Or do you want each word to form a separate object?
jq '.parts[] | (.items | split(" "))[] as $word | {(.name): $word}'
{"core":"garbage"}
{"core":"with"}
{"core":"ITEM1"}
{"core":"ITEM2"}
{"core":"and"}
{"core":"more"}
{"misc":"ITEM3"}
{"misc":"ITEM4"}
{"misc":"ITEM5"}
{"misc":"bla"}
{"misc":"bla"}
To capture only the words that match the regex ITEM\d, you could use the scan function instead of splitting:
jq '.parts[] | {(.name): .items | scan("ITEM\\d")}'
{"core":"ITEM1"}
{"core":"ITEM2"}
{"misc":"ITEM3"}
{"misc":"ITEM4"}
{"misc":"ITEM5"}
CodePudding user response:
Building on your attempt, we could try:
.parts
| .[]
| .items |= (split(" ") | map(select(test("ITEM"))))
| {(.name): .items[]}
This produces a stream of objects such as {"core":"ITEM1"}. If you really want the non-JSON output shown in the question, it's easy enough to add the additional step.
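One way that additional step might look, as a minimal sketch: bind the name to a variable, scan for the items, and build each line with string interpolation, using -r (raw output) so jq prints the strings without JSON quoting. The variable names ($name, json, result) are only illustrative, and the regex assumes every item looks like ITEM followed by a single digit, as in the sample data.

```shell
# Sample input taken from the question.
json='{"parts": [{"name":"core","items":"garbage with ITEM1 ITEM2 and more"},{"name":"misc","items":"ITEM3 ITEM4 ITEM5 bla bla"} ]}'

# For each part: remember its name, emit every ITEM<digit> match from .items,
# then format name and match as one raw (non-JSON) line.
result=$(printf '%s' "$json" | jq -r '
  .parts[]
  | .name as $name
  | .items
  | scan("ITEM\\d")
  | "{ \"\($name)\",\"\(.)\" }"
')

printf '%s\n' "$result"
```

For the sample input this should print one line per matched item, in the { "core","ITEM1" } shape requested in the question. The regex support used by scan needs jq 1.5 or later.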