Consider the following input:
{
"name": "examplename1",
"Date1": "value1",
"Date2": "value2",
"Date3": "value3"
}
{
"name": "examplename1",
"Date1": "value4",
"Date2": "value5",
"Date3": "value6"
}
{
"name": "examplename2",
"Date1": "value7",
"Date2": "value8",
"Date3": "value9"
}
{
"name": "examplename2",
"Date1": "value10",
"Date2":"value11",
"Date3": "value12"
}
The required output is:
{
"names": "examplename1",
"availabledates1":[
"value1",
"value4"
],
"availabledates2":[
"value2",
"value5"
],
"availabledates3":[
"value3",
"value6"
]
}
{
"names": "examplename2",
"availabledates1":[
"value7",
"value10"
],
"availabledates2":[
"value8",
"value11"
],
"availabledates3":[
"value9",
"value12"
]
}
I am using this jq filter:
[inputs] | group_by(.name)[] | [{names: .[].name, availabledates1: [.[].Date1], availabledates2: [.[].Date2], availabledates3: [.[].Date3]}] | unique_by(.names) | .[]
It produces this output:
{
"names": "examplename1",
"availabledates1": [
"value4"
],
"availabledates2": [
"value5"
],
"availabledates3": [
"value6"
]
}
{
"names": "examplename2",
"availabledates1": [
"value7",
"value10"
],
"availabledates2": [
"value8",
"value11"
],
"availabledates3": [
"value9",
"value12"
]
}
Issue 1: This filter ignores the first object in the input (value1, value2 and value3 are missing from the examplename1 group).
Issue 2: If the input data set is very large, this filter uses too much memory and eventually fails, presumably because it iterates over every group several times.
The filter can be tried here: https://jqplay.org/s/OOHAuv72GAL
I need a more efficient jq filter that does not fail on a large data set and that also includes the first object in the input.
CodePudding user response:
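Run jq with -n in both variants. Without -n, jq consumes the first object as `.` before the filter runs, so `inputs` only yields the remaining objects; that is why the first row went missing.
Using group_by: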
jq -n '
[inputs] | group_by(.name)[] | {
names: first.name,
avaliableDates1: map(.Date1),
avaliableDates2: map(.Date2),
avaliableDates3: map(.Date3)
}
'
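To see the effect of -n in isolation, here is a minimal sketch, assuming jq 1.5 or later (for `inputs`) and a POSIX shell. The three-object sample and the variable names are made up for illustration and are not part of the question's data.

```shell
# Three tiny objects stand in for the real input (hypothetical sample data).
data='{"n":1} {"n":2} {"n":3}'

# Without -n: jq reads the first object as `.`, so `inputs` yields only the rest.
without_n=$(printf '%s' "$data" | jq -c '[inputs | .n]')

# With -n: nothing is read up front, so `inputs` yields every object.
with_n=$(printf '%s' "$data" | jq -nc '[inputs | .n]')

echo "without -n: $without_n"   # prints: without -n: [2,3]
echo "with -n:    $with_n"      # prints: with -n:    [1,2,3]
```

The same applies to `[inputs] | group_by(.name)`: drop -n and the first object never reaches the grouping.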
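Using reduce (this folds the objects in one at a time instead of first collecting the whole input into an array, which should help with large inputs):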
jq -n '
(reduce inputs as $i ({}; .[$i.name] |= (
.names = $i.name
| .avaliableDates1 += [$i.Date1]
| .avaliableDates2 += [$i.Date2]
| .avaliableDates3 += [$i.Date3]
)))[]
'
Output:
{
"names": "examplename1",
"avaliableDates1": [
"value1",
"value4"
],
"avaliableDates2": [
"value2",
"value5"
],
"avaliableDates3": [
"value3",
"value6"
]
}
{
"names": "examplename2",
"avaliableDates1": [
"value7",
"value10"
],
"avaliableDates2": [
"value8",
"value11"
],
"avaliableDates3": [
"value9",
"value12"
]
}
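As a quick sanity check, the sketch below runs both approaches on the sample input from the question and compares their compact output. It assumes jq 1.5 or later and a POSIX shell; the variable names `sample`, `grouped` and `reduced` are only for illustration. Note that the reduce variant appends with `+=`, so each array accumulates values across objects that share a name rather than keeping only the last one.

```shell
# Sample input from the question, one object per line.
sample='{"name":"examplename1","Date1":"value1","Date2":"value2","Date3":"value3"}
{"name":"examplename1","Date1":"value4","Date2":"value5","Date3":"value6"}
{"name":"examplename2","Date1":"value7","Date2":"value8","Date3":"value9"}
{"name":"examplename2","Date1":"value10","Date2":"value11","Date3":"value12"}'

# Variant 1: collect everything into an array, then group by name.
grouped=$(printf '%s\n' "$sample" | jq -nc '
  [inputs] | group_by(.name)[] | {
    names: first.name,
    avaliableDates1: map(.Date1),
    avaliableDates2: map(.Date2),
    avaliableDates3: map(.Date3)
  }')

# Variant 2: fold objects one by one into an accumulator keyed by name.
# `+=` appends to the array (null + [x] is [x] on the first occurrence).
reduced=$(printf '%s\n' "$sample" | jq -nc '
  (reduce inputs as $i ({}; .[$i.name] |= (
    .names = $i.name
    | .avaliableDates1 += [$i.Date1]
    | .avaliableDates2 += [$i.Date2]
    | .avaliableDates3 += [$i.Date3]
  )))[]')

if [ "$grouped" = "$reduced" ]; then
  echo "both variants agree"
else
  echo "variants differ"
fi
```

On this sample the names already arrive in sorted order, so the comparison does not depend on how either variant orders its groups; with unsorted input the group order of the two variants may differ even though the groups themselves match.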