I am trying to finish up a data aggregation pipeline and having issues getting the data into the correct format. I'm not even sure if this is possible to do in one pipeline.
The original data looks like this:
[
{
answers: {
'question1': 'a',
'question2': 'c',
'question3': ['a','b'],
'question4': 1
},
createdAt: 2022-03-04T07:30:40.517Z,
},
{
answers: {
'question1': 'b',
'question2': 'c',
'question3': ['a','c']
'question4': 2
},
createdAt: 2022-03-04T07:30:40.518Z,
}
]
I've got my pipeline so far with this:
{ $project: {
"answers": { $objectToArray: "$answers" },
"date": { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" }}
}},
{ $unwind: "$answers" },
{ $unwind: "$answers.v" },
{
$group: {
_id: { answers : "$answers", date: "$date"},
c: { $sum: 1 }}
},
and the data now looks like this:
{
_id: {
answers: { k: 'q3', v: 'b' },
date: '2022-03-04'
},
count: 1
},
{
_id: {
answers: { k: 'q3', v: 'a' },
date: '2022-03-04'
},
count: 2
},
{
_id: {
answers: { k: 'q4', v: 1 },
date: '2022-03-04'
},
count: 1
},
{
_id: {
answers: { k: 'q1', v: 'b' },
date: '2022-03-04'
},
count: 1
},
{
_id: {
answers: { k: 'q4', v: 2 },
date: '2022-03-04'
},
count: 1
},
{
_id: {
answers: { k: 'q2', v: 'c' },
date: '2022-03-04'
},
count: 2
},
{
_id: {
answers: { k: 'q3', v: 'c' },
date: '2022-03-04'
},
count: 1
},
{
_id: {
answers: { k: 'q1', v: 'a' },
date: '2022-03-04'
},
count: 1
}
I would like to get a result that looks something like this:
{
'dates': [
{
'date': '2022-03-04',
'q1': { 'a': 1, 'b': 1 }
'q2': { 'c': 2 },
'q3': { 'a': 2, 'b': 1, 'c': 1 },
'q4': { '1': 1, '2': 1 }
}
]
'totals': { // this would be the totals across all the dates
'q1': { 'a': 1, 'b': 1 }
'q2': { 'c': 2 },
'q3': { 'a': 2, 'b': 1, 'c': 1 },
'q4': { '1': 1, '2': 1 }
}
}
any help would be greatly appreciated, even if I can't get both the totals and breakdown in 1 query.
here is the mongoplaygroud I've been working on
CodePudding user response:
Not that simple. An important stage you have to use is $facet in order to get totals
and dates
Maybe with $setWindowFields the aggregation pipeline could be a little simpler, but that a quick guess.
db.collection.aggregate([
{
$project: {
_id: 0,
answers: { $objectToArray: "$answers" },
date: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } }
}
},
{ $unwind: "$answers" },
{ $unwind: "$answers.v" },
{
$group: {
_id: {
answer: "$answers.v",
question: "$answers.k",
date: "$date"
},
count: { $sum: 1 }
}
},
{
$facet: {
dates: [
{
$group: {
_id: { question: "$_id.question", date: "$_id.date" },
count: {
$push: {
k: { $toString: "$_id.answer" },
v: "$count"
}
}
}
},
{
$group: {
_id: "$_id.date",
count: {
$push: {
k: "$_id.question",
v: { $arrayToObject: "$count" }
}
}
}
},
{
$replaceWith: {
$mergeObjects: [
{ date: "$_id" },
{ $arrayToObject: "$count" }
]
}
}
],
totals: [
{
$group: {
_id: { answer: "$_id.answer", question: "$_id.question" },
v: { $push: "$count" }
}
},
{
$group: {
_id: "$_id.question",
count: {
$push: {
k: { $toString: "$_id.answer" },
v: { $sum: "$v" }
}
}
}
},
{
$project: {
_id: 0,
k: "$_id",
v: { $arrayToObject: "$count" }
}
}
]
}
},
{ $set: { totals: { $arrayToObject: "$totals" } } }
])