N1Ql query grouping on distinct values with different keys-CodePudding

I have ~7 million docs in a bucket and I am struggling to write the correct query/index combo to prevent it from running >5 seconds.

Here is a similar scenario to the one I am trying to solve:

I have multiple coffee shops each making coffee with different container/lid combos. These field key’s are also different for different doc types. With each sale being generated I keep track of these combos.

Here are a few example docs:

[{
    "shopId": "x001",
    "date":    "2022-01-01T08:49:00Z",
    "cappuccinoContainerId": "a001",
    "cappuccinoLidId": "b001"
},
{
    "shopId": "x001",
    "date":    "2022-01-02T08:49:00Z",
    "latteContainerId": "a002",
    "latteLidId": "b002"
},
{
    "shopId": "x001",
    "date":    "2022-01-02T08:49:00Z",
    "espressoContainerId": "a003",
    "espressoLidId": "b003"
},
{
    "shopId": "x002",
    "date":    "2022-01-01T08:49:00Z",
    "cappuccinoContainerId": "a001",
    "cappuccinoLidId": "b001"
},
{
    "shopId": "x002",
    "date":    "2022-01-02T08:49:00Z",
    "latteContainerId": "a002",
    "latteLidId": "b002"
},
{
    "shopId": "x002",
    "date":    "2022-01-02T08:49:00Z",
    "espressoContainerId": "a003",
    "espressoLidId": "b003"
}]

What I need to get out of the query is the following:

[{
    "shopId": "x001",
    "day":    "2022-01-01",
    "uniqueContainersLidsCombined": 2
},
{
    "shopId": "x001",
    "day":    "2022-01-01",
    "uniqueContainersLidsCombined": 4
},
{
    "shopId": "x002",
    "day":    "2022-01-01",
    "uniqueContainersLidsCombined": 2
},
{
    "shopId": "x002",
    "day":    "2022-01-01",
    "uniqueContainersLidsCombined": 4
}]

I.e. I want the total number of unique containers and lids combined per site and day.

I have tried using composite, adaptive and FTS indexes but I unable to figure this one out.

Does anybody have a different suggestion? Can someone please help?

CodePudding user response：

I assume

That the cup types and lid types can be used for any drink type.
That you don't want to add any precomputed stuff to your data.

Perhaps an index like this my collection keyspace is in bulk.sales.amer, note I am not sure if this performs better or worse (or even if it is equivalent) WRT the solution posted by vsr:

CREATE INDEX `adv_shopId_concat_nvls` 
ON `bulk`.`sales`.`amer`(
  `shopId` MISSING,
  (
    nvl(`cappuccinoContainerId`, "") || 
    nvl(`cappuccinoLidId`, "") || 
    nvl(`latteContainerId`, "") || 
    nvl(`latteLidId`, "") || 
    nvl(`espressoContainerId`, "") || 
    nvl(`espressoLidId`, "")),substr0(`date`, 0, 10)
  )

And then a using the covered index above do your query like this:

SELECT 
   shopId,
   CONCAT( 
     NVL(cappuccinoContainerId,""), 
     NVL(cappuccinoLidId,""), 
     NVL(latteContainerId,""), 
     NVL(latteLidId,""), 
     NVL(espressoContainerId,""), 
     NVL(espressoLidId,"") 
   ) AS uniqueContainersLidsCombined,
   SUBSTR(date,0,10) AS day,
   COUNT(*) AS cnt
FROM `bulk`.`sales`.`amer`
GROUP BY 
  shopId,
  CONCAT( 
    NVL(cappuccinoContainerId,""), 
    NVL(cappuccinoLidId,""), 
    NVL(latteContainerId,""), 
    NVL(latteLidId,""), 
    NVL(espressoContainerId,""), 
    NVL(espressoLidId,"") 
  ),
  SUBSTR(date,0,10)

If you still don't get the performance you need, you could possibly use the Eventing service to do a map/reduce and an occasional query to make sure things stay perfectly in sync.

CodePudding user response：

CREATE INDEX ix1 ON default(shopId, DATE_FORMAT_STR(date,"1111-11-11"), [cappuccinoContainerId, cappuccinoLidId]);

If Using EE and shopId is immutable add PARTITION BY HASH (shopId) to above index definition (with higher partition numbers).

SELECT d.shopId,
       DATE_FORMAT_STR(d.date,"1111-11-11") AS day
       COUNT(DISTINCT [d.cappuccinoContainerId, d.cappuccinoLidId]) AS uniqueContainersLidsCombined
FROM default AS d
WHERE d.shopId IS NOT NULL
GROUP BY d.shopId, DATE_FORMAT_STR(d.date,"1111-11-11");

Adjust index key order of shopId, day based on the query predicates. https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/

Update:

Based on EXPLAIN you have date predicate and all shopIds so use following index

CREATE INDEX ix2 ON default( DATE_FORMAT_STR(date,"1111-11-11"), shopId, [cappuccinoContainerId, cappuccinoLidId]);