Home > Mobile >  Window function needs group by, buy why?
Window function needs group by, buy why?

Time:10-08

I have an ecommerce and want to measure retention after 60 days of the first sale. A retained user has done at least one purchase between the next 60 days.

I have a sales table with the following information:

Sale ID | Customer ID | Date
1       |  405        | 2021-03-05
2       |  408        | 2021-03-06
3       |  231        | 2021-03-07

I'm having issues with the part that calculates the retention.

I try to use this query:

SELECT
         "Customer ID",
         COUNT("Sale ID") OVER(
            PARTITION BY "Customer ID"
            ORDER BY "Date" ASC
            RANGE BETWEEN CURRENT ROW AND INTERVAL '60 days' FOLLOWING
            ) AS "60 days Sales"
            
         FROM "Sales"
      GROUP BY "Customer ID"

After trying this, I get this error: ERROR: Column "Date" must appear in the GROUP BY clause or be used in aggregate

I expect that query to get the number of sales done between those 60 days for any given customer. That way, I can use that later and filter customers as retained or churned.

What am I doing wrong? I'm not sure why "Date" should be grouped: I expect the result to be grouped by Customer ID.

Thanks!

CodePudding user response:

select "Customer ID",
    case when max("60 days Sales") > 1 then 'Retained' else 'Churned' end as Status
from (
    SELECT
        "Customer ID",
        COUNT("Sale ID") OVER(
            PARTITION BY "Customer ID"
            ORDER BY "Date" ASC
            RANGE BETWEEN CURRENT ROW AND INTERVAL '60 days' FOLLOWING
        ) AS "60 days Sales"            
    FROM "Sales"
) t
GROUP BY "Customer ID"

Look at each sale first. Then decide whether that customer has a sale that fits your 60-day condition. If your logic requires looking at the initial sale then you'll have to do something different.

CodePudding user response:

The problem is that window functions are evaluated after GROUP BY, so PostgreSQL doesn't know which of the "Date" in one group you mean.

You should probably use a filtered aggregate rather than a window function. I am not sure what exactly you intend to query, but maybe it is something similar to the following:

SELECT "Customer ID",
       COUNT("Sale ID")
          FILTER (WHERE "Date" BETWEEN current_timestamp
                               AND current_timestamp   INTERVAL '60 days')
          AS "60 days Sales"
FROM "Sales"
GROUP BY "Customer ID";
  • Related