Home > Enterprise >  How to group by certain columns by sum with different grouping?
How to group by certain columns by sum with different grouping?

Time:12-01

I have users table with termination_date column that is either NULL if the users are active or it has a datetime if the users are not active anymore.

There is also a termination_reason column that describes why the user is not active anymore.

So for active users, it's NULL and if they are not active, then it has a value:

id country city termination_date termination_reason
1 Sweden Stockholm 2022-10-04 self
2 Sweden Stockholm 2020-03-20 admin
2 Sweden Stockholm NULL NULL
3 Switzerland Bern NULL NULL
4 Switzerland Bern 2021-12-08 admin
5 Switzerland Bern NULL NULL

I want to display information about active and non-active users grouped by country, city and termination reason (and show active users grouped by country and city only because they don't have termination_reason). But list the non-active users only from the past 12 months (and active of all time), so the above table would result in:

country city active not_active termination_reason
Sweden Stockholm 1 1 self
Sweden Stockholm 1 0 admin
Switzerland Bern 2 0 self
Switzerland Bern 2 1 admin

(Because the user terminated from Sweden and Stockholm is over 12 months ago)

I have tried the following query as suggested in the answer below, but it doesn't work, it doesn't count correctly (also there might be another approach without using SUM):

SELECT 
  SUM(CASE WHEN termination_date IS NULL THEN 1 ELSE 0 END) AS active,
  SUM(CASE WHEN termination_date IS NOT NULL AND termination_date BETWEEN ? AND ? THEN 1 ELSE 0 END) AS not_active,
  country, city, termination_reason
FROM user_data_table
GROUP BY country, city, termination_reason
HAVING termination_reason IS NOT NULL

I believe it also has something to do with the fact I'm summing a group with the termination_reason (because I want to group by it, however it affects the sum as well?)

This is also the reason for the title of this question - because I think I need to SUM the active users without the termination_reason group, and only then group by it

CodePudding user response:

The problem comes from the fact that you don't want to group the number of active users by the termination_reason and want this number to be the same for all rows with the same country and city. Therefore, you must calculate it independently from the grouping that includes the termination_reason column. You can do this with a sub-select in the SELECT list:

SELECT 
  ( SELECT COUNT(*)
    FROM user_data_table
    WHERE country = U.country AND city = U.city AND end_date IS NULL
  ) AS active,
  SUM(CASE WHEN end_date IS NOT NULL AND end_date BETWEEN ? AND ? THEN 1 ELSE 0 END) AS not_active,
  country, city, termination_reason
FROM user_data_table U
WHERE termination_reason IS NOT NULL
GROUP BY country, city, termination_reason

I have added the table alias U for user_data_table in the outer SELECT, so that I can refer to it in the sub-select.

CodePudding user response:

Using your data (and making it a portable table variable)

DECLARE @employees TABLE (ID INT, Country NVARCHAR(20), City NVARCHAR(20), TerminationDate DATE, TerminationReason NVARCHAR(20))
INSERT INTO @employees (ID, Country, City, TerminationDate, TerminationReason) VALUES

(1,'Sweden','Stockholm','2022-10-04','self'),
(2,'Sweden','Stockholm','2020-03-20','admin'),
(2,'Sweden','Stockholm',NULL,NULL),
(3,'Switzerland','Bern',NULL,NULL),
(4,'Switzerland','Bern','2021-12-08','admin'),
(5,'Switzerland','Bern',NULL,NULL)

I think this might be what you're looking for:

SELECT DISTINCT Country, City, a.TerminationReason, 
       SUM(Active) OVER (PARTITION BY Country, City) AS Active, 
       SUM(Not_Active) OVER (PARTITION BY Country, City, TerminationReason) AS Not_Active
  FROM (
SELECT Country, City, TerminationReason, 
       CASE WHEN TerminationDate IS NULL THEN 1 ELSE 0 END AS Active, 
       CASE WHEN TerminationDate IS NOT NULL THEN 1 ELSE 0 END AS Not_Active
  FROM @employees
 WHERE TerminationDate IS NULL
    OR TerminationDate > DATEADD(YEAR,-1,CURRENT_TIMESTAMP)
      ) a
Country     City        TerminationReason   Active  Not_Active
--------------------------------------------------------------
Sweden      Stockholm   NULL                1       0
Sweden      Stockholm   self                1       1
Switzerland Bern        NULL                2       0
Switzerland Bern        admin               2       1

All we're doing here is excluding the folks terminated over a year ago, and then using windowed aggregates to sum the values across the result set.

  • Related