Home > Software design >  SUM and GROUP BY uniqued values from multiple columns
SUM and GROUP BY uniqued values from multiple columns

Time:03-01

assume we have a table inspections:

Table:  inspections
Date        Location_A  Location_B  Location_C  Duration
2020-01-01  User01      User02      User03      10
2020-01-02  User01      User03      User04      20
2020-01-03  User02      User03      User02      10
2020-01-04  User03      User01      User02      30
2020-01-05  User05      User01      User04      40

The column Location_A, Location_B and Location_C are using the same codelist/dimension table users:

Table: users
ID      Name
User01  Jack
User02  Anna
User03  Henry
User04  Paul
User05  Rose

I want to SUM specific column duration for each User and to accomplish that I create 3 GROUP BY subqueries for each Location_ and append it using UNION ALL:

SELECT t1.user, SUM(t1.total_duration) tot_duration
FROM 
(
    (SELECT Location_A user, SUM(Duration) total_duration FROM inspections GROUP BY Location_A)
    UNION ALL
    (SELECT Location_B user, SUM(Duration) total_duration FROM inspections GROUP BY Location_B)
    UNION ALL
    (SELECT Location_C user, SUM(Duration) total_duration FROM inspections GROUP BY Location_C)
) t1 

But, there will be problem as the number of column grows, the more sub queries will be incorporated. Any solution for that?

CodePudding user response:

This can potentially be resolved by slightly changing the database structure to something that supports these type of queries.

Instead of having location a, b, c as columns and the userid as value, consider having a column Location and A column User.

That way you can easily group by location, userid and do a sum without having to adjust the query when more locations are added.

So your table would look like:

Table:  inspections
    Date        Location    User    Duration
    2020-01-01      A       User01      10
    2020-01-02      A       User01      20
    2020-01-03      B       User02      10
    2020-01-04      C       User03      30
    2020-01-05      B       User05      40

You would have more rows but it should still be quite easily searchable especially if you use foreign keys on the userId.

This would allow for a query like:

SELECT SUM(Duration) FROM inspections GROUP BY Location, User

Not only does this mean you never need to change the query, if you add more locations, it also means you don't need to change the database structure whenever you add a location.

CodePudding user response:

You marked your database as sql-server - that means you can use the built-in function UNPIVOT. It does not solve your problem completely, but it might be able to ease the script writing part.

SELECT UserID, SUM(Duration) as Duration
FROM Inspections i
UNPIVOT
(UserID FOR Location IN (Location_A, Location_B, Location_C)) as unpvt 
GROUP BY UserID

In this case you won't have to write subqueries for each column but just specify column names as they appear. The returned result of the query you can see in a fiddle

  • Related