Home > OS >  Postgres division of two subquery sums using WHERE from another subquery
Postgres division of two subquery sums using WHERE from another subquery

Time:11-15

We store volunteer hours in a database for every user, i'm trying to figure out how to structure a Postgres SQL query to find the percentage of total hours a specific user has spent at a location vs the total number of hours at that location (by any user)

CREATE TABLE entries (
  id INT,
  userId INT,
  volunteerHours DOUBLE PRECISION NOT NULL,
  location INT
);

INSERT INTO entries (id, userId, volunteerHours, location) VALUES (1, 1, 3.0, 2);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (2, 1, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (3, 1, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (4, 2, 3.0, 1);
INSERT INTO entries (id, userId, volunteerHours, location) VALUES (5, 2, 3.0, 1);

There could be hundreds of these entries in the db, these are shown just for examples

Pseudo code:

SELECT
(SELECT location from entries ORDER BY id DESC FETCH NEXT 1 ROWS ONLY) as lastEntryLocation,
(SELECT SUM(volunteerHours) from entries WHERE userId = 2 AND location = lastEntryLocation) as userHours,
(SELECT SUM(volunteerHours) from entries AND location = lastEntryLocation) as totalHours,
(SELECT userHours / totalHours) as userContributionPercentage 

I'm still learning how to write custom SQL queries, hoping someone could at least point me in the right direction of how I should be structuring this query correctly, to return in this situation 0.50

Thought process from sudo query:

  1. Pull the latest entry in the db to determine the last location ID value inserted
  2. Use last location value ID to then query the SUM of all userId 2's hours
  3. Use last location value ID to query the SUM of all users hours
  4. Divide userId 2's total hours by the total hours at that specific location

I would like to try and do all of this in a SQL query if possible (I could break it out into separate SQL query calls and programmatically do the math, but I would prefer to learn how to do this through one SQL query call if at all possible.

Created a db fiddle as well for this: https://www.db-fiddle.com/f/7dfFZTa2yN3jpSMVLcFXWc/4

If you could also explain why you chose a specific method used to create the query for this, that would greatly help me to understand the reasoning behind it and know where to look further to learn more myself. Thank you!

EDIT: Looking for a way to not have to do the query twice for the latest location (as it feels like the wrong way to do this)

CodePudding user response:

Window functions with sums for each row's partition/group:

select distinct on (e.userId) 
    sum(e.volunteerHours) over (partition by userId,location,id)
      / sum(e.volunteerHours) over (partition by userId,location) 
            as percentage_volunteered_here
from entries e
where userId=1
order by userId, id desc;

Distinct on (a) a,c...order by a,b lets you take the first row for each a, according to your desired order/priority - in your case, the latest location.

An equivalent using subqueries:

select distinct on (userId) 
  (  select sum(volunteerHours) 
     from entries e2 
     where e1.userId=e2.userId 
     and e1.location=e2.location
     and e1.id=e2.id) --hours_during_this_visit_at_location
  /( select sum(volunteerHours) --total_hours_at_this_location
     from entries e2 
     where e1.userId=e2.userId
     and e1.location=e2.location) as percentage_volunteered_here
from entries e1
where userId=1
order by userId, id desc;

You might enjoy how your line of thought can be mapped directly, step for step, into a few WITH common table expressions:

with 
 last_location_per_user as (
    select distinct on (userId) 
            userId, 
            location,
            volunteerHours
    from entries
    where userId=1
    order by userId, id desc
 ),
total_hours_at_last_location_per_user as (
    select  e.userId, 
            sum(e.volunteerHours)
    from    entries e
        inner join last_location_per_user l
            using (userId,location)
    group by userId
)
select  l.volunteerHours/t.sum as percentage_volunteered_here
from    last_location_per_user l,
        total_hours_at_last_location_per_user t
where   t.userId=l.userId;

Demo

CodePudding user response:

This is how you can write a query but this is not the only(best) way

SELECT
(SELECT (SELECT SUM(volunteerHours) from entries WHERE userId = 2 AND location = (SELECT location from entries ORDER BY id DESC FETCH NEXT 1 ROWS ONLY)) / (SELECT SUM(volunteerHours) from entries Where location = (SELECT location from entries ORDER BY id DESC FETCH NEXT 1 ROWS ONLY))) as userContributionPercentage
  • Related