I'm using Presto but any flavor of SQL will do.
I have a table in the following format:
Group_id | event_id | month | party | time_interval |
---|---|---|---|---|
1 | 1 | Jan | Player A | 1 hour |
1 | 1 | Jan | Player A | 2 hours |
1 | 1 | Jan | Player B | 1 hour |
1 | 1 | Jan | Player B | 1 hour |
1 | 2 | Jan | Player A | 3 hours |
I need the average time interval per group_id, per month, per party.
Here's how the average should be calculated:
(total number of hours per group, per month, per party) / (total number of distinct events per group, per month)
For clarity, here's the output I expect:
Group_id | month | party | avg_time_interval |
---|---|---|---|
1 | Jan | Player A | 3 hours |
1 | Jan | Player B | 1 hour |
Now here's the tricky part. The first row makes perfect sense: Player A has 6 hours across both events, which we divide by 2 distinct events to get an average of 3.
For the second row, however, we get 1 hour instead of 2. Player B has no time recorded for event 2, so the interval there should be assumed to be 0. There are still 2 distinct events for that group_id and month, so the 2 hours in total should be divided by 2, not by 1.
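To make the rule concrete, here is a minimal sketch in plain Python of the arithmetic described above. It assumes the intervals are stored as plain numbers of hours; the variable names are only illustrative.

```python
from collections import defaultdict

# Sample rows from the table above: (group_id, event_id, month, party, hours)
rows = [
    (1, 1, "Jan", "Player A", 1),
    (1, 1, "Jan", "Player A", 2),
    (1, 1, "Jan", "Player B", 1),
    (1, 1, "Jan", "Player B", 1),
    (1, 2, "Jan", "Player A", 3),
]

hours = defaultdict(int)   # total hours per (group_id, month, party)
events = defaultdict(set)  # distinct events per (group_id, month), whatever the party

for group_id, event_id, month, party, h in rows:
    hours[(group_id, month, party)] += h
    events[(group_id, month)].add(event_id)

# Numerator is per party; denominator ignores the party on purpose.
averages = {
    key: total / len(events[key[:2]])
    for key, total in hours.items()
}

for key, avg in sorted(averages.items()):
    print(key, avg)
```

Player B ends up with 2 hours divided by 2 events, because the event count is taken per group and month rather than per party.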
This missing data makes the problem much more complicated than it should be. Otherwise, I believe the following query would have solved it:
SELECT Group_id, month, party, total / num_cases AS avg_time_interval
FROM (
    SELECT Group_id, month, party,
           SUM(time_interval) AS total,
           COUNT(DISTINCT event_id) AS num_cases
    FROM table
    GROUP BY Group_id, month, party
) AS totals
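For anyone who wants to reproduce the problem, here is a small sketch that runs the same aggregation (flattened into a single SELECT) against an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for Presto here, and the table name `tbl` and the integer hours column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (group_id INT, event_id INT, month TEXT, party TEXT, time_interval INT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Jan", "Player A", 1),
        (1, 1, "Jan", "Player A", 2),
        (1, 1, "Jan", "Player B", 1),
        (1, 1, "Jan", "Player B", 1),
        (1, 2, "Jan", "Player A", 3),
    ],
)

# Counting events inside the per-party group only sees events
# in which that party has at least one row.
naive = conn.execute(
    """
    SELECT group_id, month, party,
           SUM(time_interval) * 1.0 / COUNT(DISTINCT event_id) AS avg_time_interval
    FROM tbl
    GROUP BY group_id, month, party
    ORDER BY party
    """
).fetchall()

print(naive)  # [(1, 'Jan', 'Player A', 3.0), (1, 'Jan', 'Player B', 2.0)]
```

Player B comes out as 2.0 rather than the desired 1.0, since event 2 has no Player B row and is never counted for that party.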
CodePudding user response:
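-- Assumes time_interval is numeric (hours).
-- Every window is partitioned by Group_id and month so the result
-- stays correct when the table holds more than one group or month.
select distinct Group_id
      ,month
      ,party
      ,total_hours_per_party/max(dns_rnk) over(partition by Group_id, month) as avg_time_interval
from (
      select Group_id
            ,month
            ,party
            ,sum(time_interval) over(partition by Group_id, month, party) as total_hours_per_party
            ,dense_rank() over(partition by Group_id, month order by event_id) as dns_rnk
      from t
     ) t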
Group_id | month | party | avg_time_interval |
---|---|---|---|
1 | Jan | Player A | 3 |
1 | Jan | Player B | 1 |
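As a rough check of the window-function idea on more than one group, here is a sketch run against in-memory SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or newer). In this version every window is partitioned by group and month. The second group, the table name `tbl`, and the column aliases are made up for the example and are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (group_id INT, event_id INT, month TEXT, party TEXT, time_interval INT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Jan", "Player A", 1),
        (1, 1, "Jan", "Player A", 2),
        (1, 1, "Jan", "Player B", 1),
        (1, 1, "Jan", "Player B", 1),
        (1, 2, "Jan", "Player A", 3),
        (2, 3, "Jan", "Player A", 4),  # hypothetical second group with one event
    ],
)

windowed = conn.execute(
    """
    SELECT DISTINCT group_id, month, party,
           total_hours * 1.0
             / MAX(event_rank) OVER (PARTITION BY group_id, month) AS avg_time_interval
    FROM (
        SELECT group_id, month, party,
               SUM(time_interval) OVER (PARTITION BY group_id, month, party) AS total_hours,
               -- the highest dense rank equals the number of distinct events
               DENSE_RANK() OVER (PARTITION BY group_id, month ORDER BY event_id) AS event_rank
        FROM tbl
    ) AS ranked
    ORDER BY group_id, party
    """
).fetchall()

for row in windowed:
    print(row)
```

With the partitions in place, group 2 is divided by its own single event instead of by the events of every group combined.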
CodePudding user response:
Count the distinct event_id values per group_id and month in a subquery, then join that result back to your table:
SELECT T.Group_id, T.month, T.party,
       SUM(T.time_interval) * 1.0 / MAX(D.eid) AS avg_time_interval
FROM tbl T
JOIN (
    SELECT Group_id, month,
           COUNT(DISTINCT event_id) AS eid
    FROM tbl
    GROUP BY Group_id, month
) D
  ON T.Group_id = D.Group_id
 AND T.month = D.month
GROUP BY T.Group_id, T.month, T.party
ORDER BY T.Group_id, T.month, T.party
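Here is a quick way to try the join approach, sketched with Python's `sqlite3` module and an in-memory SQLite database in place of Presto. The extra Feb row is invented for the example to show that the event count is taken per month; the integer hours column is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (group_id INT, event_id INT, month TEXT, party TEXT, time_interval INT)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "Jan", "Player A", 1),
        (1, 1, "Jan", "Player A", 2),
        (1, 1, "Jan", "Player B", 1),
        (1, 1, "Jan", "Player B", 1),
        (1, 2, "Jan", "Player A", 3),
        (1, 3, "Feb", "Player B", 4),  # hypothetical extra month
    ],
)

joined = conn.execute(
    """
    SELECT T.group_id, T.month, T.party,
           SUM(T.time_interval) * 1.0 / MAX(D.eid) AS avg_time_interval
    FROM tbl T
    JOIN (
        -- one row per group and month with its distinct event count
        SELECT group_id, month, COUNT(DISTINCT event_id) AS eid
        FROM tbl
        GROUP BY group_id, month
    ) D
      ON T.group_id = D.group_id
     AND T.month = D.month
    GROUP BY T.group_id, T.month, T.party
    ORDER BY T.group_id, T.month, T.party
    """
).fetchall()

for row in joined:
    print(row)
```

Because `month` is stored as text in this sketch, the rows sort alphabetically, so Feb comes before Jan.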