Home > Enterprise >  SQL - Calculate percentage by group, for multiple groups
SQL - Calculate percentage by group, for multiple groups

Time:02-23

I have a table in GBQ in the following format :

UserId  Orders  Month  
 XDT     23      1
 XDT     0       4     
 FKR     3       6
 GHR     23      4
 ...     ...    ...

It shows the number of orders per user and month.

I want to calculate the percentage of users who have orders, I did it as following :

SELECT
  HasOrders,
  ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
    SELECT
        *,
        CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
    FROM `Table` ) 
GROUP BY
  HasOrders
ORDER BY
  Parts

It gives me the following result:

HasOrders   Parts
   0         35
   1         65

I need to calculate the percentage of users who have orders, by month, in a way that every month = 100%

Currently to do this I execute the query once per month, which is not practical :

SELECT
  HasOrders,
  ROUND(COUNT(*) * 100 / CAST( SUM(COUNT(*)) OVER () AS float64), 2) Parts
FROM (
    SELECT
        *,
        CASE WHEN Orders = 0 THEN 0 ELSE 1 END AS HasOrders
    FROM `Table` ) 
WHERE Month = 1
GROUP BY
  HasOrders
ORDER BY
  Parts

Is there a way execute a query once and have this result ?

HasOrders   Parts   Month
   0         25      1
   1         75      1
   0         45      2
   1         55      2
  ...       ...     ...

CodePudding user response:

Consider below approach

select hasOrders, round(100 * parts, 2) as parts, month from (
  select month, 
    countif(orders = 0) / count(*) `0`,
    countif(orders > 0) / count(*) `1`,
  from your_table
  group by month
)
unpivot (parts for hasOrders in (`0`, `1`))          

with output like below

enter image description here

CodePudding user response:

SELECT
    SIGN(Orders),
    ROUND(COUNT(*) * 100.000 / SUM(COUNT(*), 2) OVER (PARTITION BY Month)) AS Parts,
    Month
FROM T
GROUP BY Month, SIGN(Orders)
ORDER BY Month, SIGN(Orders)

Demo on Postgres: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=4cd2d1455673469c2dfc060eccea8020

You've stated that it's important for the total to be 100% so you might consider rounding down in the case of no orders and rounding up in the case of has orders for those scenarios where the percentages falls precisely on an odd multiple of 0.5%. Or perhaps rounding toward even or round smallest down would be better options:

WITH DATA AS (
    SELECT SIGN(Orders) AS HasOrders, Month,
        COUNT(*) * 10000.000 / SUM(COUNT(*)) OVER (PARTITION BY Month) AS PartsPercent
    FROM T
    GROUP BY Month, SIGN(Orders)
    ORDER BY Month, SIGN(Orders)
)
select HasOrders, Month, PartsPercent,
    PartsPercent - TRUNCATE(PartsPercent) AS Fraction,
    CASE WHEN HasOrders = 0
         THEN FLOOR(PartsPercent) ELSE CEILING(PartsPercent)
    END AS PartsRound0Down,
    CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5
              AND MOD(TRUNCATE(PartsPercent), 2) = 0
         THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
    END AS PartsRoundTowardEven,
    CASE WHEN PartsPercent - TRUNCATE(PartsPercent) = 0.5 AND PartsPercent < 50
         THEN FLOOR(PartsPercent) ELSE ROUND(PartsPercent) -- halfway up
    END AS PartsSmallestTowardZero
from DATA

It's usually not advisable to test floating-point values for equality and I don't know how BigQuery's float64 will work with the comparison against 0.5. One half is nevertheless representable in binary. See these in a case where the breakout is 101 vs 99. I don't have immediate access to BigQuery so be aware that Postgres's rounding behavior is different: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=c8237e272427a0d1114c3d8056a01a09

  • Related