Home > Enterprise >  Find the range where each row falls within on a Bigquery table
Find the range where each row falls within on a Bigquery table

Time:08-10

I have a lookup table like below, which helps add a description to total sales made by a company.

limit description
99 tens
999 hundreds
999,999 thousands
999,999,999 millions
999,999,999,999 billions

For each row in my sales table, I would like to lookup the total sales and find the corresponding description (for example, if a company made sales worth $1,234,567 the description should be 'millions'. If a company made sales worth $999,999 the description should be thousands)

WITH lookup_table AS (SELECT limit, description FROM projectid.dataset.lookup)

SELECT total_sales, --logic-to-find-description AS description
FROM projectid.dataset.sales

Sample output

total_sales description
89 tens
45,879 thousands
12,254,785 millions

What would be the best way to achieve this.

CodePudding user response:

Try this:

with lookup_table as
  (
              select  99 as `limit`, 'tens' as description
    union all select  999, 'hundreds'
    union all select  999999, 'thousands'
    union all select  999999999, 'millions'
    union all select  999999999999, 'billions'
  ),
  sales_table as
  (
              select 89 as total_sales
    union all select 45879
    union all select 12254785
    union all select 999999
  )
select  t.total_sales,
        (
          select  tt.description
          from    lookup_table tt
          where   t.total_sales <= tt.`limit`
          order by tt.`limit`
          limit 1
        ) as description
from    sales_table t
;

Or this:

with lookup_table as
  (
              select  99 as `limit`, 'tens' as description
    union all select  999, 'hundreds'
    union all select  999999, 'thousands'
    union all select  999999999, 'millions'
    union all select  999999999999, 'billions'
  ),
  sales_table as
  (
              select 89 as total_sales
    union all select 45879
    union all select 12254785
    union all select 999999
  )
select  t.total_sales,
        t2.description,
from    sales_table t
left join (
            select  tt.`limit`,
                    tt.description,
                    ifnull(lag(tt.`limit`) over(order by tt.`limit`) 1,0) as prev_value
            from lookup_table tt
          ) t2 on t.total_sales between t2.prev_value and t2.`limit`;

Choose the one, which will be faster)

CodePudding user response:

Consdier using RANGE_BUCKET function,

WITH lookup_table  AS (
  SELECT ANY_VALUE(description) description, ARRAY_AGG(`limit` ORDER BY `limit`) limits
    FROM UNNEST([1, 2, 3, 6, 9]) n, UNNEST([CAST(POW(10, n) AS INT64)]) `limit`,
         UNNEST([STRUCT(['tens', 'hundreds', 'thousands', 'millions', 'billions'] AS description)])
),
sales AS (
  SELECT * FROM UNNEST([9, 89, 99, 100, 1000, 45879, 12254785]) total_sales
)
SELECT total_sales,
       description[SAFE_OFFSET(RANGE_BUCKET(total_sales, limits) - 1)] AS description
  FROM sales, lookup_table;
 ----- ------------- ------------- 
| Row | total_sales | description |
 ----- ------------- ------------- 
|   1 |           9 | null        |
|   2 |          89 | tens        |
|   3 |          99 | tens        |
|   4 |         100 | hundreds    |
|   5 |        1000 | thousands   |
|   6 |       45879 | thousands   |
|   7 |    12254785 | millions    |
 ----- ------------- ------------- 
  • Related