Home > Back-end >  Create Statistics Summary on Selected Columns in BigQuery
Create Statistics Summary on Selected Columns in BigQuery

Time:11-07

I'm trying to summarize some statics properties from several numeral columns such as continuous quantiles, mean , standard deviation, etc., then wrap them into rows with original column name attached as an additional column. I know about using AVG, STDDEV_PO, PERCENTILE_CONT... to get them from single column, but didn't find article/recipe about doing them on several columns at once. Any ideas?

Example input:

ID Col1 Col2 Col3
1 1.0 2.0 4.0
2 2.0 4.0 8.0
3 3.0 6.0 12.0
4 4.0 8.0 16.0

Expected output:

Col Name Q1 Q2 Q3 Mean Std
Col1 1.75 2.5 3.25 2.5 1.12
Col2 3.5 5.0 6.5 5.0 2.24
Col3 7.0 10.0 13.0 10.0 4.47

Or 'transposed' version:

Stats Col1 Col2 Col3
Q1 1.75 3.5 7.0
Q2 2.5 5.0 10.0
Q3 3.25 6.5 13.0
Mean 2.5 5.0 10.0
Std 1.12 2.24 4.47

CodePudding user response:

Consider below

select distinct col, 
  percentile_cont(value, 0.25) over win as q1,
  percentile_cont(value, 0.50) over win as q2,
  percentile_cont(value, 0.75) over win as q3,
  avg(value) over win as avg, 
  stddev_pop(value) over win as std, 
from your_table
unpivot (value for col in (col1, col2, col3))
window win as (partition by col)                 

if applied to sample data in your question - output is

enter image description here

To get the 'transposed' version - use below

select * from (
  select * from (
    select distinct col, 
      percentile_cont(value, 0.25) over win as q1,
      percentile_cont(value, 0.50) over win as q2,
      percentile_cont(value, 0.75) over win as q3,
      avg(value) over win as avg, 
      stddev_pop(value) over win as std, 
    from data
    unpivot (value for col in (col1, col2, col3))
    window win as (partition by col)
  ) unpivot (value for stats in (q1, q2, q3, avg, std))
) pivot (any_value(value) for col in ('col1', 'col2', 'col3'))

in this case - output is

enter image description here

  • Related