Google BigQuery: How to filter out rows by a particular column's value frequency

Time:06-19

Say that I only want to return rows where a column value occurs at least twice.

I would do something like

SELECT 
table1.columnA
from table1
GROUP BY 
table1.columnA
HAVING COUNT(*) >= 2

That works for just one column, but if I want to return several columns and apply the filter to only one of them, it doesn't work. My attempt is

SELECT 
table1.columnA,
table1.columnB
from table1
GROUP BY 
table1.columnA
HAVING COUNT(*) >= 2

This gives an error saying that columnB is "neither grouped nor aggregated".

From this post, it seems that every expression in the SELECT list must be either grouped or aggregated, but I only want to filter by one particular column:

BIGQUERY SELECT list expression references column CHANNEL_ID which is neither grouped nor aggregated at [10:13]

So I'm still trying to figure out a way to filter by value frequency for a particular column.

CodePudding user response:

The problem, as always, is that if you group only by columnA and a columnA group contains several different columnB values, you need to choose which columnB value you want.

MIN(columnB) would take the smallest columnB value in each columnA group.

So for every columnA group it would return only one row, containing the smallest columnB.
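A minimal sketch of the MIN approach, assuming the table and column names from the question (table1, columnA, columnB) and the "at least twice" threshold. It returns one row per qualifying columnA value, not every original row:

```sql
-- One row per columnA value that occurs at least twice in table1;
-- columnB is collapsed to its smallest value within each columnA group.
SELECT
  columnA,
  MIN(columnB) AS min_columnB
FROM table1
GROUP BY columnA
HAVING COUNT(*) >= 2
```

MAX(columnB) or BigQuery's ANY_VALUE(columnB) can be swapped in if the smallest value is not what you want.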

If every columnA value always has the same columnB value, as in:

columnA columnB
ab      cd
ab      cd
ab1     cd1
ab1     cd1
ab1     cd1

you can use GROUP BY columnA, columnB instead, since each group still collapses to a single row.
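A sketch of that query, run against a hypothetical CTE that recreates the sample table above (replace the CTE with your real table1). It relies on the assumption that each columnA value maps to exactly one columnB value:

```sql
-- Hypothetical sample data mirroring the table above.
WITH table1 AS (
  SELECT 'ab' AS columnA, 'cd' AS columnB UNION ALL
  SELECT 'ab', 'cd' UNION ALL
  SELECT 'ab1', 'cd1' UNION ALL
  SELECT 'ab1', 'cd1' UNION ALL
  SELECT 'ab1', 'cd1'
)
-- Because each columnA value has a single columnB value here, the
-- (columnA, columnB) groups are the same as the columnA groups, so
-- COUNT(*) still counts how often each columnA value occurs.
SELECT
  columnA,
  columnB
FROM table1
GROUP BY columnA, columnB
HAVING COUNT(*) >= 2
ORDER BY columnA
```

This returns two rows, (ab, cd) and (ab1, cd1). If a columnA value had two different columnB values that each appeared once, grouping by both columns would drop it even though columnA occurs twice.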

The basic idea of grouping is that every column not listed in GROUP BY has to be aggregated.
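If you want to keep every original row instead of one row per group, one option (not part of the answer above, shown as a sketch with the question's table and column names) is to reuse the question's working single-column query purely as a filter:

```sql
-- Keep every row of table1 whose columnA value occurs at least twice.
-- The inner query is the single-column GROUP BY/HAVING query from the
-- question; the outer query only filters on it, so columnB is never
-- required to be grouped or aggregated.
SELECT
  t.columnA,
  t.columnB
FROM table1 AS t
WHERE t.columnA IN (
  SELECT columnA
  FROM table1
  GROUP BY columnA
  HAVING COUNT(*) >= 2
)
```

Note that rows whose columnA is NULL are never returned here, because NULL IN (...) does not evaluate to TRUE.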

CodePudding user response:

You can use a window function to count how often each columnA value occurs and then filter on that count. For example:

select
    columnA,
    columnB
from
    (select 
        *,
        -- cnt = number of rows in table1 sharing this row's columnA value
        count(*) over(partition by columnA) as cnt
     from table1)
where cnt >= 2

Let me know if it still doesn't work for you.
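In BigQuery, the same window-function filter can also be written without a subquery by using the QUALIFY clause. A sketch, assuming the question's table and column names; the WHERE TRUE line is a no-op included because some BigQuery versions rejected QUALIFY unless a WHERE, GROUP BY, or HAVING clause was also present:

```sql
-- Keep every row whose columnA value occurs at least twice in table1.
SELECT
  columnA,
  columnB
FROM table1
WHERE TRUE  -- no-op filter; only there to satisfy the QUALIFY restriction in some versions
QUALIFY COUNT(*) OVER (PARTITION BY columnA) >= 2
```

Unlike the IN-subquery form, the window version treats all NULL columnA values as one partition, so NULL rows are kept if there are at least two of them.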
