Home > Enterprise >  BigQuery NOT EXISTS... NOT EQUAL TO (!=) and EXISTS... EQUAL TO Giving Different Results
BigQuery NOT EXISTS... NOT EQUAL TO (!=) and EXISTS... EQUAL TO Giving Different Results

Time:10-31

I am lost and trying to figure out the reason I am getting different outputs for the same logic. I want to find out all calls (variants and non-variants) for each call set, and omits any call with a non-PASS filter.

The query is given at the link:

#standardSQL
SELECT
  call.name AS call_name,
  COUNT(1) AS number_of_calls
FROM
  `bigquery-public-data.human_genome_variants.platinum_genomes_deepvariant_variants_20180823` v, v.call
WHERE
  NOT EXISTS (SELECT 1 FROM UNNEST(call.FILTER) AS call_filter WHERE call_filter != 'PASS')
GROUP BY
  call_name
ORDER BY
  call_name
Row call_name number_of_calls
1 NA12877 29795946
2 NA12878 26118774
3 NA12889 29044992
4 NA12890 28717437
5 NA12891 31395995
6 NA12892 25349974

This returns the number of rows that have filter as PASS.

But, when I try a similar logic I get a different result.

#standardSQL
SELECT
  call.name AS call_name,
  COUNT(1) AS number_of_calls
FROM
  `bigquery-public-data.human_genome_variants.platinum_genomes_deepvariant_variants_20180823` v, v.call
WHERE
   EXISTS (SELECT 1 FROM UNNEST(call.FILTER) AS call_filter WHERE call_filter = 'PASS')
GROUP BY
  call_name
ORDER BY
  call_name
Row call_name number_of_calls
1 NA12877 4488086
2 NA12878 4503443
3 NA12889 4423974
4 NA12890 4529950
5 NA12891 4425316
6 NA12892 4497085

Why is this? Is my logic wrong? Can someone explain why empty array is a "PASS" in this case? Thank you for taking the time to help me!

CodePudding user response:

where NOT EXISTS (SELECT 1 FROM UNNEST(call.FILTER) AS call_filter WHERE call_filter != 'PASS')

and

where EXISTS (SELECT 1 FROM UNNEST(call.FILTER) AS call_filter WHERE call_filter = 'PASS')

Are not the same. First one just ignores columns with all filters not pass, means resuts will be "all passed" "some passed".

And second one just returns "all passed" results.

CodePudding user response:

Thank you @shawnt00 for providing the solution. I will demonstrate with a few smaller examples that the extra counts are indeed empty arrays.
Query 1

with smalltable as (
    select ["PASS"] as filter, 'NA12877' as name union all 
    select ["PASS"], 'NA12877' union all 
    select ["PASS"], 'NA12879' union all 
    select ["PASS",'RefCall'], 'NA12877' union all 
    select ["RefCall"], 'NA12877' union all 
    select ["RefCall"], 'NA12877' union all 
    select ["RefCall"], 'NA12879' union all 
    select ["RefCall"], 'NA12879' union all 
    select ["RefCall"], 'NA12879' union all 
    select ["RefCall"], 'NA12879', union all
    select [], 'NA12879'   
)
select name, count(1) as count 
from smalltable 
where not exists (select 1 from unnest(filter) as f where f != 'PASS')
group by name

Output:

Row name count
1 NA12877 2
2 NA12879 2

As we can see the query considers [] as 'PASS'. I am not sure why?

Query 2

with smalltable as (
    select ["PASS"] as filter, 'NA12877' as name union all 
    select ["PASS"], 'NA12877' union all 
    select ["PASS"], 'NA12879' union all 
    select ["PASS",'RefCall'], 'NA12877' union all 
    select ["RefCall"], 'NA12877' union all 
    select ["RefCall"], 'NA12877' union all 
    select ["RefCall"], 'NA12879' union all 
    select ["RefCall"], 'NA12879' union all 
    select ["RefCall"], 'NA12879' union all 
    select ["RefCall"], 'NA12879', union all
    select [], 'NA12879'   
)
select name, count(1) as count 
from smalltable 
where exists (select 1 from unnest(filter) as f where f = 'PASS')
group by name

Output:

Row name count
1 NA12877 3
2 NA12879 1

This as expected does not count the empty array.

  • Related