Home > Blockchain >  How to check if a value exists in either of two columns and get average of another column sql(bigque
How to check if a value exists in either of two columns and get average of another column sql(bigque

Time:09-30

I have data in the below format, basically pairwise distance score of two string names:

Table A

name1   name2    score    
xxx     xxa      0.4    
xxb     xxc      0.2     
xxd     xxa      0.6  
xxa     zzz      0.8  

I need the output in below format for e.g. for each name present in either in name1 or name2, taking the average of score by unique name in two columns. In the above table, xxa has score in 3 rows, so we take average and get a single score of 0.6

Output Table

name score

xxx  0.4    
xxa  0.6    
xxb  0.2    
xxc  0.2    
zzz  0.8

I am using the below query to get the result, it works for small data in bigquery, but my input table tableA is 900Million in size and the query just doesn't finish

with all_names as (
select distinct name1 as name from 
tableA
union distinct 
select distinct name2 as name from 
tableA )

select name,avg(score) as avg_score
from tableA as a, all_names as b
where (b.name=a.name1 or b.name=a.name2)
group by 1

Is there any better efficient way to perform this operation?

CodePudding user response:

You might try below query:

SELECT name, ROUND(AVG(score), 1) avg 
  FROM tableA, UNNEST([name1, name2]) name
 GROUP BY 1;

 ------ ----- 
| name | avg |
 ------ ----- 
| xxa  | 0.6 |
| xxb  | 0.2 |
| xxc  | 0.2 |
| xxd  | 0.6 |
| xxx  | 0.4 |
| zzz  | 0.8 |
 ------ ----- 

Or, if you prefer to use UNION,

SELECT name, ROUND(AVG(score), 1) avg 
  FROM (
    SELECT name1 AS name, score FROM tableA
     UNION ALL
    SELECT name2 AS name, score FROM tableA
  )
 GROUP BY 1;

CodePudding user response:

Another option to consider

select name, avg(score) avg_score
from your_table
unpivot(name for col in (name1, name2))
group by name            

if applied to sample data in your question - output is

enter image description here

  • Related