How to count an average for duplicates based on two other columns

Time:10-22

I have the following structure of data:

average | date          | hash
20      | 2017-08-29    | 9xxx
15      | 2017-08-29    | 9xxx
26      | 2017-08-29    | 1as0
24      | 2017-09-01    | 1as0
21      | 2017-09-01    | 1as0
22      | 2017-12-20    | 9xxx
21      | 2017-12-20    | 1as0
24      | 2017-12-20    | 1as0
...     | ...           | ...

Columns date & hash have duplicates.

I'm trying to build a SQL query that gets rid of the duplicates (hash & date). However, I need to combine the average values from all the duplicate rows into one final average per date and per hash.

In other words, I'd like to receive:

  1. Average for hash based on all duplicated values (from hash column)
  2. Per date

So it should look like:

average | date          | hash
17.5    | 2017-08-29    | 9xxx
26      | 2017-08-29    | 1as0
22.5    | 2017-09-01    | 1as0
22      | 2017-12-20    | 9xxx
22.5    | 2017-12-20    | 1as0
...     | ...           | ...

It looks rather simple, but I'm new to SQL, so it might not be so obvious to me. I spent many hours googling and didn't find a solution that I was able to implement successfully.

For now, I tried some queries with DISTINCT, GROUP BY and inner query. Each time I miss something (like an average of some subgroups per date for instance).

My last idea was to:

  1. Create a separate column with the average for each hash and then...
  2. ... group the values per date

But with this approach, the average is computed over all hashes for a date, not per hash subgroup (the duplicates). I'm not able to capture all the values for each date (year/month/day) separately per hash.

SELECT *, AVG(avg_tmpr_c) OVER (PARTITION BY wthr_date) as av_for_hash
FROM testing

Would anyone be so kind as to help me?

CodePudding user response:

select "date", hash, avg(average) as avg
from tbl
group by "date", hash
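To check this query against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, since the question doesn't say which database is used; the table name tbl comes from the answer, while the column types and the alias group_avg are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the question names no database.
conn = sqlite3.connect(":memory:")
# Column types are an assumption; the question only shows the values.
conn.execute('CREATE TABLE tbl (average REAL, "date" TEXT, hash TEXT)')

# The sample rows from the question.
rows = [
    (20, "2017-08-29", "9xxx"),
    (15, "2017-08-29", "9xxx"),
    (26, "2017-08-29", "1as0"),
    (24, "2017-09-01", "1as0"),
    (21, "2017-09-01", "1as0"),
    (22, "2017-12-20", "9xxx"),
    (21, "2017-12-20", "1as0"),
    (24, "2017-12-20", "1as0"),
]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

# Grouping by both columns collapses each (date, hash) duplicate set
# into one row and averages the values inside that set.
query = '''
    SELECT "date", hash, AVG(average) AS group_avg
    FROM tbl
    GROUP BY "date", hash
    ORDER BY "date", hash
'''
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each (date, hash) pair appears once in the output, with 17.5 for 2017-08-29 / 9xxx and 22.5 for both 1as0 groups that have two rows, which matches the table the question asks for. The ORDER BY is only there to make the output order predictable.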

CodePudding user response:

First we collect the sum and the row count of average per date and hash. This intermediate result could be saved in a temporary table. Then we add up those sums and counts and divide them to compute the final average, either per date or per hash.

select date, hash, sum(average) as sum_a, count(*) as count_a
from tbl
group by date, hash

--- per date

-- 1.0 * avoids integer division if average is an integer column
select date, 1.0 * sum(sum_a) / sum(count_a) as date_avg
from (
    select date, hash, sum(average) as sum_a, count(*) as count_a
    from tbl
    group by date, hash
) a
group by date

--- per hash

-- 1.0 * avoids integer division if average is an integer column
select hash, 1.0 * sum(sum_a) / sum(count_a) as hash_avg
from (
    select date, hash, sum(average) as sum_a, count(*) as count_a
    from tbl
    group by date, hash
) a
group by hash
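The reason for carrying sums and counts instead of averaging the per-group averages can be seen on the question's sample rows. The sketch below again uses Python's sqlite3 module as a stand-in database (an assumption, the thread names none); the column types and the names grouped, per_date, per_hash and avg_of_avgs are made up for the example.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the thread names no database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl (average REAL, "date" TEXT, hash TEXT)')
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (20, "2017-08-29", "9xxx"),
        (15, "2017-08-29", "9xxx"),
        (26, "2017-08-29", "1as0"),
        (24, "2017-09-01", "1as0"),
        (21, "2017-09-01", "1as0"),
        (22, "2017-12-20", "9xxx"),
        (21, "2017-12-20", "1as0"),
        (24, "2017-12-20", "1as0"),
    ],
)

# Step 1: sum and row count per (date, hash) group.
grouped = '''
    SELECT "date", hash, SUM(average) AS sum_a, COUNT(*) AS count_a
    FROM tbl
    GROUP BY "date", hash
'''

# Step 2a: total sum divided by total count, per date.
per_date = conn.execute(f'''
    SELECT "date", SUM(sum_a) / SUM(count_a) AS date_avg
    FROM ({grouped}) a
    GROUP BY "date"
    ORDER BY "date"
''').fetchall()

# Step 2b: the same, per hash.
per_hash = conn.execute(f'''
    SELECT hash, SUM(sum_a) / SUM(count_a) AS hash_avg
    FROM ({grouped}) a
    GROUP BY hash
    ORDER BY hash
''').fetchall()

# For contrast: averaging the group averages weights every group
# equally, no matter how many rows it contains.
avg_of_avgs = conn.execute('''
    SELECT "date", AVG(group_avg)
    FROM (
        SELECT "date", hash, AVG(average) AS group_avg
        FROM tbl
        GROUP BY "date", hash
    ) a
    GROUP BY "date"
    ORDER BY "date"
''').fetchall()

print(per_date)
print(per_hash)
print(avg_of_avgs)
```

For 2017-08-29 the sum/count version gives 61 / 3 (about 20.33), the same as averaging the three raw values 20, 15 and 26, while the average of the two group averages gives (17.5 + 26) / 2 = 21.75. The two only agree when every group has the same number of rows, as on 2017-09-01 where there is a single group.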