I'm brainstorming on ways to find trends over a dataset containing transaction amounts that spans a year.
I'd like to run an average of top 25% observations of data and bottom 75% observations of data and viceversa.
If the entire dataset contains 1000 observations, I'd like to run:
- An average of the top 25% and then separately, an average of the bottom 75% and find the resulting average of this.
- Inversely, top 75% average, then bottom 25%, then the average of the 2.
For the overall average I have: avg(transaction_amount)
I am aware that in order for the sectioning averages to be useful, I will have to order the data according to the date which I already have accounted for in my SQL code:
select avg(transaction_amount)
from example.table
order by transaction_date
I am now struggling to find a way to split the data between 25% and 75% based on the number of observations.
Thanks.
CodePudding user response:
If you're using MSSQL, it's pretty trivial depending on exactly the output you're looking for.
SELECT TOP 25 PERCENT
*
FROM (
SELECT
AVG(transaction_amount) as avg_amt
FROM example.table
) AS sub
ORDER BY sub.avg_amt DESC
CodePudding user response:
Use PERCENT_RANK
in order to see which percentage block a row belongs to. Then use this to group your data:
with data as
(
select t.*, percent_rank() over (order by transaction_amount) as pr
from example.table t
)
select
case when pr <= 0.75 then '0-75%' else '75-100%' end as percent,
avg(transaction_amount) as avg,
avg(avg(transaction_amount)) over () as avg_of_avg
from data
group by case when pr <= 0.75 then '0-75%' else '75-100%' end
union all
select
case when pr <= 0.25 then '0-25%' else '25-100%' end as percent,
avg(transaction_amount) as avg,
avg(avg(transaction_amount)) over () as avg_of_avg
from data
case when pr <= 0.25 then '0-25%' else '25-100%' end;