Home > Enterprise >  PostgreSQL percentile
PostgreSQL percentile

Time:10-30

Let's say I have two tables called schools and students which have a one-to-many relationship. Each student can only be in one school but each school can have many students.

school
-----------
id
name


student
-----------
id
name
school_id

I need to find the 90th percentile of the number of students each school has.

I can already sort the schools based on the student count, but I don't know how to get the percentile.

select school_id, count(id) as count from students
group by school_id
order by count desc

CodePudding user response:

maybe this helps:

select school_id,stud_count, PERCENT_RANK() OVER(partition by school_id ORDER BY stud_count)
from(
select school_id, count(*) as stud_count
from students s
LEFT JOIN school sc ON s.school_id = sc.id
group by school_id
order by count desc)x

CodePudding user response:

I believe you need the percentile_disc function? Here's an example:

with
  -- The following hard-coded data set simulates:
  --
  --   select count(*) from students group by school_id
  --
  students_per_school(student_count) as (values
    (1), --  10% = 0.1
    (1), --  20% = 0.2
    (1), --  30% = 0.3
    (1), --  40% = 0.4
    (2), --  50% = 0.5
    (3), --  60% = 0.6
    (4), --  70% = 0.7
    (5), --  80% = 0.8
    (6), --  90% = 0.9
    (7)  -- 100% = 1.0
  )

  select
    percentile_disc(0.1) within group (order by student_count) as " 10th percentile",
    percentile_disc(0.2) within group (order by student_count) as " 20th percentile",
    percentile_disc(0.3) within group (order by student_count) as " 30th percentile",
    percentile_disc(0.4) within group (order by student_count) as " 40th percentile",
    percentile_disc(0.5) within group (order by student_count) as " 50th percentile",
    percentile_disc(0.6) within group (order by student_count) as " 60th percentile",
    percentile_disc(0.7) within group (order by student_count) as " 70th percentile",
    percentile_disc(0.8) within group (order by student_count) as " 80th percentile",
    percentile_disc(0.9) within group (order by student_count) as " 90th percentile",
    percentile_disc(1.0) within group (order by student_count) as "100th percentile"
  from students_per_school;

The above will output (using extended output: \x on in psql):

┌─[ RECORD 1 ]─────┬───┐
│  10th percentile │ 1 │
│  20th percentile │ 1 │
│  30th percentile │ 1 │
│  40th percentile │ 1 │
│  50th percentile │ 2 │
│  60th percentile │ 3 │
│  70th percentile │ 4 │
│  80th percentile │ 5 │
│  90th percentile │ 6 │
│ 100th percentile │ 7 │
└──────────────────┴───┘

I've also created a fiddle which queries an actual table: https://dbfiddle.uk/vNl-rHdn

  • Related