Home > Software engineering >  Retrieve only the first row per student
Retrieve only the first row per student

Time:09-25

Suppose I have a table which stores data for students and the respected grades per classes. In most cases, there exists multiple rows per student in the table.

student_id           math           chemistry        science
 100                  A                B                C              <--------
 100                  B                A                D
 100                  D                F                C
 200                  B                A                C              <--------
 300                  C                D                F              <--------
 300                  A                A                A              
 400                  F                C                B              <--------
 400                  B                A                C              
 500                  A                B                A              <--------

I want to retrieve the first row as per the student_id as explained below.

Requested in postgreSQL:

student_id           math           chemistry        science
 100                  A                B                C
 200                  B                A                C
 300                  C                D                F
 400                  F                C                B
 500                  A                B                A 

CodePudding user response:

This can be achieved with the following query:

SELECT DISTINCT ON (student_id) student_id, math, chemistry, science
FROM
   student
ORDER BY
   student_id

This query will return just a single row per student_id. You should determine how you want to actually order it (such as an index or timestamp), otherwise you can't guarantee that you get the same output each time.

But as a basic solution, you can use this to just get a single row if you don't care about the actual order and only care about removing duplicates.

CodePudding user response:

At the point you decide on an ordering strategy (such as adding a column that records the date they took the exams and defining "first" as "most recent") you can get the "first" row with something like

SELECT * 
FROM(
  SELECT *, ROW_NUMBER() OVER(PARTITION BY student_id ORDER BY date_taken DESC) rn 
  FROM yourtable 
) x
WHERE rn = 1

The ROW_NUMBER creates an incrementing counter 1,2,3,4.. that restarts from 1 for every different student id, and the rows are numbered in order of descending date (so most recent gets 1). By then requiring rn to be 1 we get the most recent row

You might decide to give the student their best marks, and maybe we could use the ascii value of the score (A is 65, B is 66 etc). If we add the scores up then the lowest total (ie order by ascending) is the best set of marks (BBB is better than AAF)

OVER(PARTITION BY student_id ORDER BY ASCII(math) ASCII(chemistry) ASCII(science))

CodePudding user response:

The basic solution is - using MIN/MAX function to aggregate values:

SELECT student_id, MAX(math) math, MAX(chemistry) chemistry, MAX(science) science
FROM
   student
GROUP BY
   student_id
  • Related