Home > Software design >  Filtering newest row of a column, returning two rows
Filtering newest row of a column, returning two rows

Time:06-23

I have a database with columns like this: enter image description here

I need a sql command that will return the newest version (the second column, version_no) of each study_spec_no that is "Released" and "New". Not every spec no will have a "new" or "released", but for example if there are multiple "released" for a study_spec_no, I'll only get the largest number. And if there are multiple released and multiple news, I'll get the rows with the largest number for each respectively.

I.e. any other status's (principal investigator signoff/financial signoff/etc) will get dropped, and I'll be left with something like this:

enter image description here

I know that if I only wanted the most recent version_no (regardless of status) I could do this:

(this also ensures the notes and descriptions columns aren't both blank, which is something I needed)

SELECT * 
FROM MySchema.MyView WHERE (study_spec_no,version_no) IN 
( SELECT study_spec_no, MAX(version_no)
  FROM MySchema.MyView
  WHERE notes is not null OR version_description is not null
  GROUP BY study_spec_no
)
ORDER BY study_spec_no

Im getting hung up on how I would run that while filtering for both new or released and getting one row of each.

CodePudding user response:

You can use ROW_NUMBER() to identify the rows you want. Then, filtering out rows is easy. For example:

select *
from (
  select t.*,
    row_number() over(
      partition by study_spec_no, status
      order by version_no desc
    ) as rn
  from t
  where status in ('New', 'Released')
) x
where rn = 1

EDIT

If you want to exclude rows using the predicate notes is not null OR version_description is not null you can AND it to the WHERE clause in parenthesis, as in:

  where status in ('New', 'Released')
    and (notes is not null OR version_description is not null)
  •  Tags:  
  • sql
  • Related