I have a database with columns like this:
I need a sql command that will return the newest version (the second column, version_no) of each study_spec_no that is "Released" and "New". Not every spec no will have a "new" or "released", but for example if there are multiple "released" for a study_spec_no, I'll only get the largest number. And if there are multiple released and multiple news, I'll get the rows with the largest number for each respectively.
I.e. any other status's (principal investigator signoff/financial signoff/etc) will get dropped, and I'll be left with something like this:
I know that if I only wanted the most recent version_no (regardless of status) I could do this:
(this also ensures the notes and descriptions columns aren't both blank, which is something I needed)
SELECT *
FROM MySchema.MyView WHERE (study_spec_no,version_no) IN
( SELECT study_spec_no, MAX(version_no)
FROM MySchema.MyView
WHERE notes is not null OR version_description is not null
GROUP BY study_spec_no
)
ORDER BY study_spec_no
Im getting hung up on how I would run that while filtering for both new or released and getting one row of each.
CodePudding user response:
You can use ROW_NUMBER()
to identify the rows you want. Then, filtering out rows is easy. For example:
select *
from (
select t.*,
row_number() over(
partition by study_spec_no, status
order by version_no desc
) as rn
from t
where status in ('New', 'Released')
) x
where rn = 1
EDIT
If you want to exclude rows using the predicate notes is not null OR version_description is not null
you can AND
it to the WHERE
clause in parenthesis, as in:
where status in ('New', 'Released')
and (notes is not null OR version_description is not null)