I have a table which has ID, FAMILY, ENV_XML_PATH and CREATED_DATE columns.
ID | FAMILY | ENV_XML_PATH | CREATED_DATE |
---|---|---|---|
15826841 | CRM | path1.xml | 03-09-22 6:50:34AM |
15826856 | SCM | path3.xml | 03-10-22 7:12:20AM |
15826786 | IC | path4.xml | 02-10-22 12:50:52AM |
15825965 | CRM | path5.xml | 02-10-22 1:50:52AM |
15653951 | null | path6.xml | 04-10-22 12:50:52AM |
15826840 | FIN | path7.xml | 03-10-22 2:34:09AM |
15826841 | SCM | path8.xml | 02-10-22 8:40:52AM |
15223450 | IC | path9.xml | 03-09-22 5:34:09AM |
15026853 | SCM | path10.xml | 05-10-22 4:40:59AM |
Now there are 18 DISTINCT values in FAMILY column and each value has multiple rows associated (as you can see from the above image). What I want is to get the first row of 3 specific values (CRM, SCM and IC) in FAMILY column.
Something like this:
ID | FAMILY | ENV_XML_PATH | CREATED_DATE |
---|---|---|---|
15826841 | CRM | path1.xml | date1 |
15826856 | SCM | path3.xml | date2 |
15826786 | IC | path4.xml | date3 |
I am new to this, though I understand the logic but I am not sure how to implement it. Kindly help. Thanks.
CodePudding user response:
You can use RANK
for that. Something like this:
WITH groupedData AS
(SELECT id, family, env_xml_path, created_date,
RANK () OVER (PARTITION BY family ORDER BY id) AS r_num
FROM yourtable
GROUP BY id, family, env_xml_path, created_date)
SELECT id, family, env_xml_path, created_date
FROM groupedData
WHERE r_num = 1
ORDER BY id;
Thus, within the first query, your data will be grouped by family and sorted by the column you want (in my example, it will be sorted by id).
After that, you will use the second query to only take the first row of each family.
Add a WHERE
clause to the first query if you need to apply further restrictions on the result set.
See here a working example: db<>fiddle
CodePudding user response:
You could use a window function to get to know the row number of each partition in family
ordered by the created_date
, and then filter by the the three families you are interested in:
with row_window as (
select
id,
family,
env_xml_path,
created_date,
row_number() over (partition by family order by created_date asc) as rn
from <your_table>
where family in ('CRM', 'SCM', 'IC')
)
select
id,
family,
env_xml_path,
created_date
from row_window
where rn = 1
Output:
ID | FAMILY | ENV_XML_PATH | CREATED_DATE |
---|---|---|---|
15826841 | CRM | path1.xml | 03-09-22 6:50:34 |
15826856 | SCM | path3.xml | 03-10-22 7:12:20 |
15826786 | IC | path4.xml | 02-10-22 12:50:52 |
CodePudding user response:
The question doesn't really specify what 'first' means, but I assume it means the first to be added in the table, aka the person whose date is the oldest. Try this code:
SELECT DISTINCT * FROM (yourTable) WHERE Family = 'CRM' OR
Family = 'SCM' OR Family = 'IC' ORDER BY Created_Date ASC FETCH FIRST (number) ROWS ONLY;
What it does:
Distinct - It selects different rows, which means you won't get same type of rows at the top.
Where - checks if certain condition is true
OR - it means that the select should choose rows that match those requirements. In the current situation the distinct clause means that same rows won't repeat, so you won't be getting 2 different 'CRM' family names, so it will find the first 'CRM' then the first 'SCM' and so on.
ORDER BY - orders the column in specified order. In the current one, if first rows mean the oldest, then by ordering them by date and using ASC the oldest(aka smallest date) will be at the top.
FETCH FIRST (number) ROWS ONLY - It selects only the very first couple of rows you want. For example if you need 3 different 'first' rows you need to get FETCH FIRST 3 ROWS ONLY. Combined with the distinct word it will only show 3 different rows.