How to get first row of 3 specific values of a column using Oracle SQL?-CodePudding

I have a table which has ID, FAMILY, ENV_XML_PATH and CREATED_DATE columns.

ID	FAMILY	ENV_XML_PATH	CREATED_DATE
15826841	CRM	path1.xml	03-09-22 6:50:34AM
15826856	SCM	path3.xml	03-10-22 7:12:20AM
15826786	IC	path4.xml	02-10-22 12:50:52AM
15825965	CRM	path5.xml	02-10-22 1:50:52AM
15653951	null	path6.xml	04-10-22 12:50:52AM
15826840	FIN	path7.xml	03-10-22 2:34:09AM
15826841	SCM	path8.xml	02-10-22 8:40:52AM
15223450	IC	path9.xml	03-09-22 5:34:09AM
15026853	SCM	path10.xml	05-10-22 4:40:59AM

Now there are 18 DISTINCT values in FAMILY column and each value has multiple rows associated (as you can see from the above image). What I want is to get the first row of 3 specific values (CRM, SCM and IC) in FAMILY column.

Something like this:

ID	FAMILY	ENV_XML_PATH	CREATED_DATE
15826841	CRM	path1.xml	date1
15826856	SCM	path3.xml	date2
15826786	IC	path4.xml	date3

I am new to this, though I understand the logic but I am not sure how to implement it. Kindly help. Thanks.

CodePudding user response：

You can use RANK for that. Something like this:

WITH  groupedData  AS
(SELECT id, family, env_xml_path, created_date,
RANK () OVER (PARTITION BY family ORDER BY id) AS r_num
FROM      yourtable    
GROUP BY  id, family, env_xml_path, created_date)

SELECT    id, family, env_xml_path, created_date
FROM      groupedData
WHERE     r_num  = 1
ORDER BY  id;

Thus, within the first query, your data will be grouped by family and sorted by the column you want (in my example, it will be sorted by id).

After that, you will use the second query to only take the first row of each family.

Add a WHERE clause to the first query if you need to apply further restrictions on the result set.

See here a working example: db<>fiddle

CodePudding user response：

You could use a window function to get to know the row number of each partition in family ordered by the created_date, and then filter by the the three families you are interested in:

with row_window as (
  select
    id,
    family,
    env_xml_path,
    created_date,
    row_number() over (partition by family order by created_date asc) as rn
  from <your_table>
  where family in ('CRM', 'SCM', 'IC')
)

select 
  id,
  family,
  env_xml_path,
  created_date
from row_window
where rn = 1

Output:

ID	FAMILY	ENV_XML_PATH	CREATED_DATE
15826841	CRM	path1.xml	03-09-22 6:50:34
15826856	SCM	path3.xml	03-10-22 7:12:20
15826786	IC	path4.xml	02-10-22 12:50:52

CodePudding user response：

The question doesn't really specify what 'first' means, but I assume it means the first to be added in the table, aka the person whose date is the oldest. Try this code:

SELECT DISTINCT * FROM (yourTable) WHERE Family = 'CRM' OR 
Family = 'SCM' OR Family = 'IC' ORDER BY Created_Date ASC FETCH FIRST (number) ROWS ONLY;

What it does:

Distinct - It selects different rows, which means you won't get same type of rows at the top.

Where - checks if certain condition is true

OR - it means that the select should choose rows that match those requirements. In the current situation the distinct clause means that same rows won't repeat, so you won't be getting 2 different 'CRM' family names, so it will find the first 'CRM' then the first 'SCM' and so on.

ORDER BY - orders the column in specified order. In the current one, if first rows mean the oldest, then by ordering them by date and using ASC the oldest(aka smallest date) will be at the top.

FETCH FIRST (number) ROWS ONLY - It selects only the very first couple of rows you want. For example if you need 3 different 'first' rows you need to get FETCH FIRST 3 ROWS ONLY. Combined with the distinct word it will only show 3 different rows.