I Have a requirement where I need to get earliest date as start date and If latest date is present then I need to have it as end date, if latest date is blanks which means the person is still active then I need to have it as blanks.
I used Min and Max on date fields but My latest date field is not capturing as Blanks if date is absent.
CodePudding user response:
If you want to get the earliest start_date, by ID. And also bring with whatever is in the End_date field - No matter if it is NULL, or has an date. Then you can first get group by ID(which is not unique in your example given), then use MIN() on start_date. Then you fetch which row these values belong to, and thereby get the End_date. This works, but if you've got several start dates with the same ID, that complicates things - and in that case we need some more example data with a bit mor explanation of how it is supposed to work. But, here goes:
Fiddle: https://www.db-fiddle.com/f/o2NyDpAc76TLYdmGFGHqag/3
CREATE TABLE my_table (
ID int,
Start_Date date,
End_date date null
);
INSERT INTO my_table (ID,Start_Date, End_date)
VALUES
(1,'2021-01-01', '2022-04-05'),
(1,'2022-01-01', '2022-04-02'),
(2,'2022-07-01', '2022-05-07'),
(2,'2022-01-01', null);
SELECT a.*
FROM my_table a
join (SELECT
ID,
MIN(my_table.Start_date) as 'Start_date'
FROM my_table
GROUP BY ID) jn
on a.ID=jn.ID and a.Start_date=jn.Start_date
Source table:
ID | Start_Date | End_date |
---|---|---|
1 | '2021-01-01' | '2022-04-05' |
1 | '2022-01-01' | 2022-04-02 |
2 | '2022-07-01' | '2022-05-07' |
2 | '2022-01-01' | NULL |
Results table:
ID | Start_Date | End_date |
---|---|---|
1 | '2021-01-01' | '2022-04-05' |
2 | '2022-01-01' | NULL |
CodePudding user response:
This might work:
SELECT ID, MIN(start_date) Start_Date,
NULLIF(MAX(COALESCE(end_date,'29991231')), '29991231') End_Date
FROM MyTable
GROUP BY ID
See it work here:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=5febc25e9c79840fe6aa2e55d77cf5d0
At least it will seem to give the right results based on the sample data available. However, this would still show a null
if a record with an earlier start date has a null
end date, and a record with a later start date does have an end date. It's likely this should never happen in real data, but then real data tends to be messy even when it shouldn't be.
To really do this properly, you need to find the whole row with the latest start date and then look at the end date value from that row. Fortunately, we have a great way to count rows: the row_number()
windowing function:
SELECT ID, Start_Date, End_Date
FROM (
SELECT ID, Start_Date, End_Date,
row_number() over (PARTITION BY ID ORDER BY Start_Date DESC) rn
FROM MyTable
) t0
WHERE rn=1
But this is only part of the solution. This should now always have the right End_Date
, but will usually have the wrong Start_Date
. We can update it to fix that error like this:
SELECT ID, (SELECT MIN(Start_Date) FROM MyTable t WHERE t.ID=t0.ID) Start_Date, End_Date
FROM (
SELECT ID, Start_Date, End_Date,
row_number() over (PARTITION BY ID ORDER BY Start_Date DESC) rn
FROM MyTable
) t0
WHERE rn=1
And now we will always get the right result.
See it work here:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=4b7d4cba4849eee9ba3bf978cebfc3bf
Finally, all this assumes you have a reasonable schema using null
and DateTime
values, and not an unreasonable schema using varchar
and empty strings. If the latter really is your situation the schema design really is BROKEN and you should fix it.
This also assumes at least MySql 8.0. If you're using something older than that, condolences. 5.7 and earlier are rooted in basic design from 2006, and don't really qualify as a modern database platform.