I have data coming from 3 separate entities that include dates of birth. Some of the data has the date as 2001/12/02, some has it as 20011202, and some has it as 12/02/2001. Is there a way to normalize this data so that it is all the same format?
CodePudding user response:
Assuming your desired output is December 2, 2001 ... try_convert()
should do the trick
I should add that try_convert()
will return a NULL
if the conversion fails, rather than throwing an error.
Select try_convert(date,'2001/12/02')
,try_convert(date,'20011202')
,try_convert(date,'12/02/2001')
Results
(No column name) (No column name) (No column name)
2001-12-02 2001-12-02 2001-12-02
CodePudding user response:
So let's say you have a table with a single column, and it has some inconsistent data in it:
CREATE TABLE #InconsistentFormats
(
RegionalDate varchar(20)
);
INSERT #InconsistentFormats(RegionalDate)
VALUES('2001/12/02'), ('20011202'), ('12/02/2001');
Assuming you know the formats are limited to yyyy/MM/dd
, yyyyMMdd
, and mm/DD/yyyy
, you can do this by just wrapping TRY_CONVERT()
- with explicit conversion style numbers - inside COALESCE
:
SELECT TheRealDate = COALESCE
(
TRY_CONVERT(date, RegionalDate, 111), -- yyyy/MM/dd
TRY_CONVERT(date, RegionalDate, 108), -- yyyyMMdd
TRY_CONVERT(date, RegionalDate, 101) -- MM/dd/yyyy
)
FROM #InconsistentFormats;
- Example db<>fiddle
Without the style numbers, watch what happens when you have, say, a different language:
SET LANGUAGE British;
SELECT TheRealDate = TRY_CONVERT(date, RegionalDate)
FROM #InconsistentFormats;
One of the dates gets transposed (and would become NULL if it was a day after the 12th).
Some might think that storing February 12th (or NULL) instead of the expected December 2nd is better than an exception, but I don't think everyone would agree. While some people might have great data hygiene processes that will find that NULL or flipped date immediately, many people don't, and they could be losing or corrupting data for 12 days before they realize it.