Home > front end >  Snowflake/SQL Date Time - Weird Format
Snowflake/SQL Date Time - Weird Format

Time:02-05

I have a dataset that has a long format and I need to update the data to a normal date type in Snowflake. The data in the () looks to change slightly too. Does anybody know how update the string with two examples below to a date type?

  • Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)
  • Fri Jan 22 2021 10:43:33 GMT-0500 (EST)
  • Wed Nov 25 2020 19:56:46 GMT 0400 ( 04)
  • Sat Aug 19 2017 01:23:39 GMT 0530 (India Standard Time)

CodePudding user response:

So this decomposed set of steps shows one way to handle these:

SELECT 
    column1
    ,split(column1, ' ') as parts
    ,ARRAY_SLICE(parts,1,6) as sub_parts
    ,ARRAY_TO_STRING(sub_parts, ' ') as new_str
    ,TRY_TO_TIMESTAMP_TZ(new_str, 'Mon DD YYYY HH24:MI:SS GMTTZHTZM')
FROM VALUES 
     ('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
    ,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
    ,('Wed Nov 25 2020 19:56:46 GMT 0400 ( 04)')
    ,('Sat Aug 19 2017 01:23:39 GMT 0530 (India Standard Time)')
;

gives:

COLUMN1 PARTS SUB_PARTS NEW_STR ANSWER
Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) [ "Sat", "May", "16", "2020", "11:12:33", "GMT-0400", "(Eastern", "Daylight", "Time)" ] [ "May", "16", "2020", "11:12:33", "GMT-0400" ] May 16 2020 11:12:33 GMT-0400 2020-05-16 11:12:33.000 -0400
Fri Jan 22 2021 10:43:33 GMT-0500 (EST) [ "Fri", "Jan", "22", "2021", "10:43:33", "GMT-0500", "(EST)" ] [ "Jan", "22", "2021", "10:43:33", "GMT-0500" ] Jan 22 2021 10:43:33 GMT-0500 2021-01-22 10:43:33.000 -0500
Wed Nov 25 2020 19:56:46 GMT 0400 ( 04) [ "Wed", "Nov", "25", "2020", "19:56:46", "GMT 0400", "( 04)" ] [ "Nov", "25", "2020", "19:56:46", "GMT 0400" ] Nov 25 2020 19:56:46 GMT 0400 2020-11-25 19:56:46.000 0400
Sat Aug 19 2017 01:23:39 GMT 0530 (India Standard Time) [ "Sat", "Aug", "19", "2017", "01:23:39", "GMT 0530", "(India", "Standard", "Time)" ] [ "Aug", "19", "2017", "01:23:39", "GMT 0530" ] Aug 19 2017 01:23:39 GMT 0530 2017-08-19 01:23:39.000 0530

smashed all together:

SELECT 
    column1
    ,TRY_TO_TIMESTAMP_TZ(ARRAY_TO_STRING(ARRAY_SLICE(split(column1, ' '),1,6), ' '), 'Mon DD YYYY HH24:MI:SS GMTTZHTZM') as answer
FROM VALUES 
     ('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
    ,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
    ,('Wed Nov 25 2020 19:56:46 GMT 0400 ( 04)')
    ,('Sat Aug 19 2017 01:23:39 GMT 0530 (India Standard Time)')
;

gives:

COLUMN1 ANSWER
Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) 2020-05-16 11:12:33.000 -0400
Fri Jan 22 2021 10:43:33 GMT-0500 (EST) 2021-01-22 10:43:33.000 -0500
Wed Nov 25 2020 19:56:46 GMT 0400 ( 04) 2020-11-25 19:56:46.000 0400
Sat Aug 19 2017 01:23:39 GMT 0530 (India Standard Time) 2017-08-19 01:23:39.000 0530

Now in the sub_parts step I skip the day of the week, and stop before the timezone, the former can be handled but the latter cannot.

wait a second, the input is constant width, thus we can constant width SUBSTR stepping:

,TRY_TO_TIMESTAMP_TZ(substr(column1, 5, 29), 'Mon DD YYYY HH24:MI:SS GMTTZHTZM') as answer  

CodePudding user response:

Using regular expressions:

SELECT 
    column1
    ,TRY_TO_TIMESTAMP_TZ(
        regexp_replace(column1, '(.*) GMT([^(]*) \(.*\)', '\\1 \\2'), 
        'DY Mon DD YYYY HH24:MI:SS TZHTZM'
    ) as timestamp
FROM VALUES 
     ('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
    ,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
    ,('Wed Nov 25 2020 19:56:46 GMT 0400 ( 04)')
    ,('Sat Aug 19 2017 01:23:39 GMT 0530 (India Standard Time)')
;

 ----------------------------------------------------------- ------------------------------- 
| COLUMN1                                                   | TIMESTAMP                     |
|----------------------------------------------------------- -------------------------------|
| Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) | 2020-05-16 11:12:33.000 -0400 |
| Fri Jan 22 2021 10:43:33 GMT-0500 (EST)                   | 2021-01-22 10:43:33.000 -0500 |
| Wed Nov 25 2020 19:56:46 GMT 0400 ( 04)                   | 2020-11-25 19:56:46.000  0400 |
| Sat Aug 19 2017 01:23:39 GMT 0530 (India Standard Time)   | 2017-08-19 01:23:39.000  0530 |
 ----------------------------------------------------------- ------------------------------- 
  •  Tags:  
  • Related