Openrowset with bulk option cannot retrieve records due to an invalid type DATE-CodePudding

When trying to read a parquet file with the openrowset and bulk options I encounter the following error:

Error handling external file: 'Inserting value to batch for column type DATE failed. Invalid argument provided.'

This problem indicates that the file couldn't be read because the data contains values that are not DATE type - indeed, there are undefined and 0001-01-01 records, so when the BULK operation is performed these values prevent from loading the entire parquet file (see the code from below).

SELECT
TOP 100 
POST_SEARCH_DATE,
SEARCH_DATE 

FROM
OPENROWSET(
    BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
    FORMAT = 'PARQUET'
    
) AS [result]

One alternative I thought was to convert the column to a varchar to bypass the error but unfortunately, that didn't make any difference (see below).

SELECT
TOP 100 
POST_SEARCH_DATE,
SEARCH_DATE

FROM
OPENROWSET(
    BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
    FORMAT = 'PARQUET'
    
) AS [result]

WHERE cast(POST_SEARCH_DEPARTURE_DATE as varchar(100))!= 'undefined'

and cast (SEARCH_DEPARTURE_DATE as varchar(100)) != 'undefined'
and cast (SEARCH_DEPARTURE_DATE as varchar(100)) != '0001-01-01'
and cast(POST_SEARCH_DEPARTURE_DATE as varchar(100))!= '0001-01-01'

I found some suggestions about inserting every record into a temporary table and from there you can disregard the records that fail. Can someone help me to give me more details about this solution or another alternative?

CodePudding user response：

Have you tried using WITH to define the data types, while using OPENROWSET?

SELECT
  TOP 100 
  POST_SEARCH_DATE,
  SEARCH_DATE,
  EXAMPLE_COL


FROM
OPENROWSET(
    BULK 'https://test/refined-parquet/data/v1.0/loaddt=2022-01-01/**',
    FORMAT = 'PARQUET'
    WITH (
        [SEARCH_DEPARTURE_DATE] VARCHAR(100) COLLATE Latin1_General_BIN2 3,
        [POST_SEARCH_DEPARTURE_DATE] VARCHAR(100) COLLATE Latin1_General_BIN2 2,
        [EXAMPLE_COL] VARCHAR(100)
) AS [r]