I am able to retrieve all the keys and values from json files stored in a Azure Data Lake Storage Gen 2 directory through an Azure Synapse Serverless SQL query like the one below:
SELECT TOP 100
someColumn1,
someColumn2
FROM OPENROWSET(
BULK 'https://********.dfs.core.windows.net/some/path/**',
FORMAT = 'csv',
FIELDTERMINATOR ='0x0b',
FIELDQUOTE = '0x0b',
ROWTERMINATOR = '0x0b'
) WITH (doc NVARCHAR(MAX)) AS ROWS
OUTER APPLY OPENJSON ( [doc] )
WITH (
[someColumn1] INT '$.someProperty1',
[someColumn2] INT '$.someProperty2'
) AS someData
I am looking to retrieve the path/url of the JSON files for parsing as extra columns returned by my SQL query. Is this achievable?
CodePudding user response:
Synapse provides the filepath
function that returns a full or partial path:
- The whole file path from which the row was retrieved is returned when the function is invoked without an argument. DATA SOURCE returns path relative to DATA SOURCE when used in OPENROWSET.
- A portion of the route that matches the wildcard on the point indicated in the argument is returned when the function is invoked with a parameter. The path's first wildcard-matching portion, for instance, would be returned if parameter value 1 were used.
Yes, you can get file path in synapse with the additional columns along with query file.
Example:
SELECT TOP 100
name,salary,ROWS.filepath() AS filepath
FROM
OPENROWSET(
BULK 'https://dlsg2p.dfs.core.windows.net/fsn2p/userdetails.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b',
ROWTERMINATOR = '0x0b'=
)
WITH (doc NVARCHAR(MAX)) AS ROWS
OUTER APPLY OPENJSON ( [doc] )
WITH (
[name] VARCHAR '$.name',
[salary] INT '$.salary'
) AS someData
Execution and Output:
Reference: file Metadata