Home > other >  Pipeline keeps failing in Azure Synapse (running a simple query)
Pipeline keeps failing in Azure Synapse (running a simple query)

Time:12-29

I am learning how to create pipelines with the help of a Udemy course in Azure Synapse. I created a link service that connects to my 'master' database of my serverless SQL pool. The connection itself was successful:

enter image description here

I then created a simple 'script activity' in the Pipeline which connects to this linked service and then returns a query.

enter image description here

Here is the full query I used (I masked some of the data):


SELECT
    TOP 100 *, result.filename()
FROM
    OPENROWSET(
        BULK 'https://xxxx.dfs.core.windows.net/taxi/raw/trip_data_green_parquet/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET'
    )  
    WITH (
        VendorID INT, 
        lpep_pickup_datetime datetime2(7)
        )   AS [result]

It should be noted that the above query works fine when I run it outside of the pipeline. I then clicked 'validate' to check for errors and no errors were found:

enter image description here

But when I clicked 'Debug' it failed:

enter image description here

Here is the full error code:

{
    "errorCode": "2011",
    "message": "Argument {0} is null or empty.\r\nParameter name: paraKey",
    "failureType": "UserError",
    "target": "Script1",
    "details": []
}

Any help would be greatly appreciated.

I tried running the following SQL query and it worked:

SELECT
    TOP 10 *
FROM
    OPENROWSET(
        BULK 'https://xxxx.dfs.core.windows.net/taxi/raw/taxi_zone_without_header.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) 
   WITH(
        LocationID SMALLINT 1,
        Borough VARCHAR(15) COLLATE Latin1_General_100_CI_AI_SC_UTF8 2,
        service_zone VARCHAR(15) COLLATE Latin1_General_100_CI_AI_SC_UTF8 4,
        Zone VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8 3
    ) AS [result]

I'm not too sure why the other one did not work.

CodePudding user response:

"Argument {0} is null or empty.\r\nParameter name: paraKey"

This error is because there is no column name for results.filename() in the SQL script. Try giving result.filename() as file_name in the script.

I repro'd this without alias name for filename() function and got the same error.

enter image description here

  • Then column name is added in script and synapse pipeline is run successfully.

enter image description here

SQL script:

SELECT
    TOP 100 *, result.filename() as file_name
FROM
    OPENROWSET(
        BULK 'https://xxxx.dfs.core.windows.net/taxi/raw/trip_data_green_parquet/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET'
    )  
    WITH (
        VendorID INT, 
        lpep_pickup_datetime datetime2(7)
        )   AS [result]
  • Related