Azure Synapse pipeline: how to add a GUID to raw data

Time:09-09

I am new to Azure Synapse and am technically a Data Scientist who's doing a Data Engineering task. Please help!

I have some xlsx files containing raw data that I need to import into a SQL database table. The issue is that the raw data does not have a uniqueidentifier column, and I need to add one before inserting the data into my SQL database.

I have been able to add all the rows to the table by adding a new column in the Copy Data activity and setting it to @guid(). However, this gives every row the same GUID instead of a unique one per row.

GUID mapping: [screenshot: @guid() mapping]

DB result: [screenshot: DB results]

If I do not add this mapping, the pipeline throws an error stating that it cannot import a NULL Id into the column Id, which makes sense because this column does not accept NULL values.

Is there a way to have Azure Synapse Analytics read in a raw xlsx file and then import it into my DB with a unique identifier for each row? If so, how can I accomplish this?

Many many thanks for any support.

Answer:

  • Setting a column through dynamic content in this way generates the same value for the entire column, because the expression is evaluated once for the activity rather than once per row.
  • Instead, you can generate a new GUID for each row using a ForEach activity.
  • You can retrieve the data from your source Excel file using a Lookup activity (my source has only a name column). Pass the output array of the Lookup activity to the ForEach activity:
@activity('Lookup1').output.value

  • Inside the ForEach, create a Script activity that uses the linked service you already have. In this Script activity, build a query with dynamic content to insert values into the destination table. The following is the query I built using dynamic content:
insert into demo values ('@{guid()}','@{item().name}')

  • This lets you iterate through the source rows and insert each row individually, generating a new GUID every time.
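
To make the difference concrete outside of Synapse, here is a minimal Python sketch of the two behaviours. It is only an illustration under assumptions, not pipeline code: `lookup_rows` is a hypothetical stand-in for the Lookup output, an in-memory SQLite table stands in for `demo`, and the insert uses parameter placeholders rather than the string interpolation shown in the query above.

```python
import sqlite3
import uuid

# Hypothetical stand-in for @activity('Lookup1').output.value
lookup_rows = [{"name": "Ana"}, {"name": "Ben"}, {"name": "Cy"}]

# In-memory SQLite table standing in for the destination table `demo`
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id TEXT NOT NULL PRIMARY KEY, name TEXT)")

# Copy-style behaviour: the GUID expression is evaluated once and reused
shared_guid = str(uuid.uuid4())
copy_style_ids = [shared_guid for _ in lookup_rows]

# ForEach-style behaviour: a new GUID per iteration, one INSERT per row
for item in lookup_rows:
    conn.execute(
        "INSERT INTO demo (id, name) VALUES (?, ?)",
        (str(uuid.uuid4()), item["name"]),
    )
conn.commit()

foreach_ids = [row[0] for row in conn.execute("SELECT id FROM demo")]

print(len(set(copy_style_ids)))  # number of distinct ids, single evaluation
print(len(set(foreach_ids)))     # number of distinct ids, per-row evaluation
```

The first count shows every row sharing one value, and the second shows one distinct value per source row, which is what the ForEach plus Script combination gives you.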

You can follow the above procedure to build a query that inserts each row with a unique identifier value. In my test, I used Copy Data to insert the first 2 rows (same result as yours) and inserted the next 2 rows using the above procedure.

NOTE: I used an Azure SQL Database for the demo, but that does not affect the procedure.
