I am new to the Azure data factory and looking to copy CSV data into my table having a foreign key relationship.
Here are my tables:
Customer table
CREATE TABLE [dbo].[Customer]
(
[Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY, -- Primary Key column
[CustomerNumber] NVARCHAR(50) NOT NULL,
[FirstName] NVARCHAR(50) NOT NULL,
[LastName] NVARCHAR(50) NOT NULL,
[CreatedOn] datetime,
[CreatedBy] NVARCHAR(255),
[ModifiedOn] datetime,
[ModifiedBy] NVARCHAR(255)
);
GO
-- Insert rows into table 'Customer' in schema '[dbo]'
INSERT INTO [dbo].[Customer]
VALUES
(
NEWID(),'Tom123', 'Tom', 'Shehu',GETDATE(),'test',GETDATE(),'admin'
),
(
NEWID(),'Harol234', 'Harold', 'Haoxa',GETDATE(),'test',GETDATE(),'admin'
),
(
NEWID(),'Peter345', 'Peter', 'Begu',GETDATE(),'test',GETDATE(),'admin'
),
(
NEWID(),'Marlin09', 'Marlin', 'Hysi',GETDATE(),'test',GETDATE(),'admin'
)
GO
Product Table
CREATE TABLE [dbo].[Product]
(
[Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY, -- Primary Key column
[Name] NVARCHAR(50) NOT NULL,
[ErpNumber] NVARCHAR(50) NOT NULL,
[Description] NVARCHAR(50) NOT NULL,
[CreatedOn] datetime,
[CreatedBy] NVARCHAR(255),
[ModifiedOn] datetime,
[ModifiedBy] NVARCHAR(255)
);
GO
-- Insert rows into table 'Product' in schema '[dbo]'
INSERT INTO [dbo].[Product]
VALUES
(
NEWID(), 'EI500CMZ', 'EI500CMZ','7-Day test product',GETDATE(),'Tom',GETDATE(),'Tom'
),
(
NEWID(), 'ST0SMX', 'ST0SMX','7-Day heavy duty product',GETDATE(),'Tom',GETDATE(),'Tom'
),
(
NEWID(), 'EH30MZ', 'EH30MZ','Electronic water test product',GETDATE(),'Tom',GETDATE(),'Tom'
)
CustomerProduct table
CREATE TABLE [dbo].[CustomerProduct]
(
[Id] UNIQUEIDENTIFIER NOT NULL PRIMARY KEY, -- Primary Key column
[CustomerId] UNIQUEIDENTIFIER NOT NULL,
[ProductId] UNIQUEIDENTIFIER NOT NULL,
[Name] NVARCHAR(255) NOT NULL,
[CreatedOn] datetime,
FOREIGN KEY(CustomerId) REFERENCES Customer(Id),
FOREIGN KEY(ProductId) REFERENCES Product(Id)
);
GO
Below is my CSV file data:
CustomerNumber,ErpNumber,Name
Tom123,EI500CMZ,EI500CMZ2340
Harol234,ST0SMX,ST0SMX74770
Peter345,EH30MZ,EH30MZ00234
Now I am looking to insert data into my 3rd table i.e CustomerProduct but I am not understanding how that "CustomerId", "ProductId" and "Name" will get inserted. In the above CSV data, I am getting the "CustomerNumber" and "ErpNumber" but during the insertion "CustomerId" and "ProductId" should go in the table.
Not understanding how to do this.
So far I have done this in the Azure data factory:
Created a blob storage account. Added a container in blob storage and uploaded my CSV file.
Created a linked service of type Azure blob storage called "CustomerProductInputService" that will talk to blob storage
Created a linked service of type Azure SQL database called "CustomerProductOutputService" that will communicate with the "CustomerProduct" table.
Created a dataset of type azure blob. This will receive the data from "CustomerProductInputService".
Created a dataset of type azure SQL database.
Now I am stuck at copy activity. I am not understanding how to create pipeline for this scenario and insert data into the CustomerProduct table.
As I explained I am getting "CustomerNumber" and "ErpNumber" in the CSV file but I want to insert "CustomerId" and "ProductId" into my "CustomerProduct" table.
Can anybody help me?
CodePudding user response:
You can Insert the CustomerProduct data from CSV to the table using dataflow activity using lookup transformations to get the CustomerID and ProductID from Customer and Product table respectively.
Source:
- Add 3 source transformations in dataflow, 1 for CSV source file, 1 for Customer table, and 1 for Product table.
a) Source1 (CSV): Create and CSV dataset to source1 to get Input file data.
b) Source2 (CustomerTable): Connect to Customer table and get all the existing data from the Customer table.
• As we only need ID and CustomerNumber columns from the Customer table, add select
transformation (Customer) after source2 to select only the required column list.
c) Source3 (ProductTable): Connect Source3 to Product Table to pull all the existing data from dbo.Product.
• Add Select
transformation (Product) after Source3 to get only the required columns ID & ERPNumber from the column list.
- Add
Lookup
transformation toSource1
(CSV) with Primary stream as CSV source and Lookup Stream as Customer (Source2 Select transformation) and Lookup Condition as CSV column “CustomerNumber” equals to (==) Customer table Column “CustomerNumber”.
- As
Lookup
is like Left join here, it includes all columns from Source1 and Lookup columns from Source2 in the select list (which include duplicate columns).
a) So, using select
transformation (CustomerSelectList) to select only the required columns in the Output. Also renaming the Column name “ID” which is pulled from Customer table to CustomerID to match with Sink table.
- Add another
Lookup
transformation after Select (CustomerSelectList) to get the data from Product table.
a) Select Primary stream as CustomerSelectList (Select transformation) and Lookup stream as Product (Select of Source3)
b) With Lookup condition as CSV Source column “ErpNumber” equals to (==) Product table column “ErpNumber”.
- Again, using the
select
transformation to ignore other columns and select only required columns from the select list. Also renaming column “ID” from Product table to ProductID.
- Add
Derived Column
transformation to the select (CustomerProductSelectList) to add new columns ID and CreatedOn.
a) ID: as this is UNIQUEIDENTIFIER in the sink table, we can add an expression to generate the id using UUID()
in the derived column.
b) CreatedOn: adding expression to get the **Current timestamp**
to Sink table.
- Finally add
Sink
transformation to Insert data onto CustomerProduct table.
- Add this dataflow to a pipeline and run the pipeline to insert data.
Output:
CodePudding user response:
First thing you would need to identify a key connection between customer and product. Next, create a pipeline from data factory and create 2 sources "Product" and "Customer" apply ADF transformation Join and Alter and sink it to dbo.CustomerProduct.