Home > Software engineering >  How can split a row of data and insert each row (but different columns) into two tables (with a FK r
How can split a row of data and insert each row (but different columns) into two tables (with a FK r

Time:11-04

I have two tables in SQL Server:

Person

ID (PK, int, IDENTITY)
Name (varchar(100))
UploadedBy (varchar(50))
DateAdded (datetime)

PersonFile

ID (PK, int, IDENTITY)
PersonId (FK, int)
PersonFile (varchar(max))

I am reading in a large file (150MB), and I have a script component that can successfully parse the file into several columns. The issue is that I need to insert the first 3 columns of my parsed data row into my Person table first, then use the ID of that Row to insert the final column into my PersonFile table. Is there an easy way to do this in SSIS?

I suppose I could technically script everything out to handle inserts in the database, but I feel like in that case, I might as well just skip SSIS altogether and user powershell. I also thought about writing a procedure in SQL server and then passing the information to the procedure to handle inserts. But again, this seems very inefficient.

What's the best way for me to insert a row of data into two tables, if one of them has a foreign key constraint?

CodePudding user response:

I think the best way is to use a stage table in the database to hold the parsed source file and then use stored procedures or SQL-query to load your tables. There is a lookup component in SSIS that can be used for your case but I try avoiding it for various reasons.

Create a table resembeling the source file, something like:

CREATE TABLE dbo.[SourceFileName](
  Name nvarchar(100) NULL,
  UploadedBy nvarchar(50) NULL,
  DateAdded datetime NULL,
  PersonFile nvarchar(max) NULL
)

Truncate the stage table. Use a dataflow component to get the source data. Use script or stored procedures to insert the source data in your destination table (begin with Person and the load PersonFile). Your SSIS dataflow should look something like this:

enter image description here

For the insert script for person do something like:

INSERT INTO dbo.Person (Name, UploadedBy,DateAdded)
SELECT Name,UploadedBy,DateAdded
FROM dbo.SourceFileName;

For the insert for PersonFile make a join to the destination table:

INSERT INTO dbo.PersonFile(PersonId,PersonFile)
SELECT
  Person.ID,
  SourceFile.PersonFile
FROM dbo.SourceFileName SourceFile
JOIN dbo.Person Person
ON Person.Name = SourceFile.Name

You should also add a UNIQUE CONSTRAINT to the column that identifies the person (Name for example).

CodePudding user response:

One very common thing to do would be to stage the data first.

So you insert all columns into a table on the server, which also has an extra nullable column for the PersonID.

Then you’d have a stored procedure which inserts unique Person records into the Person table, and updates the staging table with the resulting PersonID, which is the extra field you need for the PersonFile insert, which could then be performed either in the same procedure or another one. (You’d call these procedures in SSIS with an Execute SQL Task.)

I suppose this could possibly be done purely in SSIS, for example with a Script Destination that performs an insert and retrieves the PersonID for a second insert, but I’m fairly sure performance would take a huge hit with an approach like that.

  • Related