Currently I am trying to convert vertical data structure to horizontal data dynamically by implementing a cursor in the stored procedure. However, the execution time takes too long. I am trying to find a way to optimise this approach. What is the better way to use other than Cursor? Note: PIVOT cant be used because of scattered data mapping. This is my current code with sample data and expected output:
What I have done so far: There are three temporary tables #tablestructure, #raw and #dsid. #Tablestruture contains all the columns under destfieldname which are needed to create a table. Then once it is created, we need to take the contents from #raw and insert it into the new table which has been created based on the row and column number
#tablestructure
desttablename | destfieldname | datatype |
---|---|---|
* | subjectnumber | int |
sample | personame | nvarchar(20) |
sample | personlocation | nvarchar(20) |
#raw table
rownumber | columnnumber | fieldname | contents |
---|---|---|---|
1 | 1 | subjectnumber | 132516352 |
1 | 2 | personname | Alex |
2 | 1 | subjectnumber | 132516353 |
2 | 3 | personlocation | Canada |
1 | 3 | personlocation | Australia |
2 | 2 | personname | John |
#DSID Table
projectid | respondentfieldname | debug | fileincludedheaders |
---|---|---|---|
123 | subjectnumber | NULL | 1 |
Expected output
Subjectnumber | PersonName | Personlocation |
---|---|---|
132516352 | Alex | Australia |
132516353 | John | Canada |
declare createSQL cursor for
select distinct 'drop table TestTmp.dbo.' t.destname as droptableSQL,
'CREATE TABLE TestTmp.dbo.' t.destname ' (' stuff((
select ',[' destfieldname '] ' f.datatype from #tableStructure f
where (f.desttablename in (t.desttablename, '*'))
group by destfieldname, f.datatype)
for xml path ('')),1,1,'') ', [sequenceID] int, [subsequenceID] int)' as createTableSQL,
'insert into VectorNormalizerTmp.dbo.' t.desttablename @tablesuffix '([' replace(replace(respondentidfieldname,']',''),'[','') '],sequenceid, subsequenceid)
SELECT a.contents, b.sequenceid, b.subsequenceid from (select contents from #raw where fieldname = ''' respondentidfieldname ''') a
CROSS JOIN (select sequenceid, subsequenceid from #rowUniques where desttablename = ''' desttablename ''') b ' as appendEmptyRowsSQL,
'create index idx' t.desttablename @tablesuffix ' on VectorNormalizerTmp.dbo.' desttablename @tablesuffix '([' respondentidfieldname '],sequenceid, subsequenceid)' as idxSQL
from (select distinct desttablename from #tablestructure where desttablename = 'Screener') t
cross join (select respondentidfieldname, debug from #dsid) d
declare @sqltorun nvarchar(max);
declare @sqltorun2 nvarchar(max);
declare @sqltorun3 nvarchar(max);
declare @sqltorun4 nvarchar(max);
open createSQL
fetch next from createSQL into @sqltorun, @sqltorun2, @sqltorun3, @sqltorun4
while @@FETCH_STATUS = 0
begin
begin try
--print @sqltorun
EXECUTE sp_executesql @sqltorun
end try
begin catch
--print '!!ERROR: ' ERROR_MESSAGE()
end catch
print @sqltorun2
EXECUTE sp_executesql @sqltorun2
print @sqltorun3
EXECUTE sp_executesql @sqltorun3
--print @sqltorun4
begin try
EXECUTE sp_executesql @sqltorun4
end try
begin catch
--print '!!ERROR: ' ERROR_MESSAGE()
end catch
fetch next from createSQL into @sqltorun, @sqltorun2, @sqltorun3, @sqltorun4
end
close createSQL
deallocate createSQL
CodePudding user response:
I dont quite understand the general idea of this cursor, but you can't create and fill multiple talbes without some sort of circle, so cursor is ok. But you could accelerate it by optimizing your appendEmptyRowsSQL query. It seems like #raw table is huge, so it will be better to have some sort of fieldname dimension, which ID you could assign to #raw table and then creating clustered index on it. Also if #raw table is more than 2 millions rows on one fieldname, you may be need to insert data in a circle with batchsize about 1 million.
CodePudding user response:
Honestly, this question is unclear, I've guessed what you are after, as it was never truly explained. This creates your 2 tables, *
and sample
(on a hard coded schema dbo
) and INSERT
s the data from table raw
into the respective tables. It also assumes you are on a fully supported version of SQL Server.
This is, however, a complete mess:
SELECT *
INTO dbo.tablestructure
FROM (VALUES(N'*',N'subjectnumber',N'int'),
(N'sample',N'personname',N'nvarchar(20)'),
(N'sample',N'personlocation',N'nvarchar(20)'))V(desttablename,destfieldname,datatype);
SELECT *
INTO dbo.raw
FROM (VALUES(1,1,N'subjectnumber',N'132516352'),
(1,2,N'personname',N'Alex'),
(2,1,N'subjectnumber',N'132516353'),
(2,3,N'personlocation',N'Canada'),
(1,3,N'personlocation',N'Australia'),
(2,2,N'personname',N'John'))V(rownumber,columnnumber,fieldname,contents);
GO
DECLARE @SQL nvarchar(MAX),
@CRLF nchar(2) = NCHAR(13) NCHAR(10);
WITH Tables AS(
SELECT ts.desttablename,
N'CREATE TABLE dbo.' QUOTENAME(ts.desttablename) N' (' STRING_AGG(QUOTENAME(ts.destfieldname) N' ' QUOTENAME(V.datatype) V.precisionscale,',') WITHIN GROUP (ORDER BY ts.destfieldname) N');' AS CreateStatement
FROM dbo.tablestructure ts
--I don't trust data in a table like this, so I'm going to check it's "good"
--This won't cause an error it's it's bad, but it'll always avoid injection
--If you want an error, you would need to checl the validity first
CROSS APPLY (VALUES(LEFT(ts.datatype,CHARINDEX('(',datatype '(')-1),STUFF(ts.datatype,1,CHARINDEX('(',datatype '(')-1,'')))V(datatype,precisionscale)
JOIN sys.types t ON V.datatype = t.name --Make sure that the datatype is valid
WHERE V.precisionscale NOT LIKE '%[^(),0-9 ]%' --Only contains parenthesis, numbers, commas and spaces
OR V.precisionscale = '(MAX)' --Or the scale is MAX
GROUP BY ts.desttablename)
SELECT @SQL = STRING_AGG(CreateStatement,@CRLF)
FROM Tables T;
EXEC sys.sp_executesql @SQL;
GO
DECLARE @SQL nvarchar(MAX),
@CRLF nchar(2) = NCHAR(13) NCHAR(10);
WITH InsertClause AS (
SELECT ts.desttablename,
N'INSERT INTO dbo.' QUOTENAME(desttablename) N' (' STRING_AGG(QUOTENAME(destfieldname) N'',',') WITHIN GROUP (ORDER BY destfieldname) N')' AS InsertClause
FROM dbo.tablestructure ts
WHERE EXISTS (SELECT 1
FROM dbo.raw r
WHERE ts.destfieldname = r.fieldname)
GROUP BY ts.desttablename),
RowValues AS(
SELECT IC.InsertClause,
N'(' STRING_AGG(N'N''' REPLACE(r.contents,'''','''''') N'''',',') WITHIN GROUP (ORDER BY r.fieldname) N')' AS RowInsert
FROM InsertClause IC
JOIN dbo.tablestructure ts ON IC.desttablename = ts.desttablename
JOIN dbo.raw r ON ts.destfieldname = r.fieldname
GROUP BY IC.InsertClause,
r.rownumber),
InsertStatements AS(
SELECT InsertClause @CRLF
N'VALUES ' STRING_AGG(RowInsert,N',') N';' AS FullInsertStatement
FROM RowValues
GROUP BY InsertClause)
SELECT @SQL = STRING_AGG(FullInsertStatement,@CRLF)
FROM InsertStatements;
PRINT @SQL;
EXEC sys.sp_executesql @SQL;
GO
SELECT *
FROM dbo.[*];
SELECT *
FROm dbo.[sample];
GO
DROP TABLE dbo.tablestructure;
DROP TABLe dbo.raw;
GO
DROP TABLE IF EXISTS dbo.[*]
DROP TABLE IF EXISTS dbo.sample;