I am using SSIS to transform a raw data row into a transaction. Everything was going well until I added logic for a new field called "SplitPercentage" to the SQL command. The new field simply converts the value to a decimal, for example 02887 would transform into 0.2887.
The new logic works as intended, but now it takes 8 hours to run instead of 5 minutes.
Please see entire original code vs new code here:
https://www.diffchecker.com/1qsXsj9Q/
Greatly appreciate any help!
New logic resulting in poor performance:
IF TRIM(SUBSTRING(@line, 293, 1)) = 1
BEGIN
SET @SplitPercentage = 1
END
ELSE
BEGIN
SET @SplitPercentage = CAST(''.'' + TRIM(SUBSTRING(@line, 294, 4)) AS decimal(7, 4))
END
CodePudding user response:
While your current code is not ideal, I don't see anything in your new expression (SUBSTRING(), TRIM(), concatenation, CAST) that would account for such a drastic performance hit. I suspect the cause lies elsewhere.
However, I believe your expression can be simplified to eliminate the IF. Given a 5-character field "nnnnn" that you wish to treat as a decimal n.nnnn, you should be able to do this in a single statement using STUFF() to inject the decimal point:
@SplitPercentage = CAST(STUFF(SUBSTRING(@line, 293, 5), 2, 0, '.') AS decimal(7, 4))
The STUFF() injects the decimal point at position 2 (replacing 0 characters). I see no need for the TRIM().
(You would need to double up the quotes for use within your EXEC ('...') statement.)
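As a quick standalone sanity check (using a hypothetical sample value standing in for SUBSTRING(@line, 293, 5)), the STUFF() approach can be verified outside the dynamic SQL:

```sql
-- '02887' is a made-up example of the raw 5-character field
DECLARE @raw char(5) = '02887';
SELECT CAST(STUFF(@raw, 2, 0, '.') AS decimal(7, 4));  -- 0.2887
```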
CodePudding user response:
Please try to change IF/ELSE block of code as follows:
SET @SplitPercentage = IIF(TRIM(SUBSTRING(@line, 293, 1)) = ''1''
    , 1.0000
    , CAST(''.'' + TRIM(SUBSTRING(@line, 294, 4)) AS DECIMAL(7, 4)));
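Outside the EXEC('...') wrapper the doubled quotes become single quotes; a standalone check of the same expression with a hypothetical line value (positions 293-297 holding '02887') looks like this:

```sql
-- Build a dummy @line so that SUBSTRING(@line, 293, 5) = '02887'
DECLARE @line varchar(300) = REPLICATE(' ', 292) + '02887';

DECLARE @SplitPercentage decimal(7, 4) =
    IIF(TRIM(SUBSTRING(@line, 293, 1)) = '1',
        1.0000,
        CAST('.' + TRIM(SUBSTRING(@line, 294, 4)) AS decimal(7, 4)));

SELECT @SplitPercentage;  -- 0.2887
```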
CodePudding user response:
A challenge you've run into is "I have a huge dynamic query process that I cannot debug." When I run into these issues, I try to break the problem down into smaller, solvable, set based options.
Reading that wall of code, my pseudocode would be something like
- For all the data in Inbound_Transaction_Source by a given Source value (@SourceName)
- Do all this data validation, type correction and cleanup by slicing out the current line into pieces
You can then lose the row-based approach by slicing your data up. I favor using CROSS APPLY at this point in my life, but a CTE, derived table, whatever makes sense in your head is valid.
Why I favor this approach though, is you can see what you're building, test it, and then modify it without worrying you're going to upset a house of cards.
-- Column ordinal declaration and definition is offsite
SELECT
*
FROM
[dbo].[Inbound_Transaction_Source] AS ITS
CROSS APPLY
(
SELECT
CurrentAgentNo = SUBSTRING(ITS.line, @CurrentAgentStartColumn, 10)
, CurrentCompMemo = SUBSTRING(ITS.line, @CompMemoStartColumn + @Multiplier, 1)
, CurrentCommAmount = SUBSTRING(ITS.line, @CommAmountStartColumn + @Multiplier, 9)
, CurrentAnnCommAmount = SUBSTRING(ITS.line, @AnnCommAmountStartColumn + @Multiplier, 9)
, CurrentRetainedCommAmount = SUBSTRING(ITS.line, @RetainedCommAmountStartColumn + @Multiplier, 9)
, CurrentRetainedSwitch = SUBSTRING(ITS.line, @RetainedSwitchStartColumn + @Multiplier, 9)
-- etc
-- A sample of your business logic
, TransactionSourceSystemCode = SUBSTRING(ITS.line, 308, 3)
)NamedCols
CROSS APPLY
(
SELECT
-- There's some business rules to be had here for first year processing
-- Something special with position 102
SUBSTRING(ITS.line, 102, 1) AS SeniorityBit
-- If department code? is 0079, we have special rules
, TRIM(SUBSTRING(ITS.line, 141, 4)) AS DepartmentCode
)BR0
CROSS APPLY
(
SELECT
CASE
WHEN NamedCols.TransactionSourceSystemCode in ('LVV','UIV','LMV') THEN
CASE WHEN BR0.SeniorityBit = '0' THEN '1' ELSE '0' END
WHEN NamedCols.TransactionSourceSystemCode in ('CMP','FAL') AND BR0.DepartmentCode ='0079' THEN
CASE WHEN BR0.SeniorityBit = '1' THEN '0' ELSE '1' END
WHEN NamedCols.TransactionSourceSystemCode in ('UIA','LMA','RIA') AND BR0.SeniorityBit > '1' THEN
'1'
WHEN NamedCols.TransactionSourceSystemCode in ('FAL') THEN
'1'
ELSE '0'
END
)FY(IsFirstYear)
WHERE Source = @SourceName
ORDER BY Id;
Why did processing time increase from 5 minutes to 8 hours?
It likely had nothing to do with the change to the dynamic SQL. When an SSIS package run is "taking forever" relative to normal, then preferably while it's still running, look at your sources and destinations and make note of what is happening, as it's likely one of the two.
A cursor complicates your life and is not needed once you start thinking in sets, but it's unlikely to be the source of the performance problems given that you have a solid baseline of what normal is. Plus, this query is a single-table query with a single filter.
Your SSIS package's data flow is probably a chip-shot Source-to-Destination Extract and Load, or "Slurp and Burp," with no intervening transformation (as the logic is all in the stored procedure). If that's the case, then the only two possible performance points of contention are the source and destination. Since the source appears trivial, it's likely that some other process had the destination tied up for those 8 hours. Had you run something like sp_whoisactive on the source and destination, you could have identified the process that was blocking your run.
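sp_whoisactive is a free community stored procedure (by Adam Machanic) that must be installed on the server first; a minimal invocation to hunt for blocking while the package is running might look like this (parameter choices here are one reasonable option, not the only one):

```sql
-- Run on the destination server while the slow package is executing.
-- @find_block_leaders groups waiting sessions under the root blocker,
-- and sorting by blocked_session_count surfaces the culprit first.
EXEC sp_WhoIsActive
    @find_block_leaders = 1,
    @sort_order = '[blocked_session_count] DESC';
```

Look at the sessions with the highest blocked_session_count and their sql_text column to see what was holding locks on your destination table.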