Home > Software engineering >  Finding table names in SSIS .dtsx packages
Finding table names in SSIS .dtsx packages

Time:11-01

I am trying to scan SSIS .dtsx packages for table names. Yes, I know that I should use [xml] and a tool that parses SQL language. That does not seem to be possibe at this time. PowerShell can understand [xml], but SQL parsers generally cost and using ANTLR is more of an investment than is acceptable at this time. I am open to suggestions, but I am not asking for a tool recommendation.

There are two (2) problems.

1) `&.;` does not appear to be recognized as separate from the table name capture item
2) TABLE5 does not appear to be found

Yes, I also know that schema names should not be hardcoded into source. It makes it difficult/impossible for DBAs to manage the database. That is the way it is done here.

How can I make the regex omit the &.*; from the capture and recognize dbo.TABLE5

Here is the code I am using to scan the .dtsx files.

PS C:\src\sql> Get-Content .\Find-FromJoinSql.ps1
Get-ChildItem -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern '(FROM|JOIN)(\s|&.*;) (\S )(\s|&.*;) ' -Path $_ -AllMatches |
        ForEach-Object {
            if ($_.Matches.Groups.captures[3].value -match 'dbo') {
                "$Filename === $($_.Matches.Groups.captures[3].value)"
            }
        }
    }

Here is a tiny sample of the type of text from the .dtsx file.

PS C:\src\sql> Get-Content .\sls_test.dtsx
USE ADATABASE;
SELECT * FROM dbo.TABLE1 WHERE F1 = 3;
SELECT * FROM dbo.TABLE2 T2
    FULL OUTER JOIN dbo.TABLEJ TJ
        ON T2.KEY = TJ.KEY;
SELECT * FROM dbo.TABLE3 T3
    INNER JOIN ADATABASE2.dbo.TABLEK
TK ON
T3.user_id = TK.user_id

SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5
    ON T4.F1 = T5.F1;
EXIT

Running the script on this data produces:

PS C:\src\sql> .\Find-FromJoinSql.ps1
sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
TK
sls_test.dtsx === dbo.TABLE4

PS C:\src\sql> $PSVersionTable.PSVersion.ToString()
7.1.5

CodePudding user response:

Indeed strange that some entities (
) are not replaced in those files.

Change the regex pattern a bit to capture the dbo.table names like below.

Using Get-Content

$regex = [regex] '(?im)(?:FROM|JOIN)(?:\s|&[^;] ;) ([^\s&] )(?:\s|&[^;] ;)*'
Get-ChildItem -Path D:\Test -File -Filter '*.dtsx' |
    ForEach-Object {
        $match = $regex.Match((Get-Content -Path $_.FullName -Raw))
        while ($match.Success) {
            "$($_.Name) === $($match.Groups[1].Value)"
            $match = $match.NextMatch()
        } 
    }

Using Select-String

As to why Select-String -AllMatches skipped your Table5.
From the docs: "When Select-String finds more than one match in a line of text, it still emits only one MatchInfo object for the line, but the Matches property of the object contains all the matches."

That means you need another loop to get all the $Matches from each $MatchInfo objects to get them in your output:

$pattern = '(?:FROM|JOIN)(?:\s|&[^;] ;) ([^\s&] )(?:\s|&[^;] ;)*'
Get-ChildItem -Path 'D:\Test' -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern $pattern -Path $_.FullName -AllMatches |
        ForEach-Object {
            # loop again, because each $MatchInfo object may contain multiple
            # $Matches objects if more matches were found in the same line
            foreach ($match in $_.Matches) {
                if ($match.Groups[1].value -match 'dbo') {
                    "$Filename === $($match.Groups[1].value)"
                }
            }
        }
    }

Output:

sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
sls_test.dtsx === dbo.TABLE4
sls_test.dtsx === dbo.TABLE5

Regex details:

(?im)              Use case-insensitive matching and have '^' and '$' match at linebreaks
(?:                Match the regular expression below
                   Match either the regular expression below (attempting the next alternative only if this one fails)
      FROM         Match the characters “FROM” literally
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      JOIN         Match the characters “JOIN” literally
)                 
(?:                Match the regular expression below
   |               Match either the regular expression below (attempting the next alternative only if this one fails)
      \s           Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      &            Match the character&” literally
      [^;]         Match any character that is NOT a “;”
                   Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ;            Match the character “;” literally
)                  Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  Match the regular expression below and capture its match into backreference number 1
   [^\s&]          Match a single character NOT present in the list below
                   A whitespace character (spaces, tabs, line breaks, etc.)
                   The character&             Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)                 
(?:                Match the regular expression below
   |               Match either the regular expression below (attempting the next alternative only if this one fails)
      \s           Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      &            Match the character&” literally
      [^;]         Match any character that is NOT a “;”
                   Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ;            Match the character “;” literally
)*                 Between zero and unlimited times, as many times as possible, giving back as needed (greedy)

  • Related