I am trying to scan SSIS .dtsx packages for table names. Yes, I know that I should use [xml] and a tool that parses SQL language. That does not seem to be possibe at this time. PowerShell can understand [xml], but SQL parsers generally cost and using ANTLR is more of an investment than is acceptable at this time. I am open to suggestions, but I am not asking for a tool recommendation.
There are two (2) problems.
1) `&.;` does not appear to be recognized as separate from the table name capture item
2) TABLE5 does not appear to be found
Yes, I also know that schema names should not be hardcoded into source. It makes it difficult/impossible for DBAs to manage the database. That is the way it is done here.
How can I make the regex omit the &.*;
from the capture and recognize dbo.TABLE5
Here is the code I am using to scan the .dtsx files.
PS C:\src\sql> Get-Content .\Find-FromJoinSql.ps1
Get-ChildItem -File -Filter '*.dtsx' |
ForEach-Object {
$Filename = $_.Name
Select-String -Pattern '(FROM|JOIN)(\s|&.*;) (\S )(\s|&.*;) ' -Path $_ -AllMatches |
ForEach-Object {
if ($_.Matches.Groups.captures[3].value -match 'dbo') {
"$Filename === $($_.Matches.Groups.captures[3].value)"
}
}
}
Here is a tiny sample of the type of text from the .dtsx file.
PS C:\src\sql> Get-Content .\sls_test.dtsx
USE ADATABASE;
SELECT * FROM dbo.TABLE1 WHERE F1 = 3;
SELECT * FROM dbo.TABLE2 T2
FULL OUTER JOIN dbo.TABLEJ TJ
ON T2.KEY = TJ.KEY;
SELECT * FROM dbo.TABLE3 T3
INNER JOIN ADATABASE2.dbo.TABLEK
TK ON
T3.user_id = TK.user_id

SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5
ON T4.F1 = T5.F1;
EXIT
Running the script on this data produces:
PS C:\src\sql> .\Find-FromJoinSql.ps1
sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
TK
sls_test.dtsx === dbo.TABLE4
PS C:\src\sql> $PSVersionTable.PSVersion.ToString()
7.1.5
CodePudding user response:
Indeed strange that some entities (

) are not replaced in those files.
Change the regex pattern a bit to capture the dbo.table names like below.
Using Get-Content
$regex = [regex] '(?im)(?:FROM|JOIN)(?:\s|&[^;] ;) ([^\s&] )(?:\s|&[^;] ;)*'
Get-ChildItem -Path D:\Test -File -Filter '*.dtsx' |
ForEach-Object {
$match = $regex.Match((Get-Content -Path $_.FullName -Raw))
while ($match.Success) {
"$($_.Name) === $($match.Groups[1].Value)"
$match = $match.NextMatch()
}
}
Using Select-String
As to why Select-String -AllMatches
skipped your Table5.
From the docs: "When Select-String finds more than one match in a line of text, it still emits only one MatchInfo object for the line, but the Matches property of the object contains all the matches."
That means you need another loop to get all the $Matches from each $MatchInfo objects to get them in your output:
$pattern = '(?:FROM|JOIN)(?:\s|&[^;] ;) ([^\s&] )(?:\s|&[^;] ;)*'
Get-ChildItem -Path 'D:\Test' -File -Filter '*.dtsx' |
ForEach-Object {
$Filename = $_.Name
Select-String -Pattern $pattern -Path $_.FullName -AllMatches |
ForEach-Object {
# loop again, because each $MatchInfo object may contain multiple
# $Matches objects if more matches were found in the same line
foreach ($match in $_.Matches) {
if ($match.Groups[1].value -match 'dbo') {
"$Filename === $($match.Groups[1].value)"
}
}
}
}
Output:
sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
sls_test.dtsx === dbo.TABLE4
sls_test.dtsx === dbo.TABLE5
Regex details:
(?im) Use case-insensitive matching and have '^' and '$' match at linebreaks
(?: Match the regular expression below
Match either the regular expression below (attempting the next alternative only if this one fails)
FROM Match the characters “FROM” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
JOIN Match the characters “JOIN” literally
)
(?: Match the regular expression below
| Match either the regular expression below (attempting the next alternative only if this one fails)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
& Match the character “&” literally
[^;] Match any character that is NOT a “;”
Between one and unlimited times, as many times as possible, giving back as needed (greedy)
; Match the character “;” literally
) Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 1
[^\s&] Match a single character NOT present in the list below
A whitespace character (spaces, tabs, line breaks, etc.)
The character “&”
Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?: Match the regular expression below
| Match either the regular expression below (attempting the next alternative only if this one fails)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
& Match the character “&” literally
[^;] Match any character that is NOT a “;”
Between one and unlimited times, as many times as possible, giving back as needed (greedy)
; Match the character “;” literally
)* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)