I'm splitting my SQL file into batches based on the String GO. I took stackoverflow thread How do I split file on batches using powershell as reference And noticed this regex doesn't work for few scenarios
- If the go is found in single quotes the string gets split. (I want to avoid the split for any text that are inside single quotes)
- And another scenario is when I use go on Declare statement like Declare @go
I'm not familiar with regex patterns. So, I tried to search some online documentation on regex and came up with the pattern to find anything within single quotes and how to ignore the @go in file when splitting. Below are the regex
- ('([\s\S]*?)') - gets me the string that is with in single quotes, but I'm not sure how to add this match to exclude to current regex pattern
- (?<![\S])(?:\bGO\b) - Can avoid splitting when any non Whitespace char is before the GO string that took care of @go in below sql file
SQL file content:
select * from testTbl; GO
select * from testTbl2;
GO
Declare @go varchar(15) = 'IGo test'
select @go
GO
SELECT 'go', ' go ', 'asdv Igo asdsad',
'
go
'
GO
create table #Temp
(
IdGo int,
GoId Varchar(50)
)
select * from #Temp
drop table #Temp
GO
PowerShell script line:
$batches = ( $scriptData -split "(?:\bGO\b)" ) | % { $_ "`r`nGO" }
Note: var $scriptData
has the SQL file content in it.
Is this a correct approach or how can we exclude the split when the string is with in single quotes? Is there any cleaner approach to do this?
FYI: I'll update the other thread answer, Once I can figure out the solution to this. Or I'm happy to update the other thread and delete this if any one feels it is a duplicate.
Updated: Desired output:
select * from testTbl;
GO
select * from testTbl2;
GO
Declare @go varchar(15) = 'IGo test' select @go
GO
SELECT 'go', ' go ', 'asdv Igo asdsad','go'
GO
create table #Temp ( IdGo int, GoId Varchar(50) ) select * from #Temp drop table #Temp
GO
CodePudding user response:
To robustly parse your input into batches, you need a language parser that can reliably identify syntactic elements - regexes are not sophisticated enough to model the grammar of your input.
In the absence of a T-SQL parser,[1] you can probably get away with PowerShell's own language parser, [System.Management.Automation.Language.Parser]
, given that there are high-level commonalities between the two languages, so it should be able to identify isolated, non-@
-prefixed GO
tokens in your input:
Caveat re commment support:
Because T-SQL's comment constructs differ from PowerShell, using the PowerShell parser would yield false positives for (isolated)
GO
substrings inside comments.The solution below therefore uses regex-based preprocessing that removes all comments (with additional effort involving a post-processing step, the comments could be preserved), but this isn't fully robust and relies on the following assumptions:
- No comment-like constructs appear inside quoted strings.
- Block quotes (
/* ... */
) aren't nested.
(With a more sophisticated regex using balancing group definitions you may be able to overcomes this particular limitation).
# Get the file's content and preprocess it by *removing comments*,
# to prevent GO instances inside them from yielding false positives.
# CAVEAT: This isn't fully robust, but may work well enough in practice.
# See the notes above this code snippet.
$fileContent = (Get-Content -Raw t.txt) -replace '(?m)^\s*--.*' -replace '(?s)/\*.*?\*/'
# Parse the file content into an AST (Abstract Syntax Tree),
# as if it were PowerShell code.
$ast = [System.Management.Automation.Language.Parser]::ParseInput($fileContent, [ref] $null, [ref] $null)
# Get all locations - in terms of line and column number - of isolated,
# unquoted GO tokens.
$locations =
$ast.FindAll({ $args[0].Extent.Text -eq 'go' }, $false) |
Select-Object -ExpandProperty Extent |
Select-Object StartLineNumber, StartColumnNumber -Unique
# Split the file content into batches by the locations of the
# isolated, unquoted GO tokens, resulting in an array of strings
# each representing a batch, stored in $batches.
$thisBatch = ''
$lineNo = $locNdx = 0
[string[]] $batches =
$fileContent -split '\r?\n' | ForEach-Object {
if ( $lineNo -eq $locations[$locNdx].StartLineNumber) {
$fromCol = 0
do {
$thisBatch $_.Substring($fromCol, $locations[$locNdx].StartColumnNumber - $fromCol 2 - 1)
$thisBatch = ''
$fromCol = $locations[$locNdx].StartColumnNumber 2 - 1
} while ($locations[ $locNdx].StartLineNumber -eq $lineNo)
if ($fromCol -lt $_.Length) {
$thisBatch = $_.Substring($fromCol) "`n"
}
} else {
$thisBatch = "$_`n"
}
}
# If the last batch wasn't terminated with a GO, we must add it now.
# Remove "`nGO" if you don't want to append a terminating GO.
if ($thisBatch.Trim()) { $batches = $thisBatch "`nGO" }
# Diagnostic output, to show the resulting batches:
$batches -join "`n-----------------`n"
Output of the above, based on your sample input:
select * from testTbl; GO
-----------------
select * from testTbl2;
GO
-----------------
Declare @go varchar(15) = 'IGo test'
select @go
GO
-----------------
SELECT 'go', ' go ', 'asdv Igo asdsad',
'
go
'
GO
-----------------
create table #Temp
(
IdGo int,
GoId Varchar(50)
)
select * from #Temp
drop table #Temp
GO
Note:
No attempt was made to condense each batch into a single-line representation, but that shouldn't be a problem.
The code also handles multiple batches on a single line correctly, such as the two complete and the one incomplete batch in the following example:
select * from testTbl0;GO select * from testTbl1 GO Declare @go varchar(15) = 'IGo test'
Also, a final batch that happens not to be terminated with
GO
is included.
[1] Note: "GO is not a Transact-SQL statement; it is a command recognized by the sqlcmd
and osql
utilities and SQL Server Management Studio Code editor." - see the docs
The docs also state that a "Transact-SQL statement cannot occupy the same line as a GO command.", which would make the first sample batch in the question technically invalid, but Raj (the OP) reports that it works nonetheless.