Home > Blockchain >  Splitting SQL file/String into batches using PowerShell exclude split when content is in single quot
Splitting SQL file/String into batches using PowerShell exclude split when content is in single quot

Time:12-01

I'm splitting my SQL file into batches based on the String GO. I took stackoverflow thread How do I split file on batches using powershell as reference And noticed this regex doesn't work for few scenarios

  • If the go is found in single quotes the string gets split. (I want to avoid the split for any text that are inside single quotes)
  • And another scenario is when I use go on Declare statement like Declare @go

I'm not familiar with regex patterns. So, I tried to search some online documentation on regex and came up with the pattern to find anything within single quotes and how to ignore the @go in file when splitting. Below are the regex

  • ('([\s\S]*?)') - gets me the string that is with in single quotes, but I'm not sure how to add this match to exclude to current regex pattern
  • (?<![\S])(?:\bGO\b) - Can avoid splitting when any non Whitespace char is before the GO string that took care of @go in below sql file

SQL file content:

select * from testTbl; GO
select * from testTbl2;
GO

Declare @go varchar(15) = 'IGo test'

select @go

GO

SELECT 'go', '   go  ', 'asdv Igo asdsad',
'
go
'

GO

create table #Temp
(
    IdGo int, 
    GoId Varchar(50)
)

select * from #Temp

drop table #Temp

GO

PowerShell script line:

$batches = ( $scriptData -split "(?:\bGO\b)" ) | % { $_   "`r`nGO" }

Note: var $scriptData has the SQL file content in it.

Is this a correct approach or how can we exclude the split when the string is with in single quotes? Is there any cleaner approach to do this?

FYI: I'll update the other thread answer, Once I can figure out the solution to this. Or I'm happy to update the other thread and delete this if any one feels it is a duplicate.

Updated: Desired output:

select * from testTbl;
GO
select * from testTbl2;
GO
Declare @go varchar(15) = 'IGo test' select @go
GO
SELECT 'go', ' go ', 'asdv Igo asdsad','go'
GO
create table #Temp ( IdGo int, GoId Varchar(50) ) select * from #Temp drop table #Temp
GO

CodePudding user response:

To robustly parse your input into batches, you need a language parser that can reliably identify syntactic elements - regexes are not sophisticated enough to model the grammar of your input.

In the absence of a T-SQL parser,[1] you can probably get away with PowerShell's own language parser, [System.Management.Automation.Language.Parser], given that there are high-level commonalities between the two languages, so it should be able to identify isolated, non-@-prefixed GO tokens in your input:

Caveat re commment support:

  • Because T-SQL's comment constructs differ from PowerShell, using the PowerShell parser would yield false positives for (isolated) GO substrings inside comments.

  • The solution below therefore uses regex-based preprocessing that removes all comments (with additional effort involving a post-processing step, the comments could be preserved), but this isn't fully robust and relies on the following assumptions:

    • No comment-like constructs appear inside quoted strings.
    • Block quotes (/* ... */) aren't nested.
      (With a more sophisticated regex using balancing group definitions you may be able to overcomes this particular limitation).
# Get the file's content and preprocess it by *removing comments*,
# to prevent GO instances inside them from yielding false positives.
# CAVEAT: This isn't fully robust, but may work well enough in practice.
#         See the notes above this code snippet.
$fileContent = (Get-Content -Raw t.txt) -replace '(?m)^\s*--.*' -replace '(?s)/\*.*?\*/'

# Parse the file content into an AST (Abstract Syntax Tree),
# as if it were PowerShell code.
$ast = [System.Management.Automation.Language.Parser]::ParseInput($fileContent, [ref] $null, [ref] $null)

# Get all locations - in terms of line and column number - of isolated,
# unquoted GO tokens.
$locations = 
  $ast.FindAll({ $args[0].Extent.Text -eq 'go' }, $false) | 
    Select-Object -ExpandProperty Extent |
      Select-Object StartLineNumber, StartColumnNumber -Unique

# Split the file content into batches by the locations of the 
# isolated, unquoted GO tokens, resulting in an array of strings
# each representing a batch, stored in $batches.
$thisBatch = ''
$lineNo = $locNdx = 0
[string[]] $batches =
  $fileContent -split '\r?\n' | ForEach-Object {
    if (  $lineNo -eq $locations[$locNdx].StartLineNumber) {
      $fromCol = 0
      do {
        $thisBatch   $_.Substring($fromCol, $locations[$locNdx].StartColumnNumber - $fromCol   2 - 1)
        $thisBatch = ''
        $fromCol = $locations[$locNdx].StartColumnNumber   2 - 1
      } while ($locations[  $locNdx].StartLineNumber -eq $lineNo)
      if ($fromCol -lt $_.Length) {
        $thisBatch =  $_.Substring($fromCol)   "`n"
      }
    } else {
      $thisBatch  = "$_`n"
    }
  }
# If the last batch wasn't terminated with a GO, we must add it now.
# Remove   "`nGO" if you don't want to append a terminating GO.
if ($thisBatch.Trim()) { $batches  = $thisBatch   "`nGO" }

# Diagnostic output, to show the resulting batches:
$batches -join "`n-----------------`n"

Output of the above, based on your sample input:

select * from testTbl; GO
-----------------
select * from testTbl2;
GO
-----------------

Declare @go varchar(15) = 'IGo test'

select @go

GO
-----------------

SELECT 'go', '   go  ', 'asdv Igo asdsad',
'
go
'

GO
-----------------

create table #Temp
(
    IdGo int, 
    GoId Varchar(50)
)

select * from #Temp

drop table #Temp

GO

Note:

  • No attempt was made to condense each batch into a single-line representation, but that shouldn't be a problem.

  • The code also handles multiple batches on a single line correctly, such as the two complete and the one incomplete batch in the following example:

    select * from testTbl0;GO select * from testTbl1 GO Declare @go varchar(15) = 'IGo test'
    
  • Also, a final batch that happens not to be terminated with GO is included.


[1] Note: "GO is not a Transact-SQL statement; it is a command recognized by the sqlcmd and osql utilities and SQL Server Management Studio Code editor." - see the docs
The docs also state that a "Transact-SQL statement cannot occupy the same line as a GO command.", which would make the first sample batch in the question technically invalid, but Raj (the OP) reports that it works nonetheless.

  • Related