I'm attempting to parse a txt file into a csv with 4 headers. However, I'm having some troubles successfully splitting/matching text between two strings that spans across multiple lines. I'm not very experienced with regex. Is there a way to do this without regex? Or does anyone know which combination of regex expressions I could use here?
I started with the following code but welcome any suggestions/recommendations if there's a better way to go about this:
$input = Get-Content ‘C:\Example.txt’
$output = ‘C:\Example.csv’
$regex = "'PRINT(.*?\n)PRINT'"
$regex2 = "'INSERT' -or $_ -eq 'UPDATE'"
$matches = [regex]::Matches($input, $regex)::multiline
$array = @()
$matches | for each-object{
$write-object = False
$obj = New-Object System.Object
If $_ -match $regex
}
If ($writeobj){
$obj | Add-Member -type NoteProperty -name row1 -value $row
$array = $obj
}
$array | Export-csv -path $output
DESIRED OUTPUT:
SECTION | UPDATE/INSERT | TABLE | CHANGE BLOCK |
---|---|---|---|
AA | UPDATE | Table_1 | COL_1 = ‘123’ WHERE ID= ‘1’ |
BB | INSERT | Table_2 | (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1 |
CC | UPDATE | Table_1 | COL_3 |
CC | INSERT | Table_2 | (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2 |
EXAMPLE .TXT CONTENT:
--AA*************************** --PRINT AA UPDATE Table_1 SET COL_1 = '123' WHERE ID = '1' --BB*************************** --PRINT BB INSERT INTO [dbo].[Table_2] (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1 --CC*************************** --PRINT CC UPDATE Table_1 SET COL_3 = 'ABC' WHERE ID = '3' AND ID2 = '1a' INSERT INTO [dbo].[Table_2] (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2
CodePudding user response:
My suggestion is to read the entire file in as a multi-line string using the -Raw
switch parameter. Then you can split the file by sections (AA, BB, etc) and parse each section with regex. I do a bunch of -replace
to strip out just the info I want for each property here.
$input = Get-Content 'C:\Example.txt'
$output = 'C:\Example.csv'
$array = $input -split '(?ms)(?=--[^\*^\r^\n] ?\*{3,}[^\*])'|%{
$Record = $_ -replace '(?ms)^--([^\*] )\*. $','$1'
Switch -RegEx (($_ -split '(?ms)(?=UPDATE|INSERT)')){
'^UPDATE' {[PSCustomObject]@{SECTION=$Record;'UPDATE/INSERT'='UPDATE';TABLE=$_ -replace '(?ms)UPDATE (. ?)[\r\n].*','$1';'CHANGE BLOCK'=$_ -replace 'UPDATE. ?[\r\n] ' -replace '[\r\n] ',' ' -replace '^SET '}}
'^INSERT' {[PSCustomObject]@{SECTION=$Record;'UPDATE/INSERT'='INSERT';TABLE=$_ -replace '(?ms)INSERT INTO \S ?\[([^\]] )].*','$1';'CHANGE BLOCK'=$_ -replace '(?ms)INSERT INTO \S ?\[([^\]] )]\s*' -replace '[\r\n] ',' '}}
}
}
$array | Export-csv -path $output
That provided me with:
SECTION UPDATE/INSERT TABLE CHANGE BLOCK
------- ------------- ----- ------------
AA UPDATE Table_1 COL_1 = '123' WHERE ID = '1'
BB INSERT Table_2 (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1
CC UPDATE Table_1 COL_3 = 'ABC' WHERE ID = '3' AND ID2 = '1a'
CC INSERT Table_2 (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2
CodePudding user response:
I have a similar approach to TheMadTechnician's answer, but different enough and since I did the work I figured I'd post it. Suggest the same thing, read the file as one string and split into chunks and process
$text = Get-Content C:\Example.txt -Raw
$csvpath = 'Some\Path\To\file.csv'
$splitpattern = '--. ?\*{4,}. ?\r?\n\s*'
$mainpattern = '(?s)(?<Action>INSERT|UPDATE)\s.*?(?<Table>\w{1,}_\d{1,})(\s|]|\r?\n)(?<Change>. )'
$sectionpattern = '(?s)--Print (?<Section>. ?)\r?\n(?<Remainder>. )'
$output = $text -split $splitpattern | ForEach-Object {
if($_ -match $sectionpattern){
$section,$remainder = $matches.Section,$matches.Remainder
foreach($block in $remainder -split '(?=UPDATE|INSERT)'){
if($block -match $mainpattern) {
[PSCustomObject]@{
Section = $section
Action = $matches.Action
Table = $matches.Table
Change = $matches.Change -replace '\r?\n',' '
}
}
}
}
}
# inspect the output
$output | Out-GridView
# export output
$output | Export-Csv -Path $csvpath -NoTypeInformation
The output as a table
Section Action Table Change
------- ------ ----- ------
AA UPDATE Table_1 SET COL_1 = '123' WHERE ID = '1'
BB INSERT Table_2 (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1
CC UPDATE Table_1 SET COL_3 = 'ABC' WHERE ID = '3' AND ID2 = '1a'
CC INSERT Table_2 (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2