Home > Back-end >  Parsing text file into csv. Issue splitting string that spans across multiple lines via PowerShell r
Parsing text file into csv. Issue splitting string that spans across multiple lines via PowerShell r

Time:07-21

I'm attempting to parse a txt file into a csv with 4 headers. However, I'm having some troubles successfully splitting/matching text between two strings that spans across multiple lines. I'm not very experienced with regex. Is there a way to do this without regex? Or does anyone know which combination of regex expressions I could use here?

I started with the following code but welcome any suggestions/recommendations if there's a better way to go about this:

$input = Get-Content ‘C:\Example.txt’
$output = ‘C:\Example.csv’
$regex = "'PRINT(.*?\n)PRINT'"
$regex2 = "'INSERT' -or $_ -eq 'UPDATE'"
$matches = [regex]::Matches($input, $regex)::multiline
$array = @()    

$matches | for each-object{
    $write-object = False
    $obj = New-Object System.Object
        If $_ -match $regex
    }
If ($writeobj){
    $obj | Add-Member -type NoteProperty -name row1 -value $row
    $array  = $obj
}
$array | Export-csv -path $output

DESIRED OUTPUT:

SECTION UPDATE/INSERT TABLE CHANGE BLOCK
AA UPDATE Table_1 COL_1 = ‘123’ WHERE ID= ‘1’
BB INSERT Table_2 (DBO.COL_1) SELECT Col1Value AS [Col_1] FROM Table_1
CC UPDATE Table_1 COL_3
CC INSERT Table_2 (DBO.COL_3) SELECT Col3Value AS [Col_3] FROM Table_2

EXAMPLE .TXT CONTENT:

--AA***************************
--PRINT AA
UPDATE Table_1 
SET COL_1 = '123'
WHERE ID = '1'

--BB***************************
--PRINT BB
INSERT INTO [dbo].[Table_2] (DBO.COL_1)
SELECT Col1Value AS [Col_1] 
FROM  Table_1  

--CC***************************
--PRINT CC
UPDATE Table_1 
SET COL_3 = 'ABC'
WHERE ID = '3' 
AND ID2 = '1a' 

INSERT INTO [dbo].[Table_2] (DBO.COL_3)
SELECT Col3Value AS [Col_3] 
FROM  Table_2

CodePudding user response:

My suggestion is to read the entire file in as a multi-line string using the -Raw switch parameter. Then you can split the file by sections (AA, BB, etc) and parse each section with regex. I do a bunch of -replace to strip out just the info I want for each property here.

$input = Get-Content 'C:\Example.txt'
$output = 'C:\Example.csv'
$array = $input -split '(?ms)(?=--[^\*^\r^\n] ?\*{3,}[^\*])'|%{
    $Record = $_ -replace '(?ms)^--([^\*] )\*. $','$1'
    Switch -RegEx (($_ -split '(?ms)(?=UPDATE|INSERT)')){
        '^UPDATE' {[PSCustomObject]@{SECTION=$Record;'UPDATE/INSERT'='UPDATE';TABLE=$_ -replace '(?ms)UPDATE (. ?)[\r\n].*','$1';'CHANGE BLOCK'=$_ -replace 'UPDATE. ?[\r\n] ' -replace '[\r\n] ',' ' -replace '^SET '}}
        '^INSERT' {[PSCustomObject]@{SECTION=$Record;'UPDATE/INSERT'='INSERT';TABLE=$_ -replace '(?ms)INSERT INTO \S ?\[([^\]] )].*','$1';'CHANGE BLOCK'=$_ -replace '(?ms)INSERT INTO \S ?\[([^\]] )]\s*' -replace '[\r\n] ',' '}}
    }
}
$array | Export-csv -path $output

That provided me with:

SECTION UPDATE/INSERT TABLE    CHANGE BLOCK                                             
------- ------------- -----    ------------                                             
AA      UPDATE        Table_1  COL_1 = '123' WHERE ID = '1'                             
BB      INSERT        Table_2  (DBO.COL_1) SELECT Col1Value AS [Col_1]  FROM  Table_1   
CC      UPDATE        Table_1  COL_3 = 'ABC' WHERE ID = '3'  AND ID2 = '1a'             
CC      INSERT        Table_2  (DBO.COL_3) SELECT Col3Value AS [Col_3]  FROM  Table_2   

CodePudding user response:

I have a similar approach to TheMadTechnician's answer, but different enough and since I did the work I figured I'd post it. Suggest the same thing, read the file as one string and split into chunks and process

$text = Get-Content C:\Example.txt -Raw
$csvpath = 'Some\Path\To\file.csv'

$splitpattern = '--. ?\*{4,}. ?\r?\n\s*'
$mainpattern = '(?s)(?<Action>INSERT|UPDATE)\s.*?(?<Table>\w{1,}_\d{1,})(\s|]|\r?\n)(?<Change>. )'
$sectionpattern = '(?s)--Print (?<Section>. ?)\r?\n(?<Remainder>. )'

$output = $text -split $splitpattern | ForEach-Object {
    if($_ -match $sectionpattern){
        $section,$remainder = $matches.Section,$matches.Remainder

        foreach($block in $remainder -split '(?=UPDATE|INSERT)'){
            if($block -match $mainpattern) {
                [PSCustomObject]@{
                    Section = $section
                    Action  = $matches.Action
                    Table   = $matches.Table
                    Change  = $matches.Change -replace '\r?\n',' '
                }
            }
        }
    }
}

# inspect the output
$output | Out-GridView

# export output
$output | Export-Csv -Path $csvpath -NoTypeInformation

The output as a table

Section Action Table   Change                                                     
------- ------ -----   ------                                                     
AA      UPDATE Table_1  SET COL_1 = '123' WHERE ID = '1'                          
BB      INSERT Table_2  (DBO.COL_1) SELECT Col1Value AS [Col_1]  FROM  Table_1    
CC      UPDATE Table_1  SET COL_3 = 'ABC' WHERE ID = '3'  AND ID2 = '1a'          
CC      INSERT Table_2  (DBO.COL_3) SELECT Col3Value AS [Col_3]  FROM  Table_2
  • Related