PowerShell: Create table with extracted values from between tags without any white spaces (separated

Time:11-14

I have a text file like the one below, which contains a table on line 8. The table can vary in length, but each row always contains 4 values.

Input file:

0001117945
14102022
0001056.98
GBP
0000000.00
0000000.00
\\GLORSAWA01\EHIShared\Remittance\UK01\UKI_REM_COL58652cbc13ca49aabf000.pdf                                                                                                                                                                                         
<LineItemTable><LINEITEM><LINEITEMFIELD>20220916</LINEITEMFIELD><LINEITEMFIELD>2525636         </LINEITEMFIELD><LINEITEMFIELD>0.00                            </LINEITEMFIELD><LINEITEMFIELD>246.05                          </LINEITEMFIELD></LINEITEM><LINEITEM><LINEITEMFIELD>20220920</LINEITEMFIELD><LINEITEMFIELD>2527541         </LINEITEMFIELD><LINEITEMFIELD>0.00                            </LINEITEMFIELD><LINEITEMFIELD>450.12                          </LINEITEMFIELD></LINEITEM><LINEITEM><LINEITEMFIELD>20220922</LINEITEMFIELD><LINEITEMFIELD>2531147         </LINEITEMFIELD><LINEITEMFIELD>0.00                            </LINEITEMFIELD><LINEITEMFIELD>360.81                          </LINEITEMFIELD></LINEITEM></LineItemTable>

The structure of the table on line 8 is always the same. Formatted purely for better visibility, it looks like this:

<LineItemTable>
    <LINEITEM>
        <LINEITEMFIELD>20220916</LINEITEMFIELD>
        <LINEITEMFIELD>2525636         </LINEITEMFIELD>
        <LINEITEMFIELD>0.00                            </LINEITEMFIELD>
        <LINEITEMFIELD>246.05                          </LINEITEMFIELD>
    </LINEITEM>
    <LINEITEM>
        <LINEITEMFIELD>20220920</LINEITEMFIELD>
        <LINEITEMFIELD>2527541         </LINEITEMFIELD>
        <LINEITEMFIELD>0.00                            </LINEITEMFIELD>
        <LINEITEMFIELD>450.12                          </LINEITEMFIELD>
    </LINEITEM>
    <LINEITEM>
        <LINEITEMFIELD>20220922</LINEITEMFIELD>
        <LINEITEMFIELD>2531147         </LINEITEMFIELD>
        <LINEITEMFIELD>0.00                            </LINEITEMFIELD>
        <LINEITEMFIELD>360.81                          </LINEITEMFIELD>
    </LINEITEM>
</LineItemTable>

I'm trying to extract all the values from the text file and write them to another file. I want to keep the first 7 lines as they are and have the values from line 8 written on multiple lines, one per table row, separated by ; and without any white space.

Desired output file:

0001117945
14102022
0001056.98
GBP
0000000.00
0000000.00
\\GLORSAWA01\EHIShared\Remittance\UK01\UKI_REM_COL58652cbc13ca49aabf000.pdf                                                                                                                                                                                         
20220916;2525636;0.00;246.05
20220920;2527541;0.00;450.12
20220922;2531147;0.00;360.81

This is how far I got, but I cannot convert the table on line 8 into my desired output. Any help would be greatly appreciated.

$importfolder = ".\PowerShell_script\" 
$outputfolder = ".\PowerShell_script\Output\"
$files = ".\PowerShell script\*.txt" 
$list = Get-ChildItem -Path $files | select Name


$find1 = "<LineItemTable><LINEITEM><LINEITEMFIELD>"
$find2 = "</LINEITEMFIELD><LINEITEMFIELD>" 
$find3 = "</LINEITEMFIELD></LINEITEM><LINEITEM><LINEITEMFIELD>"

$replace1 = ""
$replace2 = "`t" 
$replace3 = "`n"

ForEach($file in $list){
  
    echo $file.Name
    $filename = $file.Name
    $file = $importfolder + $file.Name
    $outputfile = $outputfolder + $filename

    $filecontent = Get-Content $file | 
    ForEach-Object { 
        if($_ -Match $find1)
            {$_ -replace $replace1}         

        if($_ -Match $find2)
            {$_ -replace $replace2} 

        if($_ -Match $find3)
            {$_ -replace $replace3} 

        else {$_} # output the line as is
     } | Set-Content $outputfile
}

CodePudding user response:

It looks like this logic works properly for the example file in question:

$file = Get-Content path\to\examplefile.txt
$(
    $file[0..($file.Length - 2)]
    ($file[-1] -as [xml]).SelectNodes('*/LINEITEM').ForEach{
        $_.LINEITEMFIELD.Trim() -join ';'
    }
)

Basically, leave every line up to the second-to-last as it is, and treat the last line as XML.
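
To see what each half of that expression produces, here is a minimal sketch that runs on an in-memory sample instead of a file. The variable names `$sample`, `$header`, and `$rows` are illustrative and not part of the original script, and the sample is shortened to three header lines and two table rows.

```powershell
# Hypothetical stand-in for the array that Get-Content would return.
$sample = @(
    '0001117945'
    '14102022'
    'GBP'
    '<LineItemTable><LINEITEM><LINEITEMFIELD>20220916</LINEITEMFIELD><LINEITEMFIELD>2525636         </LINEITEMFIELD><LINEITEMFIELD>0.00                            </LINEITEMFIELD><LINEITEMFIELD>246.05                          </LINEITEMFIELD></LINEITEM><LINEITEM><LINEITEMFIELD>20220920</LINEITEMFIELD><LINEITEMFIELD>2527541         </LINEITEMFIELD><LINEITEMFIELD>0.00                            </LINEITEMFIELD><LINEITEMFIELD>450.12                          </LINEITEMFIELD></LINEITEM></LineItemTable>'
)

# Everything except the last line is passed through unchanged.
$header = $sample[0..($sample.Length - 2)]

# The last line is parsed as XML; each LINEITEM becomes one ';'-joined row
# with the padding trimmed from every field.
$rows = foreach ($item in ([xml]($sample[-1])).SelectNodes('*/LINEITEM')) {
    ($item.LINEITEMFIELD | ForEach-Object { $_.Trim() }) -join ';'
}

$header
$rows
# The last two lines printed are:
# 20220916;2525636;0.00;246.05
# 20220920;2527541;0.00;450.12
```

The explicit `foreach` and `ForEach-Object` here do the same work as the `.ForEach{}` method and the direct `.Trim()` call in the answer; they are only spelled out to make each step visible.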

Implementing this in your loop, the code would look like this:

foreach($file in $list) {
    $filename   = $file.Name
    $file       = $importfolder + $file.Name
    $outputfile = $outputfolder + $filename

    $content = Get-Content $file
    $(
        $content[0..($content.Length - 2)]
        ($content[-1] -as [xml]).SelectNodes('*/LINEITEM').ForEach{
            $_.LINEITEMFIELD.Trim() -join ';'
        }
    ) | Set-Content $outputfile
}
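
The loop above relies on the table being the last line of each file. If that cannot be guaranteed, one possible variation is to convert any line that starts with `<LineItemTable>` and pass every other line through untouched. This is a sketch under that assumption, not part of the original answer; the helper name `Convert-LineItemLine` is hypothetical.

```powershell
# Hypothetical helper: expands a <LineItemTable> line into ';'-separated rows
# and returns any other line unchanged.
function Convert-LineItemLine {
    param([string]$Line)

    if ($Line.TrimStart().StartsWith('<LineItemTable>')) {
        foreach ($item in ([xml]$Line).SelectNodes('*/LINEITEM')) {
            ($item.LINEITEMFIELD | ForEach-Object { $_.Trim() }) -join ';'
        }
    }
    else {
        $Line
    }
}

# Usage sketch with in-memory lines standing in for Get-Content output.
$lines = @(
    'GBP'
    '<LineItemTable><LINEITEM><LINEITEMFIELD>20220916</LINEITEMFIELD><LINEITEMFIELD>2525636         </LINEITEMFIELD></LINEITEM></LineItemTable>'
)
$result = foreach ($line in $lines) { Convert-LineItemLine -Line $line }
$result
```

Inside the file loop, the same helper could be applied as `Get-Content $file | ForEach-Object { Convert-LineItemLine -Line $_ } | Set-Content $outputfile`.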