I have a large pipe-separated data file and I want to remove columns 3, 7, and 9. The script below works fine, but it is too slow: it takes about 5 minutes for a 22 MB file.
Sample data:
Adeel|01|test|1234589|date|amount|00|123345678890|test|all|01|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|05|test|1234589|date|amount|00|123345678890|test|all|05|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|09|test|1234589|date|amount|00|123345678890|test|all|09|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|12|test|1234589|date|amount|00|123345678890|test|all|12|
param
(
    # Input data file
    [string]$Path = 'O:\Temp\test.txt',
    # Columns to be removed, any order, dupes are allowed
    [int[]]$Remove = (3,6)
)

# sort indexes descending and remove dupes
$Remove = $Remove | Sort-Object -Unique -Descending

# read input lines
Get-Content $Path | .{process{
    # split and add to ArrayList which allows to remove items
    $list = [Collections.ArrayList]($_ -split '\|')
    # remove data at the indexes (from tail to head due to descending order)
    foreach($i in $Remove) {
        $list.RemoveAt($i)
    }
    # join and output
    #$list -join '|'
    $contentUpdate = $list -join '|'
    Add-Content "O:\Temp\testoutput.txt" $contentUpdate
}}
CodePudding user response:
Get-Content is comparatively slow, and use of the pipeline adds additional overhead. When performance matters, StreamReader and StreamWriter can be a better choice:
param (
    # Input data file
    [string] $InputPath = 'input.txt',
    # Output data file
    [string] $OutputPath = 'output.txt',
    # Columns to be removed, any order, dupes are allowed
    [int[]] $Remove = (1, 2, 2),
    # Column separator
    [string] $Separator = '|',
    # Input file encoding
    [Text.Encoding] $Encoding = [Text.Encoding]::UTF8
)

$ErrorActionPreference = 'Stop'

# Gets rid of dupes and provides fast lookup ability
$removeSet = [Collections.Generic.HashSet[int]] $Remove

$reader = $writer = $null
try {
    $reader = [IO.StreamReader]::new(( Convert-Path -LiteralPath $InputPath ), $Encoding )
    $null = New-Item $OutputPath -ItemType File -Force  # as Convert-Path requires existing path

    # Compare against $null so empty lines don't end the loop early
    while( $null -ne ($line = $reader.ReadLine()) ) {
        if( -not $writer ) {
            # Construct writer only after first line has been read, so $reader.CurrentEncoding is available
            $writer = [IO.StreamWriter]::new(( Convert-Path -LiteralPath $OutputPath ), $false, $reader.CurrentEncoding )
        }

        $columns = $line.Split( $Separator )
        $isAppend = $false
        for( $i = 0; $i -lt $columns.Length; $i++ ) {
            if( -not $removeSet.Contains( $i ) ) {
                if( $isAppend ) { $writer.Write( $Separator ) }
                $writer.Write( $columns[ $i ] )
                $isAppend = $true
            }
        }
        $writer.WriteLine()  # Write (CR)LF
    }
}
finally {
    # Make sure to dispose the reader and writer so files get closed.
    if( $writer ) { $writer.Dispose() }
    if( $reader ) { $reader.Dispose() }
}
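
For reference, a hypothetical invocation of the script above (the file name Remove-Columns.ps1 is only an example). Note that the Remove indexes are zero-based, so the 3rd, 7th and 9th columns from the question correspond to indexes 2, 6 and 8:

# Hypothetical example: the answer's script saved as Remove-Columns.ps1.
# Indexes are zero-based: columns 3, 7 and 9 (counted from 1) map to 2, 6 and 8.
.\Remove-Columns.ps1 -InputPath 'O:\Temp\test.txt' -OutputPath 'O:\Temp\testoutput.txt' -Remove 2, 6, 8 -Separator '|'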
- Convert-Path is used because .NET has a different current directory than PowerShell, so it's best practice to pass absolute paths to the .NET API.
- If this still isn't fast enough, consider writing this in C# instead. Especially with such "low level" code, C# tends to be faster. You may embed C# code in PowerShell using Add-Type -TypeDefinition $csCode.
- As another optimization, instead of using String.Split(), which creates more substrings than actually needed, you may use String.IndexOf() and String.Substring() to extract only the necessary columns (see the sketch after this list).
- Last but not least, you may experiment with the StreamReader and StreamWriter constructors that let you specify a buffer size.
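
As a rough illustration of the last two points, here is a minimal sketch, not the answer's actual code. It assumes $removeSet, $writer and $Separator exist as in the script above; the function name, the placeholder paths and the 64 KB buffer size are made up for the example:

# Sketch: emit one line using IndexOf/Substring instead of Split, so substrings
# are only allocated for columns that are actually kept.
# Assumes $removeSet, $writer and $Separator from the script above.
function Write-KeptColumns( [string] $line ) {
    $start = 0          # start position of the current column
    $index = 0          # zero-based column index
    $isAppend = $false
    while( $start -le $line.Length ) {
        $sep = $line.IndexOf( $Separator, $start )
        $end = if( $sep -ge 0 ) { $sep } else { $line.Length }
        if( -not $removeSet.Contains( $index ) ) {
            if( $isAppend ) { $writer.Write( $Separator ) }
            $writer.Write( $line.Substring( $start, $end - $start ) )
            $isAppend = $true
        }
        $start = $end + $Separator.Length
        $index++
        if( $sep -lt 0 ) { break }   # last column reached
    }
    $writer.WriteLine()
}

# StreamReader/StreamWriter constructors that take an explicit buffer size
# ($inPath/$outPath are placeholder absolute paths; 64 KB is an arbitrary value to experiment with):
$reader = [IO.StreamReader]::new( $inPath, [Text.Encoding]::UTF8, $true, 65536 )
$writer = [IO.StreamWriter]::new( $outPath, $false, [Text.Encoding]::UTF8, 65536 )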