Performance enhancement in CSV value change through PowerShell


Please refer to my earlier question: How to get replace by id (Id value get from XML) and put into CSV, powershell

I think I need to use ForEach-Object instead of foreach: I have 11,000 rows, so this takes too much time.

foreach ($csvRow in $csvRows) {
  foreach ($columnName in $columnNames) {
    $id = $csvRow.$columnName
    if (-not $id) { continue }
    $newId = @($xmlDoc.enum_types.enum_type.Where( { $_.field_name -eq $columnName }, 'First').
               items).ForEach( { $_.item }).Where( { $_.id -eq $id }).value
    $csvRow.$columnName = $newId
  }
}
$csvRows | Export-Csv -NoTypeInformation -Encoding utf8 $CSVpath

CodePudding user response:

One thing you could do is pre-compute your lookups from the "enums" XML in your other question.

Original code - 7649 milliseconds

# setup

$xml = @"
<enum_types>
  <enum_type field_name="Test1">
    <items>
      <item>
        <id>1</id>
        <value>A</value>
      </item>
    </items>
  </enum_type>
  <enum_type field_name="Test2">
    <items>
      <item>
        <id>1</id>
        <value>A</value>
      </item>
    </items>
  </enum_type>
</enum_types>
"@;

$enums = [xml] $xml;

$csv = @("Test1,Test2");            # header row
$csv += 1..110000 | % { "1,1" }     # append 110,000 data rows (= overwrote the header, and the spaces after the commas ended up in the column names and values)

$data = $csv | ConvertFrom-Csv

$columnNames = @( "Test1", "Test2" );

# perf test - 7649 milliseconds
measure-command -expression {
    foreach ($csvRow in $data)
    {
        foreach ($columnName in $columnNames)
        {
            $id = $csvRow.$columnName
            if ( -not $id )
            {
                continue;
            }
            $newId = @($enums.enum_types.enum_type.Where( { $_.field_name -eq $columnName }, 'First').
                items).ForEach( { $_.item }).Where( { $_.id -eq $id }).value
            $csvRow.$columnName = $newId
        }
    }
}

With pre-computed lookups - 792 milliseconds

# setup 

... as above ...

# pre-compute lookups

$lookups = @{};
foreach( $enum_type in $enums.enum_types.enum_type )
{
    $lookups[$enum_type.field_name] = @{}
    foreach( $item in $enum_type.items.item )
    {
        $lookups[$enum_type.field_name][$item.id] = $item.value
    }
}
write-host ($lookups | convertto-json -depth 99)
# {
#   "Test1": {
#     "1": "A"
#   },
#   "Test2": {
#     "1": "A"
#   }
# }
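
With the lookup table built, each cell replacement becomes two constant-time hash lookups instead of a scan over the XML. For example, using the values shown above:

$lookups["Test1"]["1"]   # returns "A"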

# perf test - 792 milliseconds
measure-command -expression {
    foreach ($csvRow in $data)
    {
        foreach ($columnName in $columnNames)
        {
            $id = $csvRow.$columnName
            if (-not $id)
            {
                continue;
            }
            $csvRow.$columnName = $lookups[$columnName][$id]
        }
    }
}

There's possibly more you could squeeze out of this approach, but it's already a 10x speedup (in my very limited benchmarks).
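
For instance, one such squeeze (a minimal sketch, untested against the benchmark above, reusing the same $data, $columnNames and $lookups) is to hoist each column's hashtable out of the row loop, so the per-column lookup happens once per column rather than once per cell:

# fetch each column's lookup table once, up front
$columnLookups = @( foreach ($columnName in $columnNames) { $lookups[$columnName] } )

measure-command -expression {
    foreach ($csvRow in $data)
    {
        for ($i = 0; $i -lt $columnNames.Count; $i++)
        {
            $id = $csvRow.($columnNames[$i])
            if (-not $id)
            {
                continue;
            }
            $csvRow.($columnNames[$i]) = $columnLookups[$i][$id]
        }
    }
}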

CodePudding user response:

Preload all the new items into hash tables:

$Types = $xmlDoc.enum_types.enum_type
$HashTable = @{}

# The first script block is ForEach-Object's -Begin block (runs once); the
# second is its -Process block (runs per row). On the first row, a lookup
# table is built for each column found in the CSV header.
Import-Csv .\1.Csv | ForEach-Object { $Names = $Null } {
    if (!$Names) {
        $Names = $_.psobject.Properties.Name
        foreach ($Name in $Names) {
            $HashTable[$Name] = @{}
            foreach ($Field in $Types.Where{ $_.field_name -eq $Name }) {
                foreach ($Item in $Field.Items.Item) {
                    $HashTable[$Name][$Item.Id] = $Item.Value
                }
            }
        }
    }
    foreach ($Name in $Names) {
        $Id = $_.$Name
        if ($HashTable[$Name].Contains($Id)) {
            $_.$Name = $HashTable[$Name][$Id]
        }
    }
    $_
} | Export-Csv .\1New.Csv -NoTypeInformation -Encoding utf8
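
As in the original question, $xmlDoc is assumed to already hold the loaded XML document; a minimal sketch of that setup (file name hypothetical) would be:

[xml]$xmlDoc = Get-Content .\enums.xml -Raw

Because this runs as one pipeline, rows stream from Import-Csv straight to Export-Csv instead of being collected in memory first, and the Contains check leaves a cell unchanged when its id has no match in the XML.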