Home > Mobile >  Map columns to new column names based on mapping file
Map columns to new column names based on mapping file

Time:03-05

I have a set of data in CSV format, and I'm trying to output the same data with different headers based on column names in a mapping document. The column names below are just an example for clarity, but the names can be drastically different, and the order may not match the order of the columns in the mapping file. I haven't done much Powershell work lately, so I'm having a tough time getting started. I can't seem to figure out the logic for looking up column names based on the data I'm looking at. For instance, if I do a foreach ($row in $source) how would I know how to order the columns? Any assistance would be appreciated!

Examples: map.csv

SourceColumn,TargetColumn
LastName,Last_Name
DOB,Date_Of_Birth
FirstName,First_Name

source.csv

FirstName,LastName,DOB
Tom,Jones,06/07/1940
Bill,Nye,11/27/1955
William,Shakespeare,04/01/1564

Desired output: output.csv

Last_Name,Date_Of_Birth,First_Name
Jones,Tom,06/07/1940
Nye,Bill,11/27/1955
Shakespeare,William,04/01/1564

CodePudding user response:

Note: The following preserves the column order in source.csv and simply substitutes new column names. This enables relatively fast plain-text processing of the data CSV.

# Initialize an ordered hashtable.
$map = [ordered] @{}
# Fill it with the mapping of source to target column names.
Import-Csv map.csv | ForEach-Object { $map[$_.SourceColumn] = $_.TargetColumn }

# Read the data CSV into header line and all data lines.
# Note: You can speed this up with a somewhat obscure optimization:
#       $sourceHeader, $sourceData = (Get-Content -ReadCount 0 source.csv)
$sourceHeader, $sourceData = Get-Content source.csv

# Construct the new header line via the previously constructed
# map (ordered hashtable).
$newSourceHeader = ($sourceHeader -split ',' | ForEach-Object { $map[$_] }) -join ','

# Save the new header followed by the data lines to a (new) output
# file. Adjust the target file and -Encoding as needed.
Set-Content -Encoding utf8 output.csv -Value $newSourceHeader, $sourceData

If you do need the specific column ordering in map.csv, an inefficient, but simple solution is to apply the following to the output.csv file that resulted from the code above:

Import-Csv output.csv |
  Select-Object (Import-Csv map.csv).TargetColumn |
    Export-Csv output_new.csv -Encoding utf8 -NoTypeInformation

CodePudding user response:

Build a hash between the two sets of headers, using the Source's column as the key and the destination column as the value:

$myHash = @{
    "LastName" = "Last_Name"
}

$myHash[$key]

produces the value for the destination.

CodePudding user response:

Here is one way you could do it looping through the object's PSObject.Properties, this should work even if the reference table has only a specific set of new column names and you want to also keep those columns not in the reference table.

$refTable = @'
SourceColumn,TargetColumn
LastName,Last_Name
DOB,Date_Of_Birth
FirstName,First_Name
'@ | ConvertFrom-Csv

$source = @'
FirstName,LastName,DOB,NotInRefCol
Tom,Jones,06/07/1940,1
Bill,Nye,11/27/1955,2
William,Shakespeare,04/01/1564,3
'@ | ConvertFrom-Csv

$map = @{}
foreach($line in $refTable) {
    $map[$line.SourceColumn] = $line.TargetColumn
}

foreach($line in $source) {
    $out = [ordered]@{}
    foreach($prop in $line.PSObject.Properties) {
        if($newCol = $map[$prop.Name]) {
            $out[$newCol] = $prop.Value
            continue
        }
        $out[$prop.Name] = $prop.Value
    }
    [pscustomobject]$out
}

Result of the example above would be:

First_Name Last_Name   Date_Of_Birth NotInRefCol
---------- ---------   ------------- -----------
Tom        Jones       06/07/1940    1
Bill       Nye         11/27/1955    2
William    Shakespeare 04/01/1564    3

CodePudding user response:

Something along these lines would work:

$map = [ordered] @{}
Import-Csv C:\scripts\map.csv | ForEach-Object { $map[$_.SourceColumn] = $_.TargetColumn }  
$source = Get-Content C:\scripts\source.csv
# Replace names in the 1st line of your csv.
$source[0] = ($source[0] -split ',' | ForEach-Object { $map[$_] }) -join ','
$source | Out-File -FilePath C:\scripts\output.csv

Ensures only your header line is updated (no accidental values in other cells that may match header names) Obviously not a problem for your current data set but could be for other data sets. Though my output does keep your header in the order you input it. Rather than the output example you have that changed the order of your columns.

  • Related