PowerShell script to combine CSVs and add root directory as an additional column

Time:12-09

I'm trying to write my first PowerShell script and I'm struggling to get my head around the object orientation and piping approach. So I've got a directory structure like this (Google Takeout Nest directories):

Path\DeviceID-A\
   - Year\
      - Month\
         - year-month-sensors-a.csv
Path\DeviceID-B\
   - Year\
      - Month\
         - year-month-sensors-b.csv
...

I want to aggregate all the CSVs together and I've written a simple script that does this:

Get-ChildItem -Recurse -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv merged.csv -NoTypeInformation -Append

However, I lose the context of the DeviceID when everything is aggregated together. So I'd like to add an additional column that contains those initial root directories, with something similar to:

Import-Csv merged.csv | Select-Object *,@{Name='nest_id';Expression={$nestDeviceID}} | Export-Csv merged-v2.csv -NoTypeInformation

So each row in year-month-sensors-a.csv would have an additional column called nest_id with the value DeviceID-A for any CSV under that root directory.

...but I can't quite reconcile the two scripts on how to programmatically iterate through them, whether I need to use ForEach-Object or piping.

Any advice would be appreciated!

CodePudding user response:

Before posting the code, it's worth mentioning that this only works as long as all the CSVs have the same headers (columns). If they don't, something like this can happen:

@'
Col1,Col2
Val1,Val2
'@ |
ConvertFrom-Csv |
Export-Csv ./test.csv -NoTypeInformation

@'
Col1,NewCol
Val1,NewVal
'@ |
ConvertFrom-Csv |
Export-Csv ./test.csv -NoTypeInformation -Append

Doing this would throw the following Exception:

Cannot append CSV content to the following file: ./test.csv. The appended object does not have a property that corresponds to the following column: Col2. To continue with mismatched properties, add the -Force parameter, and then retry the command.
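
One way to check for this up front is to group the files by their first line before merging. This is only a minimal sketch: it assumes each CSV's header is its first line and compares that line as raw text, so two files with the same columns but different quoting would show up as different groups. Get-CsvHeaderGroup is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: groups every CSV under $Path by its raw header line.
function Get-CsvHeaderGroup {
    param(
        [string] $Path = '.'
    )

    Get-ChildItem -Path $Path -Recurse -Filter *.csv -File |
        # Read only the first line of each file and use it as the grouping key.
        Group-Object -Property { Get-Content -LiteralPath $_.FullName -TotalCount 1 }
}

# Usage: each group's Name is a header line, Count is how many files use it.
# Get-CsvHeaderGroup -Path . | Select-Object Count, Name
```

If this returns more than one group, the files do not all share the same header, and merging them will either fail with the error above (when using -Append) or silently drop the columns that are missing from the first file.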


To answer your question: you're quite close. However, I don't think this is possible as a one-liner, and even if it were, I wouldn't recommend it (others reading your code will thank you for that).

By one-liner I mean piping Get-ChildItem to Import-Csv without using a loop (ForEach-Object or foreach). That is the point where you lose the reference to the DeviceID folder name.

Points to clarify:

  • The code first collects all rows in memory in $mergedCsv and exports them once at the end. Appending (-Append) to a CSV on every loop iteration, when it isn't needed, causes a lot of unnecessary disk I/O and is therefore inefficient. It might not matter in this case, but it's good to know for your future scripts.
  • $file.Directory.Parent.Parent.Name should give you the DeviceID folder's name: $file.Directory is the Month folder, its parent is the Year folder, and that folder's parent is the DeviceID folder. Please correct me if I'm wrong.
  • Each loop iteration imports one CSV and adds the new Nest_ID column, which contains the DeviceID folder name.

$mergedCsv = foreach($file in Get-ChildItem -Recurse -Filter *.csv)
{
    $nestDeviceID = $file.Directory.Parent.Parent.Name
    $csv = Import-Csv $file.FullName
    $csv | Select-Object *, @{
        Name = 'Nest_ID'
        Expression = { $nestDeviceID }
    }
}

# Inspect the Merged CSV
$mergedCsv | Out-GridView

# Export it
$mergedCsv | Export-Csv merged.csv -NoTypeInformation
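
Since the question also asks about ForEach-Object, the same loop can be sketched as a pipeline. This is an alternative, not a replacement for the code above: rows stream straight into a single Export-Csv call, so the file is still written only once and the merged data is never held in memory, but you lose the chance to inspect $mergedCsv first. Merge-NestCsv and its -Root and -Destination parameters are hypothetical names, and the sketch assumes the same DeviceID\Year\Month layout and that the destination file lives outside the root folder, so it is not picked up by Get-ChildItem.

```powershell
# Hypothetical helper: merges every CSV under $Root into $Destination,
# adding a Nest_ID column that holds the DeviceID folder name.
function Merge-NestCsv {
    param(
        [Parameter(Mandatory)] [string] $Root,
        # Assumption: this path is outside $Root.
        [Parameter(Mandatory)] [string] $Destination
    )

    Get-ChildItem -Path $Root -Recurse -Filter *.csv -File |
        ForEach-Object {
            # Month folder -> Year folder -> DeviceID folder
            $nestDeviceID = $_.Directory.Parent.Parent.Name

            Import-Csv -LiteralPath $_.FullName |
                Select-Object *, @{
                    Name       = 'Nest_ID'
                    Expression = { $nestDeviceID }
                }
        } |
        Export-Csv -Path $Destination -NoTypeInformation
}

# Usage (paths are placeholders):
# Merge-NestCsv -Root .\Nest -Destination .\merged.csv
```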