Home > Software design >  Powershell - Import CSV sort on Date and assign earliest date for each user ID
Powershell - Import CSV sort on Date and assign earliest date for each user ID

Time:11-03

Trying to import CSV that has columns of ID, Start Date, End Date where each row could contain the same ID value but different dates.
Want to sort this so it's finding the earliest date for each ID AND the latest End date for that same ID, then write that to a CSV.

Input csv:

ID Start Date End Date
100 01-Apr-2021 23-May-2021
100 10-June-2021 30-Dec-2021
200 12-Aug-2021 23-Sep-2021
200 25-Dec-2021 24-May-2022

Output csv:

ID Start Date End Date
100 01-Apr-2021 30-Dec-2021
200 12-Aug-2021 24-May-2022

I thought I could import CSV, then identify ID and then for each ID sort start and end dates but can't seem to get it to work.

Code thus far:

$source = "C:\test\test.csv"
$outPut = "C:\test\Testsort.csv"

$csv = Import-Csv $source 
Foreach ($usr in $csv){
    $userID= $usr."ID"
    $filtered = $csv | Where-Object {($_.'ID') -eq $userID}
    $output = [pscustomobject]@{
                     'ID'  = $userID
                     'Start Date'   =($filtered.'Start Date' | Sort-Object @{Expression= 
                      {[datetime]($_."Start Date")};Descending=$false} | Select -First 1
                     'End Date'     =($filtered.'Start Date' | Sort-Object @{Expression= 
                      {[datetime]($_."End Date")};Descending=$True} | Select -Last 1
                     }
}
$output | Export-csv $outPut 

Sadly I'm struggling with the sorting of start and end dates per ID and setting a value of earliest and oldest per ID.

Any thoughts?

CodePudding user response:

  • Use the Group-Object cmdlet to group the rows by shared ID value.

  • For each group of rows that have the same ID, find the earliest date in the Start Date column, and the latest one in the End Date column, then construct an output [pscustomobject] combining these values.

$source = "C:\test\test.csv"
$output = "C:\test\Testsort.csv"

Import-Csv $source | 
    Group-Object Id |
      ForEach-Object { 
        $earliestStart = [Linq.Enumerable]::Min([datetime[]] $_.Group.'Start Date')
        $latestEnd = [Linq.Enumerable]::Max([datetime[]] $_.Group.'End Date')
        [pscustomobject] @{
          ID = $_.Name
          'Start Date' = $earliestStart.ToString('dd-MMM-yyyy', [cultureinfo]::InvariantCulture)
          'End Date' = $latestEnd.ToString('dd-MMM-yyyy', [cultureinfo]::InvariantCulture)
        }
      } |
        Export-csv $output 

Note:

  • While using Sort-Object to find the earliest and latest date is an option too, the [System.Linq.Enumerable]::Min() approach both performs better and requires less memory.

  • Note the use of cast [datetime[]] to convert the strings from the CSV input to [datetime] instances so that finding the chronological minimum / maximum works properly.

  • On output, the min. and max. dates are converted back to the desired date string format, using the [datetime] type's .ToString() method.

CodePudding user response:

make use of group-object and custom attributes and you can do it with a one-liner:

$source = "C:\test\test.csv"
$outPut = "C:\test\Testsort.csv"
$csv = Import-Csv $source 
$csv|group-object id|select name,@{l='start date';e={$_.group|sort "Start Date"|select -expand "Start Date" -first 1 }},@{l='end date';e={$_.group|sort "end Date"|select -expand "end Date" -last 1 }}|Export-csv $outPut 
  • Related