Trying to import CSV that has columns of ID, Start Date, End Date where each row could contain the same ID value but different dates.
Want to sort this so it's finding the earliest date for each ID AND the latest End date for that same ID, then write that to a CSV.
Input csv:
ID | Start Date | End Date |
---|---|---|
100 | 01-Apr-2021 | 23-May-2021 |
100 | 10-June-2021 | 30-Dec-2021 |
200 | 12-Aug-2021 | 23-Sep-2021 |
200 | 25-Dec-2021 | 24-May-2022 |
Output csv:
ID | Start Date | End Date |
---|---|---|
100 | 01-Apr-2021 | 30-Dec-2021 |
200 | 12-Aug-2021 | 24-May-2022 |
I thought I could import CSV, then identify ID and then for each ID sort start and end dates but can't seem to get it to work.
Code thus far:
$source = "C:\test\test.csv"
$outPut = "C:\test\Testsort.csv"
$csv = Import-Csv $source
Foreach ($usr in $csv){
$userID= $usr."ID"
$filtered = $csv | Where-Object {($_.'ID') -eq $userID}
$output = [pscustomobject]@{
'ID' = $userID
'Start Date' =($filtered.'Start Date' | Sort-Object @{Expression=
{[datetime]($_."Start Date")};Descending=$false} | Select -First 1
'End Date' =($filtered.'Start Date' | Sort-Object @{Expression=
{[datetime]($_."End Date")};Descending=$True} | Select -Last 1
}
}
$output | Export-csv $outPut
Sadly I'm struggling with the sorting of start and end dates per ID and setting a value of earliest and oldest per ID.
Any thoughts?
CodePudding user response:
Use the
Group-Object
cmdlet to group the rows by sharedID
value.For each group of rows that have the same ID, find the earliest date in the
Start Date
column, and the latest one in theEnd Date
column, then construct an output[pscustomobject]
combining these values.
$source = "C:\test\test.csv"
$output = "C:\test\Testsort.csv"
Import-Csv $source |
Group-Object Id |
ForEach-Object {
$earliestStart = [Linq.Enumerable]::Min([datetime[]] $_.Group.'Start Date')
$latestEnd = [Linq.Enumerable]::Max([datetime[]] $_.Group.'End Date')
[pscustomobject] @{
ID = $_.Name
'Start Date' = $earliestStart.ToString('dd-MMM-yyyy', [cultureinfo]::InvariantCulture)
'End Date' = $latestEnd.ToString('dd-MMM-yyyy', [cultureinfo]::InvariantCulture)
}
} |
Export-csv $output
Note:
While using
Sort-Object
to find the earliest and latest date is an option too, the[System.Linq.Enumerable]::Min()
approach both performs better and requires less memory.Note the use of cast
[datetime[]]
to convert the strings from the CSV input to[datetime]
instances so that finding the chronological minimum / maximum works properly.On output, the min. and max. dates are converted back to the desired date string format, using the
[datetime]
type's.ToString()
method.- Note the use of
[cultureinfo]::InvariantCulture
in order to ensure that English month names without a trailing.
are used; by default, the formatting rules of the current culture are used, as reflected in[cultureinfo]::CurrentCulture
and the automatic$PSCulture
variable
- Note the use of
CodePudding user response:
make use of group-object and custom attributes and you can do it with a one-liner:
$source = "C:\test\test.csv"
$outPut = "C:\test\Testsort.csv"
$csv = Import-Csv $source
$csv|group-object id|select name,@{l='start date';e={$_.group|sort "Start Date"|select -expand "Start Date" -first 1 }},@{l='end date';e={$_.group|sort "end Date"|select -expand "end Date" -last 1 }}|Export-csv $outPut