I need some help in parsing the. CSV like in the attached and then generating the summary report based on the frequency of the specific keywords in the message_subject column like 'Auto Reply:'
The expected Result is:
Sender_Address, Frequency, Date (Ascending sorted)
[email protected], 3, 2022-09-27, 2022-09-28
[email protected], 2, 2022-10-04, 2022-09-30
[email protected], 1, 2022-10-06
How to achieve that?
CodePudding user response:
Use Group-Object
to group the rows by sender address, then use Sort-Object -Unique
to grab all unique dates:
Import-Csv path\to\file.csv |Group-Object sender_address |ForEach-Object {
# Parse timestamps, grab unique dates
$uniqueDates = $_.Group |ForEach-Object { [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') } |Sort-Object -Unique
# Create a new object to hold the frequency info
[pscustomobject]@{
Sender_address = $_.Name
Frequency = $uniqueDates.Count
Dates = $uniqueDates -join ', '
}
} |Export-Csv path\to\result.csv -NoTypeInformation
CodePudding user response:
Here you go, by using group-object
to group by Sender_Address and then using Select-object
with calculated properties to count the keyWord 'Auto Reply' and gather the related dates (sorted):
#Load csv
$csv = import-csv [path]
#Calculate output
$csv | Group-Object -Property 'sender_address' | Select-Object Name,@{name='keyWordCount';expression={($_.group.'message_subject' -match 'Auto Reply').count}},@{name='origin_timestamp_utc';expression={(($_.group | ?{$_.'message_subject' -match 'Auto Reply'}).'origin_timestamp_utc' | Sort-Object) -join ","}}
#Output
Name keyWordCount origin_timestamp_utc
---- ------------ --------------------
[email protected] 3 2022-09-27T19:38:02.2054022Z,2022-09-28T19:38:03.0335704Z,2022-09-28T19:38:03.06298…
[email protected] 2 2022-09-30T19:38:02.8562799Z,2022-10-04T02:33:07.6877938Z
[email protected] 1 2022-10-06T23:30:12.3284393Z
CodePudding user response:
Here is my take on the problem:
# Create sample data
$csv = ConvertFrom-Csv @'
origin_timestamp_utc,sender_address,message_subject,directionality,connector_id,delivery_priority
2022-09-27T19:38:02.1826859Z,[email protected],Customer Vehicle Conditions report for 20220927,Originating,,Normal
2022-09-27T19:38:02.2054022Z,[email protected],Auto Reply: Vehicle Reports are due now.,Originating,,Normal
2022-09-28T19:38:03.0335704Z,[email protected],Auto Reply: Late in the office.,Originating,,Normal
2022-09-28T19:38:03.0629852Z,[email protected],Auto Reply: Sick not well :-(,Originating,,Normal
2022-10-05T19:38:02.1819402Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-05T19:38:02.1903258Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-06T19:38:02.2743619Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-10-06T19:38:02.2821327Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-09-29T19:38:02.1361255Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-29T19:38:02.1453714Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-30T19:38:02.8562799Z,[email protected],Auto Reply: Out of office today?!??!,Originating,,Normal
2022-09-30T19:38:02.8979890Z,[email protected],Customer Vehicle Conditions report for 20220930,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T02:33:07.6877938Z,[email protected],Auto Reply: Not in the Office ?,Originating,,Normal
2022-10-04T19:38:02.6117533Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-01T19:38:03.3718620Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-01T19:38:04.8114745Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-02T19:38:02.2517865Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-02T19:38:02.2855767Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T19:38:02.6383853Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-06T23:30:12.3284393Z,[email protected],Auto Reply: Where are you !!!!,Originating,,Normal
'@
$keyWord = 'Auto Reply:*'
$csv | Group-Object sender_address | ForEach-Object {
# From current group, get all date/times where the message_subject matches the keyword
$dates = $_.Group.Where{ $_.message_subject -like $keyWord }.ForEach{ $_.origin_timestamp_utc }
# Remove the time part and get unique dates
$uniqueDates = $dates -replace 'T. ' | Sort-Object -Unique
# Output
[PSCustomObject]@{
Sender_Address = $_.Name
Frequency = $dates.Count
'Date (Ascending sorted)' = $uniqueDates -join ', '
}
}
Output:
Sender_Address Frequency Date (Ascending sorted)
-------------- --------- -----------------------
[email protected] 3 2022-09-27, 2022-09-28
[email protected] 2 2022-09-30, 2022-10-04
[email protected] 1 2022-10-06
Compared to the expected output, there is a slight difference in the Date
column, 2nd row. I have interpreted the "ascending sorted" to be applied to each row individually. I'm not sure if OP actually meant to sort the whole column (possibly by newest date).
CodePudding user response:
This solves your problem, including ascending sorting for the dates:
# gets all messages starting with "Auto Reply"
$messages = Import-Csv <your file> | Where-Object message_subject -match "Auto Reply.*"
# changes timestamp format to date only
$messages | ForEach-Object { $_.origin_timestamp_utc = [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') }
#groups by sender_adress
$messages | Group-Object sender_address | ForEach-Object {
$sortedDates = ""
# sorts dates so the date with the least amount of messages comes first (you can change this with Sort-Object -descending)
$_.group | Group-Object origin_timestamp_utc | Sort-Object count | ForEach-Object {
$sortedDates = $_.Name ","
}
[pscustomobject]@{
Sender_Address = $_.Name
Frequency = $_.Count
"Date (Ascending Sorted)" = $sortedDates.TrimEnd(",") #remove last ","
}
}
Add Export-Csv
to export it