Home > Net >  Parsing CSV and then count the occurrence by date?
Parsing CSV and then count the occurrence by date?

Time:10-09

I need some help in parsing the. CSV like in the attached and then generating the summary report based on the frequency of the specific keywords in the message_subject column like 'Auto Reply:'

Source Input.CSV: enter image description here

The expected Result is:

Sender_Address,        Frequency, Date (Ascending sorted)
[email protected], 3,         2022-09-27, 2022-09-28
[email protected], 2,         2022-10-04, 2022-09-30
[email protected], 1,         2022-10-06

How to achieve that?

CodePudding user response:

Use Group-Object to group the rows by sender address, then use Sort-Object -Unique to grab all unique dates:

Import-Csv path\to\file.csv |Group-Object sender_address |ForEach-Object {
  # Parse timestamps, grab unique dates
  $uniqueDates = $_.Group |ForEach-Object { [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') } |Sort-Object -Unique

  # Create a new object to hold the frequency info
  [pscustomobject]@{
    Sender_address = $_.Name
    Frequency = $uniqueDates.Count
    Dates = $uniqueDates -join ', '
  }
} |Export-Csv path\to\result.csv -NoTypeInformation

CodePudding user response:

Here you go, by using group-object to group by Sender_Address and then using Select-object with calculated properties to count the keyWord 'Auto Reply' and gather the related dates (sorted):

#Load csv
$csv = import-csv [path]

#Calculate output
$csv | Group-Object -Property 'sender_address' | Select-Object Name,@{name='keyWordCount';expression={($_.group.'message_subject' -match 'Auto Reply').count}},@{name='origin_timestamp_utc';expression={(($_.group | ?{$_.'message_subject' -match 'Auto Reply'}).'origin_timestamp_utc' | Sort-Object) -join ","}}

#Output
Name                  keyWordCount origin_timestamp_utc
----                  ------------ --------------------
[email protected]            3 2022-09-27T19:38:02.2054022Z,2022-09-28T19:38:03.0335704Z,2022-09-28T19:38:03.06298…
[email protected]            2 2022-09-30T19:38:02.8562799Z,2022-10-04T02:33:07.6877938Z
[email protected]            1 2022-10-06T23:30:12.3284393Z

CodePudding user response:

Here is my take on the problem:

# Create sample data
$csv = ConvertFrom-Csv @'
origin_timestamp_utc,sender_address,message_subject,directionality,connector_id,delivery_priority
2022-09-27T19:38:02.1826859Z,[email protected],Customer Vehicle Conditions report for 20220927,Originating,,Normal
2022-09-27T19:38:02.2054022Z,[email protected],Auto Reply: Vehicle Reports are due now.,Originating,,Normal
2022-09-28T19:38:03.0335704Z,[email protected],Auto Reply: Late in the office.,Originating,,Normal
2022-09-28T19:38:03.0629852Z,[email protected],Auto Reply: Sick not well :-(,Originating,,Normal
2022-10-05T19:38:02.1819402Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-05T19:38:02.1903258Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-06T19:38:02.2743619Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-10-06T19:38:02.2821327Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-09-29T19:38:02.1361255Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-29T19:38:02.1453714Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-30T19:38:02.8562799Z,[email protected],Auto Reply: Out of office today?!??!,Originating,,Normal
2022-09-30T19:38:02.8979890Z,[email protected],Customer Vehicle Conditions report for 20220930,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T02:33:07.6877938Z,[email protected],Auto Reply: Not in the Office ?,Originating,,Normal
2022-10-04T19:38:02.6117533Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-01T19:38:03.3718620Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-01T19:38:04.8114745Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-02T19:38:02.2517865Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-02T19:38:02.2855767Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T19:38:02.6383853Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-06T23:30:12.3284393Z,[email protected],Auto Reply: Where are you !!!!,Originating,,Normal
'@

$keyWord = 'Auto Reply:*'

$csv | Group-Object sender_address | ForEach-Object {

    # From current group, get all date/times where the message_subject matches the keyword
    $dates = $_.Group.Where{ $_.message_subject -like $keyWord }.ForEach{ $_.origin_timestamp_utc }

    # Remove the time part and get unique dates
    $uniqueDates = $dates -replace 'T. ' | Sort-Object -Unique

    # Output
    [PSCustomObject]@{
        Sender_Address = $_.Name
        Frequency = $dates.Count
        'Date (Ascending sorted)' = $uniqueDates -join ', '
    }
}

Output:

Sender_Address        Frequency Date (Ascending sorted)
--------------        --------- -----------------------
[email protected]         3 2022-09-27, 2022-09-28 
[email protected]         2 2022-09-30, 2022-10-04 
[email protected]         1 2022-10-06

Compared to the expected output, there is a slight difference in the Date column, 2nd row. I have interpreted the "ascending sorted" to be applied to each row individually. I'm not sure if OP actually meant to sort the whole column (possibly by newest date).

CodePudding user response:

This solves your problem, including ascending sorting for the dates:

# gets all messages starting with "Auto Reply"
$messages = Import-Csv <your file> | Where-Object message_subject -match "Auto Reply.*"  
# changes timestamp format to date only
$messages | ForEach-Object { $_.origin_timestamp_utc = [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') } 
#groups by sender_adress
$messages | Group-Object sender_address | ForEach-Object {
    $sortedDates = ""
    # sorts dates so the date with the least amount of messages comes first (you can change this with Sort-Object -descending)
    $_.group | Group-Object origin_timestamp_utc | Sort-Object count | ForEach-Object {
        $sortedDates  = $_.Name   ","
    }
    [pscustomobject]@{
      Sender_Address = $_.Name
      Frequency = $_.Count
      "Date (Ascending Sorted)" = $sortedDates.TrimEnd(",") #remove last ","
    }
} 

Add Export-Csv to export it

  • Related