How to remove 1 instance of each (identical) line in a text file in Windows PowerShell and group the

Time:01-04

There is an unsorted text file of about 100 million short lines:

Lucy 
Mary 
Mary 
Mary 
John 
John 
John 
Lucy 
Mark
Mary

I need to get

Mary 
Mary 
Mary 
John 
John 
Lucy

I cannot get the lines ordered according to how many times each line is repeated in the text, i.e. the most frequently occurring lines must be listed first.

CodePudding user response:

$List = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }
$Count.GetEnumerator() |Sort-Object -Descending 'Value' |
    ForEach-Object { ,$_.Name * ($_.Value - 1) }

Result:

Mary
Mary
Mary
John
John
Lucy

Explanation

  • $Count = @{}
    Create a new hashtable
  • foreach ($Item in $List) { $Count[$Item]++ }
    Count the repeating instances
    • starting from nothing ($Null + 1 => 1)
  • $Count.GetEnumerator() |Sort-Object -Descending 'Value'
    Sorts (descending) the hashtable based on the values
  • ForEach-Object { ,$_.Name * ($_.Value - 1) }
    Iterate over the sorted entries
    • ,$_.Name wraps the string in a single-element array
    • ... * ($_.Value - 1) repeats that array one time fewer than the line's count, so a line that occurs only once produces no output
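For a file rather than an in-memory list, the same counting approach can be sketched as below. The function name Get-RepeatedLine and its -Path parameter are hypothetical, chosen only for illustration. Trimming trailing whitespace is an assumption about the data, not something the question states. Note that keys in a @{} hashtable compare case-insensitively, so 'mary' and 'Mary' would be counted as the same line.

```powershell
# Hypothetical helper: the name and parameter are illustrative only.
function Get-RepeatedLine {
    param([Parameter(Mandatory)][string]$Path)

    $Count = @{}
    # ReadLines enumerates the file lazily, one line at a time,
    # so the whole file is never held in memory at once.
    foreach ($Line in [System.IO.File]::ReadLines($Path)) {
        # Assumption: trailing whitespace is not significant.
        $Key = $Line.TrimEnd()
        $Count[$Key]++
    }

    # Most frequent lines first; each line is emitted (count - 1) times.
    $Count.GetEnumerator() |
        Sort-Object -Property Value -Descending |
        ForEach-Object { ,$_.Name * ($_.Value - 1) }
}
```

It could be called as Get-RepeatedLine -Path 'D:\Test\unsorted.txt' and piped to Set-Content. Pass a full path, because .NET file methods resolve relative paths against the process working directory, which may differ from the current PowerShell location. The hashtable keeps one entry per distinct line, so memory use grows with the number of distinct lines rather than with the total line count.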

CodePudding user response:

You could also use Group-Object to group equal lines together like below:

Get-Content -Path 'D:\Test\unsorted.txt' | Group-Object | ForEach-Object {
    if ($_.Count -gt 1) { $_.Group | Select-Object -Skip 1 }
    else { $_.Group }
} | Sort-Object -Descending

Result:

Mary 
Mary 
Mary
Mark
Lucy 
John 
John 
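
The result above is sorted alphabetically in descending order and still contains Mark, which occurs only once, so it differs from the output requested in the question. A minimal sketch of a Group-Object variant that sorts the groups by their Count and drops one instance from every group, using the sample names as an in-memory list:

```powershell
$Lines = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'

$Result = $Lines |
    Group-Object |                               # one group per distinct line
    Sort-Object -Property Count -Descending |    # most frequent groups first
    ForEach-Object {
        # Skipping one element removes a single instance of each line;
        # a group of one therefore contributes nothing.
        $_.Group | Select-Object -Skip 1
    }

$Result
```

For the sample data this yields Mary three times, John twice, and Lucy once. Group-Object retains every input line inside its groups, so for a file of about 100 million lines it is likely to need considerably more memory than a hashtable counter.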