There is an unsorted text file of about 100 million short lines:
Lucy
Mary
Mary
Mary
John
John
John
Lucy
Mark
Mary
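I need to get the following (each line is listed one time fewer than it occurs, so lines that occur only once are dropped):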
Mary
Mary
Mary
John
John
Lucy
I can't figure out how to order the lines by how many times each one is repeated in the text, i.e. the most frequently occurring lines must be listed first.
Answer:
$List = 'Lucy', 'Mary', 'Mary', 'Mary', 'John', 'John', 'John', 'Lucy', 'Mark', 'Mary'
$Count = @{}
foreach ($Item in $List) { $Count[$Item]++ }
$Count.GetEnumerator() | Sort-Object -Descending 'Value' |
    ForEach-Object { ,$_.Name * ($_.Value - 1) }
Result:
Mary
Mary
Mary
John
John
Lucy
Explanation
$Count = @{}
Create a new hashtable.
foreach ($Item in $List) { $Count[$Item]++ }
Count the repeated instances, starting from nothing ($Null + 1 => 1).
$Count.GetEnumerator() | Sort-Object -Descending 'Value'
Sort the hashtable entries in descending order of their values (the counts).
ForEach-Object { ,$_.Name * ($_.Value - 1) }
Iterate over the found instances: the unary comma in ,$_.Name forces the string into a single-element array, and * ($_.Value - 1) repeats that array one time fewer than the line occurs. Without the comma, the string itself would be repeated ('Mary' * 3 gives 'MaryMaryMary').
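Since the file in the question has about 100 million lines, a streaming variant of the same counting idea may be worth considering. The sketch below is not part of the original answer and rests on a few assumptions: it reads the file lazily with [System.IO.File]::ReadLines, counts lines in a case-sensitive Dictionary[string, int] (a PowerShell @{} hashtable ignores case, so 'Mary' and 'mary' would be merged there), and orders lines with equal counts alphabetically. The function name Get-RepeatedLinesByFrequency is hypothetical. Memory use grows with the number of distinct lines rather than with the total number of lines.

```powershell
function Get-RepeatedLinesByFrequency {
    # Hypothetical helper: emits each repeated line (count - 1) times,
    # most frequent lines first; lines that occur only once are dropped.
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so convert to a full file-system path first.
    $fullPath = Convert-Path -LiteralPath $Path

    # Case-sensitive (ordinal) counts, unlike a default @{} hashtable.
    $count = [System.Collections.Generic.Dictionary[string, int]]::new([System.StringComparer]::Ordinal)

    # ReadLines enumerates the file one line at a time instead of loading it all.
    foreach ($line in [System.IO.File]::ReadLines($fullPath)) {
        if ($count.ContainsKey($line)) {
            $count[$line] = $count[$line] + 1
        }
        else {
            $count[$line] = 1
        }
    }

    # Highest count first; equal counts are ordered by the line text.
    # The unary comma repeats the line as an array, as in the answer above.
    $count.GetEnumerator() |
        Sort-Object -Property @{ Expression = 'Value'; Descending = $true },
                              @{ Expression = 'Key'; Descending = $false } |
        ForEach-Object { ,$_.Key * ($_.Value - 1) }
}
```

It could then be called as Get-RepeatedLinesByFrequency -Path 'D:\Test\unsorted.txt' and its output piped to Set-Content if the result should go to another file.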
Answer:
You could also use Group-Object to group equal lines together, like below:
Get-Content -Path 'D:\Test\unsorted.txt' | Group-Object | ForEach-Object {
if ($_.Count -gt 1) { $_.Group | Select-Object -Skip 1 }
else { $_.Group }
} | Sort-Object -Descending
Result:
Mary
Mary
Mary
Mark
Lucy
John
John
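Note that the final Sort-Object -Descending in this answer sorts the lines alphabetically (Mary, Mark, Lucy, John), which is why Mark, a line that occurs only once, is kept and the lines are not ordered by frequency. Sorting the groups by their Count property before skipping the first member of each group should give the order the question asks for. A minimal sketch, wrapped in a hypothetical function Get-RepeatedLinesByGroup so it can be reused with any path:

```powershell
function Get-RepeatedLinesByGroup {
    # Hypothetical helper based on the Group-Object approach above.
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # Group identical lines (Group-Object compares strings case-insensitively
    # by default), put the largest groups first, then skip one member of each
    # group so every line is emitted (Count - 1) times; single lines vanish.
    Get-Content -LiteralPath $Path |
        Group-Object |
        Sort-Object -Property Count -Descending |
        ForEach-Object { $_.Group | Select-Object -Skip 1 }
}
```

For the sample data this yields Mary three times, John twice and Lucy once. Keep in mind that Group-Object stores every line in the Group property of its group, so on a 100-million-line file it needs memory roughly proportional to the whole file.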