I have a large list of hashes, most of which are duplicates. I need to find out which ones appear only once.
For example, in the list below, the last line (238db2.....) appears only once:
ac6b51055fdac5b92934699d5b07db78
ac6b51055fdac5b92934699d5b07db78
7f5417a85a63967d8bba72496faa997a
7f5417a85a63967d8bba72496faa997a
1e78ba685a4919b7cf60a5c60b22ebc2
1e78ba685a4919b7cf60a5c60b22ebc2
238db202693284f7e8838959ba3c80e8
I tried the following, but it just listed one of each of the duplicates instead of identifying only the hash that appeared once:
foreach ($line in (Get-Content "C:\hashes.txt" | Select-Object -Unique)) {
    Write-Host "Line '$line' appears $(($line | Where-Object {$_ -eq $line}).count) time(s)."
}
CodePudding user response:
# Read the values from the hashes.txt file
$values = Get-Content .\hashes.txt
# Group identical values and keep only the groups that contain a single value
$groups = $values | Group-Object | Where-Object { $_.Count -eq 1 }
foreach ($group in $groups) {
    foreach ($value in $group.Values) {
        Write-Host "$value" # Output the value of each single-member group
    }
}
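The same grouping idea can also be written as a single pipeline. This is only a sketch: the temporary sample file and the $samplePath and $unique names are made up for illustration and stand in for your real hashes.txt.

```powershell
# Hypothetical sample file standing in for hashes.txt (values taken from the question).
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'hashes-sample.txt'
@(
    'ac6b51055fdac5b92934699d5b07db78'
    'ac6b51055fdac5b92934699d5b07db78'
    '7f5417a85a63967d8bba72496faa997a'
    '7f5417a85a63967d8bba72496faa997a'
    '238db202693284f7e8838959ba3c80e8'
) | Set-Content -Path $samplePath

# -NoElement keeps only each group's Name and Count, not the grouped lines themselves.
$unique = Get-Content -Path $samplePath |
    Group-Object -NoElement |
    Where-Object Count -eq 1 |
    ForEach-Object Name

$unique   # the lines that occur exactly once
```

Note that Group-Object still has to collect the whole input before it emits any groups, so this is convenient rather than memory-friendly for very large files.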
CodePudding user response:
Given that you're dealing with a large file, Get-Content is best avoided. A switch statement with the -File parameter allows efficient line-by-line processing, and given that duplicates appear to be grouped together already, they can be detected by keeping a running count of identical lines.
$count = 0 # keeps track of the count of identical lines occurring in sequence
switch -File 'C:\hashes.txt' {
    default {
        if ($prevLine -eq $_ -or $count -eq 0) { # duplicate or first line.
            if ($count -eq 0) { $prevLine = $_ }
            ++$count
        }
        else { # current line differs from the previous one.
            if ($count -eq 1) { $prevLine } # non-duplicate -> output
            $prevLine = $_
            $count = 1
        }
    }
}
if ($count -eq 1) { $prevLine } # output the last line, if a non-duplicate.
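The running-count approach relies on identical lines being adjacent. If that assumption does not hold for your file, a hashtable of per-line counts is one possible alternative, still reading the file with switch -File. The following is a rough sketch only: the temporary sample file and the $mixedPath, $counts, and $singles names are invented for illustration.

```powershell
# Hypothetical sample file in which duplicates are NOT adjacent.
$mixedPath = Join-Path ([System.IO.Path]::GetTempPath()) 'hashes-mixed-sample.txt'
@(
    'ac6b51055fdac5b92934699d5b07db78'
    '7f5417a85a63967d8bba72496faa997a'
    'ac6b51055fdac5b92934699d5b07db78'
    '238db202693284f7e8838959ba3c80e8'
    '7f5417a85a63967d8bba72496faa997a'
) | Set-Content -Path $mixedPath

$counts = @{} # maps each distinct line to the number of times it was seen
switch -File $mixedPath {
    # A missing key yields $null, and 1 + $null evaluates to 1.
    default { $counts[$_] = 1 + $counts[$_] }
}

# Keep only the lines whose count is exactly 1.
$singles = $counts.GetEnumerator() | Where-Object Value -eq 1 | ForEach-Object Key

$singles
```

The trade-off is memory: every distinct line is held in the hashtable, and the output order is not guaranteed to match the file order.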