How to find unique lines in a text file?

Time:12-10

I have a LARGE list of hashes. I need to find out which ones appear only once, as most are duplicates.

For example, the last line, 238db2..., appears only once:


ac6b51055fdac5b92934699d5b07db78
ac6b51055fdac5b92934699d5b07db78
7f5417a85a63967d8bba72496faa997a
7f5417a85a63967d8bba72496faa997a
1e78ba685a4919b7cf60a5c60b22ebc2
1e78ba685a4919b7cf60a5c60b22ebc2
238db202693284f7e8838959ba3c80e8

I tried the following, but it just listed one copy of each duplicated line instead of identifying only the line that appears once:

foreach ($line in (Get-Content "C:\hashes.txt" | Select-Object -Unique)) {
  Write-Host "Line '$line' appears $(($line | Where-Object {$_ -eq $line}).count) time(s)."
}
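The attempt above cannot work because `$line | Where-Object {$_ -eq $line}` pipes only the single string `$line`, so the comparison never sees the rest of the file and the reported count is always 1. A minimal corrected sketch of the same idea counts each distinct line against the full list instead. The `$lines` array is an assumption: it just holds the sample hashes from the question and stands in for `Get-Content "C:\hashes.txt"`. Because it rescans every line for each distinct value, this is only practical for small files.

```powershell
# Stand-in for: $lines = Get-Content "C:\hashes.txt"  (sample data from the question)
$lines = @(
  'ac6b51055fdac5b92934699d5b07db78'
  'ac6b51055fdac5b92934699d5b07db78'
  '7f5417a85a63967d8bba72496faa997a'
  '7f5417a85a63967d8bba72496faa997a'
  '1e78ba685a4919b7cf60a5c60b22ebc2'
  '1e78ba685a4919b7cf60a5c60b22ebc2'
  '238db202693284f7e8838959ba3c80e8'
)

$singles = foreach ($line in ($lines | Select-Object -Unique)) {
  # Count occurrences across ALL lines, not just the single current string.
  $n = @($lines | Where-Object { $_ -eq $line }).Count
  if ($n -eq 1) { $line }  # emit only lines that occur exactly once
}
$singles
```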

Answer:

$values = Get-Content .\hashes.txt # Read the values from the hashes.txt file

$groups = $values | Group-Object | Where-Object { $_.Count -eq 1 } # Group the values by their distinct values and filter for groups with a single value

foreach ($group in $groups) {
    foreach ($value in $group.Values) {
        Write-Host "$value" # Output the value of each group
    }
}
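The same Group-Object approach can also be written as a single pipeline that sends its results to the success output stream instead of `Write-Host`, so they can be captured in a variable or redirected to a file. A minimal sketch, again assuming the question's sample hashes as stand-in input; note that `Group-Object` compares strings case-insensitively by default (it has a `-CaseSensitive` switch if that matters).

```powershell
# Stand-in for: $lines = Get-Content .\hashes.txt  (sample data from the question)
$lines = @(
  'ac6b51055fdac5b92934699d5b07db78'
  'ac6b51055fdac5b92934699d5b07db78'
  '7f5417a85a63967d8bba72496faa997a'
  '7f5417a85a63967d8bba72496faa997a'
  '1e78ba685a4919b7cf60a5c60b22ebc2'
  '1e78ba685a4919b7cf60a5c60b22ebc2'
  '238db202693284f7e8838959ba3c80e8'
)

# Group identical lines, keep groups with exactly one member,
# and output each remaining group's Name (the line itself).
$singles = $lines | Group-Object | Where-Object Count -eq 1 | ForEach-Object Name
$singles
```

Appending `| Set-Content .\unique.txt` (a hypothetical output path) to the pipeline would save the results to a file instead.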

Answer:

  • Given that you're dealing with a large file, Get-Content is best avoided, because it is comparatively slow when reading many lines.

  • A switch statement with the -File parameter allows efficient line-by-line processing, and given that duplicates appear to be grouped together already, they can be detected by keeping a running count of identical consecutive lines.

$count = 0 # keeps track of the count of identical lines occurring in sequence
switch -File 'C:\hashes.txt' {
  default {
    if ($prevLine -eq $_ -or $count -eq 0) { # duplicate or first line.
      if ($count -eq 0) { $prevLine = $_ }
      ++$count
    }
    else { # current line differs from the previous one.
      if ($count -eq 1) { $prevLine } # non-duplicate -> output
      $prevLine = $_
      $count = 1
    }
  }
}
if ($count -eq 1) { $prevLine } # output the last line, if a non-duplicate.
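The running-count approach only works if duplicates are adjacent. If that assumption might not hold, one alternative (a different technique from the one above, sketched under stated assumptions) is to count every line in a dictionary on a first `switch -File` pass and emit the lines whose count is 1 on a second pass, which also preserves the original file order. The function name `Get-SingleOccurrenceLine` is hypothetical. Memory use grows with the number of distinct lines rather than with the file size, and the file is read twice.

```powershell
# Hypothetical helper: returns lines that occur exactly once, even if
# duplicates are NOT adjacent. Reads the file twice via switch -File.
function Get-SingleOccurrenceLine {
  param([Parameter(Mandatory)] [string] $Path)

  # Case-insensitive keys, matching the -eq comparison used in the answer above.
  $counts = [System.Collections.Generic.Dictionary[string, int]]::new(
    [System.StringComparer]::OrdinalIgnoreCase)

  # Pass 1: count how often each line occurs.
  switch -File $Path {
    default {
      if ($counts.ContainsKey($_)) { $counts[$_] = $counts[$_] + 1 }
      else { $counts[$_] = 1 }
    }
  }

  # Pass 2: emit single-occurrence lines in their original order.
  switch -File $Path {
    default { if ($counts[$_] -eq 1) { $_ } }
  }
}
```

Usage: `Get-SingleOccurrenceLine -Path 'C:\hashes.txt'`.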