PowerShell script to check if a file contains duplicates

Time:10-02

Could you once again help my ignorant soul with this one, please?

Hey there sweet faces! I have a script that reads a $sourceFile like this one:

Some other text here*******
------------------------------------------------
 F I N A L  C O U N T                    9 , 9 9

**********
** [0000789000]
ID Number:0000789000
Complete!
******************!
************

The script looks for the "FINAL COUNT" block and extracts the corresponding "ID Number" that follows it:

$blocks = ((Get-Content -Path $sourceFile -Raw) -split '-{2,}').Trim() |
    Where-Object { $_ -match '(?sm)^\s?F I N A L  C O U N T' }

if (!$blocks) {
    Write-Host "No such counts were found."
}
else {
    $blocks | ForEach-Object {
        # Extract the digits that follow "ID Number:" in the current block.
        $id = $_ -replace '(?sm).*^ID Number:(\d+).*', '$1'
    }
}

What I am trying to do next, and failing miserably at so far, is to check whether any of the $id values are the same number (duplicates). For example, if "ID Number:123456" appears more than once in my $sourceFile, I want to detect that and do things with it. Anyone who can point me in the right direction is more than appreciated.

Hex dump of the sample file:

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   53 6F 6D 65 20 6F 74 68 65 72 20 74 65 78 74 20  Some other text 
00000010   68 65 72 65 2A 2A 2A 2A 2A 2A 2A 0D 0A 2D 2D 2D  here*******..---
00000020   2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D  ----------------
00000030   2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D  ----------------
00000040   2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 0D 0A 20  -------------.. 
00000050   46 20 49 20 4E 20 41 20 4C 20 20 43 20 4F 20 55  F I N A L  C O U
00000060   20 4E 20 54 20 20 20 20 20 20 20 20 20 20 20 20   N T            
00000070   20 20 20 20 20 20 20 20 39 20 2C 20 39 20 39 0D          9 , 9 9.
00000080   0A 0D 0A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A 0D 0A 2A  ...**********..*
00000090   2A 20 5B 30 30 30 30 37 38 39 30 30 30 5D 0D 0A  * [0000789000]..
000000A0   49 44 20 4E 75 6D 62 65 72 3A 30 30 30 30 37 38  ID Number:000078
000000B0   39 30 30 30 0D 0A 43 6F 6D 70 6C 65 74 65 21 0D  9000..Complete!.
000000C0   0A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A  .***************
000000D0   2A 2A 2A 21 0D 0A 2A 2A 2A 2A 2A 2A 2A 2A 2A 2A  ***!..**********
000000E0   2A 2A                                            **   

LOVE you all as always!!!

Sv3n

Answer:

You can collect all IDs in an array and use Group-Object to find duplicates (multiples):

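# Extract the ID number from each block.
$ids = $blocks -replace '(?sm).*^ID Number:(\d+).*', '$1'

# Group identical IDs and keep the value of each group that has more than one member.
$duplicateIds = ($ids | Group-Object | Where-Object Count -gt 1).Name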
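For a quick end-to-end check without the original file, here is a minimal sketch that runs the question's split/filter step and the two statements of this answer against an in-memory sample. The here-string $sampleText is hypothetical: it reuses the layout of the question's file but adds a second block with the same ID (0000789000) and a third block with a made-up ID and count. It assumes PowerShell 3 or later (member enumeration for .Trim() on an array and the simplified Where-Object Count -gt 1 syntax).

```powershell
# Hypothetical sample input: three "FINAL COUNT" blocks, two of which share an ID.
$sampleText = @'
Some other text here*******
------------------------------------------------
 F I N A L  C O U N T                    9 , 9 9

**********
** [0000789000]
ID Number:0000789000
Complete!
------------------------------------------------
 F I N A L  C O U N T                    1 , 2 3

**********
** [0000123456]
ID Number:0000123456
Complete!
------------------------------------------------
 F I N A L  C O U N T                    4 , 5 6

**********
** [0000789000]
ID Number:0000789000
Complete!
'@

# Same splitting and filtering as in the question, applied to the sample text.
$blocks = ($sampleText -split '-{2,}').Trim() |
    Where-Object { $_ -match '(?sm)^\s?F I N A L  C O U N T' }

# Extract the ID from each block (one ID per block).
$ids = $blocks -replace '(?sm).*^ID Number:(\d+).*', '$1'

# Keep only the IDs that occur more than once.
$duplicateIds = ($ids | Group-Object | Where-Object Count -gt 1).Name

foreach ($dupId in $duplicateIds) {
    # Placeholder: replace with whatever should happen to a duplicated ID.
    Write-Host "Duplicate ID found: $dupId"
}
```

The Write-Host line is only a stand-in; the "do things with it" part of the question would go inside that loop.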
  • Group-Object outputs [Microsoft.PowerShell.Commands.GroupInfo] instances that each describe a group:
    • .Count reports the number of elements in the group.

    • .Name reports the grouping property/ies value(s) as a single string.

      • In the case at hand, the grouping property is each input object itself, which is already a string, so .Name can be used to return each ID that has duplicates. Note, however, that if you group non-string values or group by multiple properties, .Name won't reflect the actual grouping values.
    • .Group contains the collection of elements comprising the group.

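The GroupInfo properties described above can be inspected with a small sketch that groups by two properties at once, which also shows why .Name alone isn't enough in that case. The $records objects and their Id and Kind properties are hypothetical. GroupInfo also has a .Values property holding the individual grouping values, which is what to reach for when .Name doesn't reflect them.

```powershell
# Hypothetical records; the first two share both Id and Kind.
$records = @(
    [pscustomobject] @{ Id = 1; Kind = 'a' }
    [pscustomobject] @{ Id = 1; Kind = 'a' }
    [pscustomobject] @{ Id = 2; Kind = 'b' }
)

# Group by two properties and keep only groups with more than one member.
$multi = $records | Group-Object -Property Id, Kind | Where-Object Count -gt 1

$multi.Count    # number of elements in the group (2 here)
$multi.Group    # the two matching [pscustomobject] records
$multi.Name     # a single string combining both values, e.g. '1, a'
$multi.Values   # the individual grouping values: 1 and 'a'
```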