How to remove duplicate lines from a text file while keeping the file's order

Time: 05-06

I have a text file with 105,779 lines that contains duplicates. I need to remove the duplicates without changing the order of the lines. I can remove the duplicates, but doing so changes the order of the file. Any help would be appreciated.

CodePudding user response:

The Select-Object cmdlet has a -Unique switch that does just what you want:

Get-Content allMethods.txt | Select-Object -Unique > uniqueMethods.txt

Note (written as of PowerShell 7.2.3):

  • While Select-Object -Unique is surprisingly slow, it still outperforms your own manual solution.

  • Unlike PowerShell in general, -Unique is, surprisingly, invariably case-sensitive:

    'a', 'a', 'b' | Select-Object -Unique # -> 'a', 'b'
    
    'a', 'A', 'b' | Select-Object -Unique # !! -> 'a', 'A', 'b'
    
    • See GitHub issue #12059 for a discussion of this unexpected behavior. (The expected behavior would be case-insensitivity by default, with a case-sensitive opt-in.)
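If you need case-insensitive de-duplication that preserves order (which Select-Object -Unique does not offer), one alternative is to track seen lines in a .NET HashSet. This is a sketch, not part of either answer above; the file names are carried over from the question.

```powershell
# HashSet.Add returns $true only the first time a value is added, so the
# Where-Object filter passes each distinct line through exactly once, in
# its original order. The StringComparer makes lookups case-insensitive.
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

'a', 'A', 'b' | Where-Object { $seen.Add($_) }   # passes only 'a' and 'b'

# The same idea applied to the file from the question:
# Get-Content allMethods.txt | Where-Object { $seen.Add($_) } > uniqueMethods.txt
```

Drop the StringComparer argument if you want case-sensitive behavior instead.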

CodePudding user response:

I was able to find the answer:

$hash = @{}

# Emit each line only the first time it is seen, then mark it in the hashtable.
# ($null goes on the left of -eq, the idiomatic PowerShell null check.)
Get-Content allMethods.txt | ForEach-Object { if ($null -eq $hash[$_]) { $_ }; $hash[$_] = 1 } > uniqueMethods.txt