I have a text file with 105,779 lines that contains duplicates. I need to remove the duplicates without changing the order of the lines. I am able to remove the duplicates, but doing so changes the order of the file. Any help would be appreciated.
CodePudding user response:
The Select-Object cmdlet has a -Unique switch that does just what you want:
Get-Content allMethods.txt | Select-Object -Unique > uniqueMethods.txt
Note (written as of PowerShell 7.2.3):

- While Select-Object -Unique is surprisingly slow, it still outperforms your own manual solution. The inefficient implementation is the subject of GitHub issue #11221.

- Unlike PowerShell in general, -Unique is surprisingly, and invariably, case-sensitive:

  'a', 'a', 'b' | Select-Object -Unique   # -> 'a', 'b'
  'a', 'A', 'b' | Select-Object -Unique   # !! -> 'a', 'A', 'b'

  See GitHub issue #12059 for a discussion of this unexpected behavior. (The expected behavior would be to be case-insensitive by default and to offer a case-sensitive opt-in.)
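If you do want case-insensitive deduplication while preserving order, one possible workaround (a sketch, not part of the original answer; it assumes the same allMethods.txt / uniqueMethods.txt file names as the question) is a .NET HashSet with a case-insensitive comparer, whose .Add() method returns $true only for items it has not seen before:

    # Order-preserving, case-INsensitive deduplication.
    $seen = [System.Collections.Generic.HashSet[string]]::new(
        [System.StringComparer]::OrdinalIgnoreCase)
    Get-Content allMethods.txt |
        Where-Object { $seen.Add($_) } |   # $true only on first occurrence
        Set-Content uniqueMethods.txt

Drop the StringComparer argument to make the same pipeline case-sensitive.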
CodePudding user response:
I was able to find the answer:

$hash = @{}
# Emit a line only the first time it is seen; record each seen line in the hashtable.
gc allMethods.txt | %{ if ($null -eq $hash[$_]) { $_ }; $hash[$_] = 1 } > uniqueMethods.txt
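Since the note above claims Select-Object -Unique still outperforms this manual approach, a rough way to check that on your own file (a sketch; Measure-Command simply times a script block, and results vary by machine and file) is:

    # Time both approaches; output is discarded so only dedup cost is measured.
    Measure-Command { Get-Content allMethods.txt | Select-Object -Unique > $null }
    Measure-Command {
        $hash = @{}
        Get-Content allMethods.txt |
            %{ if ($null -eq $hash[$_]) { $_ }; $hash[$_] = 1 } > $null
    }

Compare the TotalMilliseconds values of the two results.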