I have a TXT file with thousands of lines. The number after the first slash is the image ID. I want to delete lines so that exactly one line remains for every ID. It doesn't matter which of the duplicate lines gets removed.
I tried converting the TXT to a CSV with PowerShell and using the -Unique parameter, but it didn't work. Any ideas how I can iterate through the TXT and remove lines so that only one line per unique ID remains?
Current state
thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg
thumbnails/4000896042746/2021-08-17_4000896042746_smallX.jpg
thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg
thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg
thumbnails/4000896042333/2021-08-17_4000896042746_tiny.jpg
After the script
thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg
thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg
thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg
CodePudding user response:
For a "TXT file with thousands of lines", I would use the PowerShell pipeline, because (if set up correctly) it performs about the same but uses far less memory.
Performance can also be improved by using a HashTable (or a HashSet), which looks up keys by hash and is therefore typically much faster than e.g. grouping.
(I am pleading to get an accelerated HashSet, #16003, into PowerShell.)
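$Unique = [System.Collections.Generic.HashSet[string]]::new()
Get-Content .\InFile.txt | ForEach-Object {
    # The ID is the second-to-last path segment; Add() returns $true only the first time an ID is seen
    if ($Unique.Add(($_.Split('/'))[-2])) { $_ }
} | Set-Content .\OutFile.txt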
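As a quick check, here is a minimal sketch of the same HashSet idea run against the sample lines from the question instead of a file. The variable names `$lines`, `$seen` and `$result` are only illustrative, and the ID is taken as the segment after the first slash (index 1), which is an assumption based on the question's description.

```powershell
# Sample data copied from the question
$lines = @(
    'thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg'
    'thumbnails/4000896042746/2021-08-17_4000896042746_smallX.jpg'
    'thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg'
    'thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg'
    'thumbnails/4000896042333/2021-08-17_4000896042746_tiny.jpg'
)

# HashSet.Add() returns $true only for IDs that were not seen before,
# so Where-Object keeps just the first line of every ID
$seen = [System.Collections.Generic.HashSet[string]]::new()
$result = @($lines | Where-Object { $seen.Add(($_ -split '/')[1]) })

$result
```

For this input the first line of each ID survives, in the original order, which matches the "After the script" listing in the question.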
CodePudding user response:
You can group by a custom property. If you know how to extract the ID, group by it and then take the first element of each group:
$content = Get-Content "path_to_your_file"
$content = $content | Group-Object { ($_ -split "/")[1] } | ForEach-Object { $_.Group[0] }
$content | Out-File "path_to_your_result_file"
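Since the question mentions a "unique" parameter, here is a hedged sketch of a related variant: `Sort-Object -Unique` with a script block as the sort property, so that uniqueness is decided by the ID rather than by the whole line. The names `$lines` and `$unique` are illustrative. Note that this sorts the output by ID, so the original line order is not preserved, and which of the duplicate lines survives should not be relied on.

```powershell
# Sample data copied from the question
$lines = @(
    'thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg'
    'thumbnails/4000896042746/2021-08-17_4000896042746_smallX.jpg'
    'thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg'
    'thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg'
    'thumbnails/4000896042333/2021-08-17_4000896042746_tiny.jpg'
)

# Sort by the ID (segment after the first slash) and drop entries whose ID repeats
$unique = @($lines | Sort-Object -Property { ($_ -split '/')[1] } -Unique)

$unique
```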
CodePudding user response:
Here is a solution that uses calculated properties to create an object containing the ID and the FileName. The result is then grouped by ID, and the first FileName of each group is selected:
$yourFileList = @(
'thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg',
'thumbnails/4000896042746/2021-08-17_4000896042746_smallX.jpg',
'thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg',
'thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg',
'thumbnails/4000896042333/2021-08-17_4000896042746_tiny.jpg'
)
$yourFileList |
    Select-Object @{ Name = 'Id'; Expression = { ($_ -split '/')[1] } }, @{ Name = 'FileName'; Expression = { $_ } } |
    Group-Object Id |
    ForEach-Object { $_.Group[0].FileName }
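The grouping approach can also be wired to files, as the question asks. The following is only a sketch: the temp-file names `thumbnails-in.txt` and `thumbnails-out.txt` are made up for the demonstration, and the input file is created on the fly so the example is self-contained. The order in which `Group-Object` emits groups may differ between PowerShell versions, so the output order should not be relied on.

```powershell
# Hypothetical file locations for the demonstration
$tempDir = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $tempDir 'thumbnails-in.txt'
$outFile = Join-Path $tempDir 'thumbnails-out.txt'

# Create a small input file from the sample lines in the question
Set-Content -Path $inFile -Value @(
    'thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg'
    'thumbnails/4000896042746/2021-08-17_4000896042746_smallX.jpg'
    'thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg'
    'thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg'
    'thumbnails/4000896042333/2021-08-17_4000896042746_tiny.jpg'
)

# Group the lines by ID and write the first line of each group to the output file
Get-Content -Path $inFile |
    Group-Object { ($_ -split '/')[1] } |
    ForEach-Object { $_.Group[0] } |
    Set-Content -Path $outFile

# Read the result back, then clean up the temporary files
$kept = @(Get-Content -Path $outFile)
$kept
Remove-Item -Path $inFile, $outFile
```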