I'm pretty sure the answer is no, but it keeps bugging me.
I have been tasked with finding duplicate files in a certain location, recursively. I can do that with no problem. But since some of the files have 3 or 4 duplicates, I cannot answer the question "How many files are originals?" without resorting to editing in Excel.
Code:
gci -path $path -recurse -file -erroraction silentlycontinue|
Select @{l='Original Filename';e={$_.PSChildName}}, @{l='Compare Filename';e={$_.BaseName.replace('_','*').replace(' ','*').replace('-','*')}}, @{l="Path";e={$_.PSParentPath.Substring(38,$_.PSParentPath.Length-38)}}, @{l="Link";e={$_.FullName}}|
group -Property 'Compare Filename'|
Where {$_.count -ge 2}|
%{$_.group}|
Export-Csv -Path $path2 -NoTypeInformation
Path variables are irrelevant, so I will not be listing them.
EDIT: I have tested both of the provided solutions and read the marvelous explanation provided by mklement0. In the end, at least with the ~4k files I am working with, the speed of both solutions is comparable. See below for the Measure-Command output.
Answer:
To reliably count the number of groups (Microsoft.PowerShell.Commands.GroupInfo instances) that Group-Object outputs, use either of the following:
- Pipeline-based, as suggested by zett42; while comparatively slow, this results in streaming processing that doesn't require collecting all Group-Object output in memory first:
(1, 1, 1 | Group-Object | Measure-Object).Count # -> 1 (group)
- Concise, expression-based, as suggested by Lee Dailey; note that this involves collecting all output objects in memory first:
@(1, 1, 1 | Group-Object).Count # -> 1 (group)
# Alternative, using .Length
(1, 1, 1 | Group-Object).Length # -> 1 (group)
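Applied to the question's scenario, a minimal sketch could count the duplicate groups directly instead of expanding them with %{$_.group}. To stay self-contained, it groups a hypothetical, hard-coded list of base names ($names, standing in for the BaseName values Get-ChildItem would return) using the question's normalization of _, space and - to *. It assumes that each group of duplicates corresponds to exactly one original.

```powershell
# Hypothetical base names standing in for (Get-ChildItem ...).BaseName values
$names = 'report_2023', 'report 2023', 'report-2023', 'notes', 'budget_q1', 'budget q1'

# Group by the question's normalized "compare filename" and keep only sets of duplicates
$dupGroups = $names |
    Group-Object -Property { $_.Replace('_', '*').Replace(' ', '*').Replace('-', '*') } |
    Where-Object Count -ge 2

# One group per set of duplicates, i.e. one "original" per group;
# @() guards against the single-group case
$originalCount = @($dupGroups).Count

# Total number of files taking part in duplication (originals plus their copies)
$duplicateFileCount = ($dupGroups | Measure-Object -Property Count -Sum).Sum

"Originals: $originalCount; files involved: $duplicateFileCount"
```

In the real pipeline, the input would be the objects from the question's Select-Object step, grouped with -Property 'Compare Filename' as before.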
Note:

To count all original (non-duplicate) objects, i.e. those that are in a group of their own, simply append | Where-Object Count -eq 1 to the Group-Object call above.

The use of @(), the array-subexpression operator, is crucial in this case: it ensures that the Group-Object output is treated as an array even if only a single group happens to be output. That way, it is the array's .Count property that is queried rather than a single GroupInfo instance's own .Count property, which reflects the number of members in that group and would be 3 in the example above (try (1, 1, 1 | Group-Object).Count).
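The naming conflict also shows up when the Where-Object filter from the first point is combined with Group-Object. In this sketch (with hypothetical input 1, 1, 1, 2), only the @() versions report the number of groups; without @(), a lone surviving GroupInfo reports its own member count.

```powershell
$values = 1, 1, 1, 2   # hypothetical input: three 1s and a single 2

# Values without duplicates: only the group for 2 qualifies
$uniqueCount = @($values | Group-Object | Where-Object Count -eq 1).Count    # 1

# Groups that do contain duplicates: only the group for 1 qualifies
$dupGroupCount = @($values | Group-Object | Where-Object Count -ge 2).Count  # 1

# Without @(), the single GroupInfo's own .Count (its 3 members) is reported instead
$misleading = ($values | Group-Object | Where-Object Count -ge 2).Count       # 3
```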
Alternatively, using .Length instead of .Count bypasses this naming conflict: .Length and .Count are aliases of each other and are both provided as intrinsic properties even on scalars (single objects), as part of PowerShell's unified handling of scalars and collections. That is, PowerShell presents even a single object with .Length / .Count properties that indicate the count of that object, which by definition is 1, unless they are preempted by a type-native property of the same name. The intrinsic .Length property therefore works as expected here, given that GroupInfo has no .Length property of its own.

The inverse scenario can be demonstrated with a string scalar:
'foo'.Length is 3 (the value of the type-native .Length property, reflecting the character count), whereas 'foo'.Count is 1 (the intrinsic .Count property, which "counts" the single object).
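Side by side, the type-native and intrinsic properties on a single GroupInfo and on a string look like this; the variable names are illustrative only.

```powershell
# A single GroupInfo instance: the group of three 1s
$group = 1, 1, 1 | Group-Object

$groupMembers = $group.Count    # type-native GroupInfo.Count: 3 members
$groupItems   = $group.Length   # intrinsic .Length: 1 object, since GroupInfo has no Length
$charCount    = 'foo'.Length    # type-native String.Length: 3 characters
$objectCount  = 'foo'.Count     # intrinsic .Count: 1 object
```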
In the pipeline solution with Measure-Object, the problem doesn't arise, due to the pipeline's enumeration behavior: however many objects Group-Object outputs are sent one by one through the pipeline, and Measure-Object counts them. In this case, the value of interest is the type-native .Count property of the single Microsoft.PowerShell.Commands.GenericMeasureInfo instance that Measure-Object always outputs.