Is there a way to get number of groups created by 'Group-Object' cmdlet?

Time:12-23

I'm pretty sure the answer is no, but it keeps bugging me.

I have been tasked with finding duplicate files in a certain location, recursively. I can do that with no problem. But since some of the files have 3 or 4 duplicates, I cannot answer the question "How many files are originals?" without resorting to editing the output in Excel.

Code:

gci -path $path -recurse -file -erroraction silentlycontinue|
Select @{l='Original Filename';e={$_.PSChildName}}, @{l='Compare Filename';e={$_.BaseName.replace('_','*').replace(' ','*').replace('-','*')}}, @{l="Path";e={$_.PSParentPath.Substring(38,$_.PSParentPath.Length-38)}}, @{l="Link";e={$_.FullName}}|
group -Property 'Compare Filename'|
Where {$_.count -ge 2}|
%{$_.group}|
Export-Csv -Path $path2 -NoTypeInformation

Path variables are irrelevant, so I will not be listing them.

EDIT: I have tested both of the provided solutions and read the marvelous explanation provided by mklement0. In the end, at least with the ~4k files I am working with, the speed of both solutions is comparable. See below for the Measure-Command output.

Expression-based: [screenshot of Measure-Command output]

Pipeline-based: [screenshot of Measure-Command output]
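
A comparison of this kind can be sketched as follows. The sample data in $items is hypothetical and stands in for the real file listing, and the timings will vary by machine and data set, so treat this only as an illustration of how the two approaches might be timed:

```powershell
# Hypothetical sample data: 4000 names that fall into 1000 distinct groups.
$items = 1..4000 | ForEach-Object { "file$($_ % 1000)" }

# Expression-based: collect all groups in an array, then read the array's .Count.
$exprTime = Measure-Command { @($items | Group-Object).Count }

# Pipeline-based: stream the groups into Measure-Object and read its .Count.
$pipeTime = Measure-Command { ($items | Group-Object | Measure-Object).Count }

# Measure-Command discards output, so compute the counts separately;
# both approaches should agree on the number of groups.
$exprCount = @($items | Group-Object).Count
$pipeCount = ($items | Group-Object | Measure-Object).Count

'Expression-based: {0} groups, {1:N1} ms' -f $exprCount, $exprTime.TotalMilliseconds
'Pipeline-based:   {0} groups, {1:N1} ms' -f $pipeCount, $pipeTime.TotalMilliseconds
```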

Answer:

To reliably count the number of groups (Microsoft.PowerShell.Commands.GroupInfo instances) that Group-Object outputs, use either of the following:

  • Pipeline-based, as suggested by zett42; while comparatively slow, this results in streaming processing that doesn't require collecting all Group-Object output in memory first:
(1, 1, 1 | Group-Object | Measure-Object).Count  # -> 1 (group)
  • Concise, expression-based, as suggested by Lee Dailey; note that this involves collecting all output objects in memory first:
@(1, 1, 1 | Group-Object).Count   # -> 1 (group)

# Alternative, using .Length
(1, 1, 1 | Group-Object).Length   # -> 1 (group)
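
Applied to the duplicate-file scenario from the question, the expression-based approach might look like the sketch below. The file names in $names are hypothetical stand-ins for the Get-ChildItem output, and the -replace pattern is an assumed shorthand for the question's three chained .replace() calls:

```powershell
# Hypothetical file names standing in for the Get-ChildItem output.
$names = 'report_1.txt', 'report-1.txt', 'report 1.txt',
         'notes.txt', 'budget_q1.xlsx', 'budget q1.xlsx'

# Group by a normalized base name (underscore, space and hyphen all become '*').
$groups = @(
    $names | Group-Object {
        [System.IO.Path]::GetFileNameWithoutExtension($_) -replace '[_ -]', '*'
    }
)

# Keep only the groups that contain duplicates, as in the question.
$dupGroups = @($groups | Where-Object Count -ge 2)

$groups.Count       # -> 3 (all groups)
$dupGroups.Count    # -> 2 (groups with at least two files)
```

Keeping the groups in a variable means the same $dupGroups collection can still be expanded with ForEach-Object and piped to Export-Csv afterwards, at the cost of holding all groups in memory.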

Note:

  • To count all original (non-duplicate) objects, i.e. those that are in a group of their own, insert | Where-Object Count -eq 1 directly after Group-Object in either command above.

  • The use of @(), the array-subexpression operator, is crucial in this case: It ensures that the Group-Object output is considered an array even if only a single group happens to be output.

    • This ensures that it is the array's .Count property that is queried rather than a single GroupInfo instance's own .Count property - which reflects the count of members of the group, and would be 3 in the example above (try (1, 1, 1 | Group-Object).Count).
  • Alternatively, using .Length instead of .Count bypasses this naming conflict. As part of PowerShell's unified handling of scalars and collections, .Length and .Count are aliases of each other and are provided as intrinsic properties even on scalars (single objects): any single object reports a count of 1 through them, unless a type-native property of the same name takes precedence.

    • The intrinsic .Length property therefore works as expected, given that GroupInfo has no .Length property.

    • The inverse scenario can be demonstrated with a string scalar: 'foo'.Length is 3 - the value of the type-native .Length property reflecting the character count - whereas 'foo'.Count is 1 - the intrinsic .Count property that "counts" the single object.

  • In the pipeline solution with Measure-Object the problem doesn't arise, due to the pipeline's enumeration behavior: however many objects Group-Object outputs, they are sent one by one through the pipeline, and Measure-Object counts them. The value of interest is then the type-native .Count property of the single Microsoft.PowerShell.Commands.GenericMeasureInfo instance that Measure-Object always outputs.
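
The points in this note can be checked with a short sketch. The variable names are illustrative only, and the results assume strict mode is not enabled:

```powershell
# A single group: Group-Object returns one GroupInfo object, not an array.
$single = 1, 1, 1 | Group-Object

$single.Count       # -> 3: GroupInfo's own .Count (members of the group)
@($single).Count    # -> 1: element count of the wrapping array
$single.Length      # -> 1: intrinsic .Length, as GroupInfo defines no .Length

'foo'.Length        # -> 3: String's own .Length (character count)
'foo'.Count         # -> 1: intrinsic .Count of the single object

# Counting "originals", i.e. values that sit in a group of their own:
$originals = @(1, 1, 2, 3 | Group-Object | Where-Object Count -eq 1).Count
$originals          # -> 2 (the groups for 2 and 3)

# The same count via the pipeline; Measure-Object always emits one result object.
(1, 1, 2, 3 | Group-Object | Where-Object Count -eq 1 | Measure-Object).Count   # -> 2
```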
