Home > Back-end >  How to sort multilevel list
How to sort multilevel list

Time:02-24

I need to sort a list with multilevel indexes

(@('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") | Sort-Object) -join ", "

1, 1.1.1, 1.99, 10, 2, 2.5, 3, 5.5

I came up with such a solution, but it does not work with indexes above ten

(@('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") | Sort-Object {
    $chanks = $_.Split('.')
    $sum = 0
    for ($i = 0; $i -lt $chanks.Count; $i  ) {
        $sum  = [int]$chanks[$i] / [math]::Pow(10, $i)
    }
    $sum
}) -join ", "

1, 1.1.1, 2, 2.5, 3, 5.5, 10, 1.99

CodePudding user response:

Based on Lance U. Matthews comment to your question ... | Sort-Object { [int] $_.Split('.')[0] }, { [int] $_.Split('.')[1] }, { [int] $_.Split('.')[2] },

here is a way to do the same thing while not knowing the number of nested levels.

It is done by checking the max nesting level and storing it into a variable, then building dynamically an array of scriptblock to sort against.

$Arr = @('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10", '12.9.3.1.5', '176', '12.9.9', '2.1') 
$MaxDots = ($Arr | % { $_.Split('.').count } | Measure-Object  -Maximum).Maximum

$sbl = for ($i = 0; $i -lt $MaxDots; $i  ) {
  { [int] $_.Split('.')[$i] }.GetNewClosure()
}

$Arr | Sort-Object $sbl

CodePudding user response:

Here's a couple of solutions with some caveats.

The -Property parameter of Sort-Object accepts an array, so you can specify "sort by...then by...". If you know the maximum number of "sub-indices" is 2 (i.e. x.y.z) then you can break the string apart into components separated by . and then sort successively by each component as an integer. It's repetitive, but it works...

(
    @('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") |
        Sort-Object -Property {
            # Sort by the first component
            return [Int32] $_.Split('.')[0]
        }, {
            # Sort by the second component
            return [Int32] $_.Split('.')[1]
        }, {
            # Sort by the third component
            return [Int32] $_.Split('.')[2]
        }
) -join ', '

If a component is not specified (e.g. '1.2'.Split('.')[2]) then, helpfully, it becomes 0 when casting to an [Int32].

Here's an alternative that only uses one [ScriptBlock] but it also requires that the maximum length of a sub-index in digits is known, too...

$maxComponentCount = 3
$maxComponentDigits = 2

# Create a string of repeated '0's $maxComponentDigits long
$emptyComponentText = [String]::new([Char] '0', $maxComponentDigits)

(
    @('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10") |
        Sort-Object -Property {
            $components = [String[]] (
                $_.Split('.').ForEach(
                    # Pad each component to $maxComponentDigits digits
                    { $_.PadLeft($maxComponentDigits, '0') }
                )
            );
            
            if ($components.Length -lt $maxComponentCount)
            {
                # Pad $components up to $maxComponentCount with $emptyComponentText elements
                $components  = ,$emptyComponentText * ($maxComponentCount - $components.Length)
            }

            # Join components - now $maxComponentCount elements of $maxComponentDigits digits - back into an index string
            return $components -join '.'
        }
) -join ', '

That is padding each input index to have the same number of sub-indices and each sub-index to have the same number of digits and then relying on lexical sorting to put them in the correct order, so this...

@('1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10")

...gets sorted as if it looked like this...

@('01.00.00', '02.00.00', '03.00.00', '01.01.01', '01.99.00', '02.05.00', '05.05.00', "10.00.00")

I suppose you could set both $maxComponentCount and $maxComponentDigits to, say, 100 if neither value is known, but that feels like a clunky workaround (with performance implications, too). I'll try to think of something better.

CodePudding user response:

'1', '2', '3', '1.1.1', '1.99', '2.5', '5.5', "10" |Sort-Object { '{0:000}{1:000}{2:000}{3:000}' -f $([int[]]("$_.0.0.0".Split('.'))) }

CodePudding user response:

The problem we're bumping into is that each [ScriptBlock] passed to Sort-Object only receives a single input value at a time and returns a value on which to sort, but with each index having a variable number of sub-indices we can't predict to how many levels we need to compare when there's values we haven't yet seen. What we need is a way to define how two values should be sorted relative to each other.

Fortunately, .NET, upon which (Windows) PowerShell is based, defines the [IComparer[]] interface that is used by various methods to determine the sort order of values. Its lone method, Compare(), is passed two values and returns whether they are equal or, otherwise, which one comes before the other. All we have to do is provide our own (simple) implementation of [IComparer[]] and then we can pass it to a built-in sorting method.

I'll start with the code and results and explain how it works in the sections that follow.

PowerShell script using .NET Comparer with test collection

Add-Type -Language 'CSharp' -TypeDefinition (
@'
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

namespace SO71232189
{
    // Author: Lance U. Matthews
    // Source: https://stackoverflow.com/a/71237565/150605
    public class HierarchicalIndexComparer : Comparer<string>
    {
        // https://docs.microsoft.com/dotnet/api/system.collections.generic.comparer-1.compare
        // When x < y, returns negative integer
        // When x == y, returns 0
        // When x > y, returns positive integer
        public override int Compare(string x, string y)
        {
            // Split each index into components converted to integers with validation
            int[] xValues = GetComponentValues(x, "x"); //NOTE: Use nameof(x) in C# 6 and above
            int[] yValues = GetComponentValues(y, "y"); //NOTE: Use nameof(y) in C# 6 and above
            int componentsToCompare = Math.Min(xValues.Length, yValues.Length);

            for (int index = 0; index < componentsToCompare; index  )
            {
                int componentCompareResult = xValues[index].CompareTo(yValues[index]);

                // Sort x and y by the current component values if they are not equal
                if (componentCompareResult != 0)
                    return componentCompareResult;

                // Otherwise, continue with the next component
            }

            // The first componentsToCompare elements of x and y are equal
            // Sort x and y by their component count
            return xValues.Length.CompareTo(yValues.Length);
        }

        private static int[] GetComponentValues(string index, string parameterName)
        {
            return index.Split('.')
                .Select(
                    component => {
                        int value;

                        // Leading zeroes will be removed and not considered when comparing components
                        if (!int.TryParse(component, NumberStyles.None, CultureInfo.InvariantCulture, out value))
                        {
                            throw new ArgumentOutOfRangeException(
                                parameterName, index, string.Format("Component \"{0}\" is empty or contains non-digit characters.", component)
                            );
                        }

                        return value;
                    }
                ).ToArray();
        }
    }
}
'@
)

[String[]] $initial = 1, 10, 100 |
    ForEach-Object -Process { "$_", "$_.0", "$_.0.0", "$_.0.0.0", "$_.0.1", "$_.1", "$_.1.0", "$_.1.1" }
# Shuffle the elements into a "random" order that is the same between runs
[String[]] $shuffled = Get-Random -Count $initial.Length -InputObject $initial -SetSeed 12345
[String[]] $sorted = [System.Linq.Enumerable]::OrderBy(
    $shuffled,                                                    # The values to be ordered
    [Func[String, String]] {                                      # Selects the key by which to order each value
        param($value)

        return $value                                             # Return the value as its own key
    },
    (New-Object -TypeName 'SO71232189.HierarchicalIndexComparer') # The comparer to perform the ordering
) | ForEach-Object -Process {
    # Just for demonstration purposes to show that further pipeline elements can be used after sorting
    return $_
}

for ($index = 0; $index -lt $initial.Length; $index  )
{
    [PSCustomObject] @{
        '#'         = $index
        '$initial'  = $initial[$index]
        '$shuffled' = $shuffled[$index]
        '$sorted'   = $sorted[$index]
    }
}

The advantage of using [Enumerable]::OrderBy() with a custom [IComparer[]] is that it will provide proper sorting with one pass through your indices, plus you can pipe the output into subsequent cmdlets.

You can also use the Array::Sort() static method, although that sorts the array passed to it in-place and doesn't return anything for a pipeline to process. Here I'll create a copy of the array first so it can be sorted separately...

# Create a copy of the $shuffled array named $sorted
$sorted = New-Object -TypeName 'System.String[]' -ArgumentList $shuffled.Length
[Array]::Copy($shuffled, $sorted, $shuffled.Length)

[Array]::Sort(
    $sorted,                                                      # The array to be sorted
    (New-Object -TypeName 'SO71232189.HierarchicalIndexComparer') # The comparer with which to sort it
)

.NET Comparer test collection output

The above script produces three arrays showing the different transformations of the index collection...

# $initial $shuffled $sorted
0 1 100.1 1
1 1.0 10.0.0.0 1.0
2 1.0.0 100.0.1 1.0.0
3 1.0.0.0 1.0.0.0 1.0.0.0
4 1.0.1 100.0 1.0.1
5 1.1 10.0.0 1.1
6 1.1.0 1.1 1.1.0
7 1.1.1 100.1.1 1.1.1
8 10 1.0 10
9 10.0 100.0.0 10.0
10 10.0.0 1 10.0.0
11 10.0.0.0 100.1.0 10.0.0.0
12 10.0.1 10.1.0 10.0.1
13 10.1 10.0 10.1
14 10.1.0 10.1.1 10.1.0
15 10.1.1 100.0.0.0 10.1.1
16 100 10.1 100
17 100.0 1.1.1 100.0
18 100.0.0 1.1.0 100.0.0
19 100.0.0.0 10 100.0.0.0
20 100.0.1 1.0.1 100.0.1
21 100.1 10.0.1 100.1
22 100.1.0 1.0.0 100.1.0
23 100.1.1 100 100.1.1

[HierarchicalIndexComparer] implementation

Here, written in C#, is an implementation of the [IComparer[]] interface for sorting indices; it derives from the recommended [Comparer[]] class...

// Author: Lance U. Matthews
// Source: https://stackoverflow.com/a/71237565/150605
public class HierarchicalIndexComparer : Comparer<string>
{
    // https://docs.microsoft.com/dotnet/api/system.collections.generic.comparer-1.compare
    // When x < y, returns negative integer
    // When x == y, returns 0
    // When x > y, returns positive integer
    public override int Compare(string x, string y)
    {
        // Split each index into components converted to integers with validation
        int[] xValues = GetComponentValues(x, "x"); //NOTE: Use nameof(x) in C# 6 and above
        int[] yValues = GetComponentValues(y, "y"); //NOTE: Use nameof(y) in C# 6 and above
        int componentsToCompare = Math.Min(xValues.Length, yValues.Length);

        for (int index = 0; index < componentsToCompare; index  )
        {
            int componentCompareResult = xValues[index].CompareTo(yValues[index]);

            // Sort x and y by the current component values if they are not equal
            if (componentCompareResult != 0)
                return componentCompareResult;

            // Otherwise, continue with the next component
        }

        // The first componentsToCompare elements of x and y are equal
        // Sort x and y by their component count
        return xValues.Length.CompareTo(yValues.Length);
    }

    private static int[] GetComponentValues(string index, string parameterName)
    {
        return index.Split('.')
            .Select(
                component => {
                    int value;

                    // Leading zeroes will be removed and not considered when comparing components
                    if (!int.TryParse(component, NumberStyles.None, CultureInfo.InvariantCulture, out value))
                    {
                        throw new ArgumentOutOfRangeException(
                            parameterName, index, string.Format("Component \"{0}\" is empty or contains non-digit characters.", component)
                        );
                    }

                    return value;
                }
            ).ToArray();
    }
}

Comparer logic

In short, that is splitting each index (x and y) into integer components, comparing as many components as they have in common, sorting based on any components that are unequal, and if all common components are equal then sorting the shorter index first.

So, for example, when comparing "1.2.3" to "1.2" it breaks them up into "1", "2", "3" and "1", "2", respectively, and then performs the following comparisons: | Component | x | y | Comparison result | |:---------:|:-----:|:-----:|:------------------| | 0 | "1" | "1" | Equal | | 1 | "2" | "2" | Equal | Since all 2 components of y equal the first 2 components of x, it now falls to the index with less components sorting first; x has 3 components whereas y has 2, therefore x sorts after y, which is correct: "1.2.3" > "1.2".

When comparing "1.2.3" to "1.20.4" it breaks them up into "1", "2", "3" and "1", "20", "4", respectively, and then performs the following comparisons: | Component | x | y | Comparison result | |:---------:|:-----:|:------:|:------------------| | 0 | "1" | "1" | Equal | | 1 | "2" | "20" | x component is less than y component | | 2 | "3" | "4" | Not evaluated | Since when comparing the first 3 components of x and y we find that component 1 of each is unequal, we return the result of that comparison; x[1] < y[1] therefore x < y, which is correct: "1.2.3 < 1.20.4".

Building the [HierarchicalIndexComparer] code

To build the [HierarchicalIndexComparer] code so we can use it from PowerShell we pass the previous C# code to the -TypeDefinition parameter of the Add-Type cmdlet...

Add-Type -Language 'CSharp' -TypeDefinition (
@'
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

namespace SO71232189
{
    // Author: Lance U. Matthews
    // Source: https://stackoverflow.com/a/71237565/150605
    public class HierarchicalIndexComparer : Comparer<string>
    {
        // Class implementation removed for brevity
    }
}
'@
)

The type can now be referenced as...

[SO71232189.HierarchicalIndexComparer]

...and instantiated with...

New-Object -TypeName 'SO71232189.HierarchicalIndexComparer'

...or...

[SO71232189.HierarchicalIndexComparer]::new()

Using the [HierarchicalIndexComparer]

Now that we have a [HierarchicalIndexComparer] type we can pass an instance to a sorting method. LINQ is a .NET technology that allows you to perform operations on sequences in much the same way as the Group-Object, Select-Object, and Where-Object cmdlets do with PowerShell pipelines. LINQ provides two methods, OrderBy() and OrderByDescending(), to perform primary sorting of sequences. We just need to pass our indices and our comparer to one of these methods to get sorted output...

[String[]] $sorted = [System.Linq.Enumerable]::OrderBy(
    $indices,                                                     # The values to be ordered
    [Func[String, String]] {                                      # Selects the key by which to order each value
        param($value)

        return $value                                             # Return the value as its own key
    },
    (New-Object -TypeName 'SO71232189.HierarchicalIndexComparer') # The comparer to perform the ordering
)

Briefly, the three parameters being passed to OrderBy() are...

  1. The sequence of indices to be sorted
  2. The "key" on which each index is sorted (equivalent to ... | Sort-Object -Property { $_ })
  3. An instance of our custom index comparer
  • Related