Home > Enterprise >  Sorting 2D array by 2 columns
Sorting 2D array by 2 columns

Time:04-29

I am looking for an efficient way to sort the data in a 2D array. The array can have many rows and columns but, in this example, I will just limit it to 6 rows and 5 columns. The data is strings as some are words. I only include one word below but in the real data there are a few columns of words. I realise if we sort, we should treat the data as numbers?

string[,] WeatherDataArray = new string[6,5];

The data is a set of weather data that is read every day and logged. This data goes through many parts of their system which I cannot change and it arrives to me in a way that it needs sorting. An example layout could be:

Day number, temperature, rainfall, wind, cloud

The matrix of data could look like this

3,20,0,12,cumulus
1,20,0,11,none
23,15,0,8,none
4,12,0,1,cirrus
12,20,0,12,cumulus
9,15,2,11,none

They now want the data sorted so it will have temperature in descending order and day number in ascending order. The result would be

1,20,0,11,none
3,20,0,12,cumulus
12,20,0,12,cumulus
9,15,2,11,none
23,15,0,0,none
4,12,0,1,cirrus

The array is stored and later they can extract it to a table and do lots of analysis on it. The extraction side is not changing so I cannot sort the data in the table, I have to create the data in the correct format to match the existing rules they have.

I could parse each row of the array and sort them but this seems a very long-handed method. There must be a quicker more efficient way to sort this 2D array by two columns? I think I could send it to a function and get returned the sorted array like:

private string[,]  SortData(string[,] Data)
{

    //In here we do the sorting

}

Any ideas please?

CodePudding user response:

I agree with the other answer that it's probably best to parse each row of the data into an instance of a class that encapsulates the data, creating a new 1D array or list from that data. Then you'd sort that 1D collection and convert it back into a 2D array.

However another approach is to write an IComparer class that you can use to compare two rows in a 2D array like so:

public sealed class WeatherComparer: IComparer
{
    readonly string[,] _data;

    public WeatherComparer(string[,] data)
    {
        _data = data;
    }

    public int Compare(object? x, object? y)
    {
        int row1 = (int)x;
        int row2 = (int)y;

        double temperature1 = double.Parse(_data[row1, 1]);
        double temperature2 = double.Parse(_data[row2, 1]);

        if (temperature1 < temperature2)
            return 1;

        if (temperature2 < temperature1)
            return -1;

        int day1 = int.Parse(_data[row1,0]);
        int day2 = int.Parse(_data[row2,0]);

        return day1.CompareTo(day2);
    }
}

Note that this includes a reference to the 2D array to be sorted, and parses the elements for sorting as necessary.

Then you need to create a 1D array of indices, which is what you are actually going to sort. (You can't sort a 2D array, but you CAN sort a 1D array of indices that reference the rows of the 2D array.)

public static string[,] SortData(string[,] data)
{
    int[] indexer = Enumerable.Range(0, data.GetLength(0)).ToArray();
    var comparer = new WeatherComparer(data);
    Array.Sort(indexer, comparer);

    string[,] result = new string[data.GetLength(0), data.GetLength(1)];

    for (int row = 0; row < indexer.Length;   row)
    {
        int dest = indexer[row];

        for (int col = 0; col < data.GetLength(1);   col)
            result[dest, col] = data[row, col];
    }

    return result;
}

Then you can call SortData to sort the data:

public static void Main()
{
    string[,] weatherDataArray = new string[6, 5]
    {
        { "3",  "20", "0", "12", "cumulus" },
        { "1",  "20", "0", "11", "none" },
        { "23", "15", "0", "8",  "none" },
        { "4",  "12", "0", "1",  "cirrus" },
        { "12", "20", "0", "12", "cumulus" },
        { "9",  "15", "2", "11", "none" }
    };

    var sortedWeatherData = SortData(weatherDataArray);

    for (int i = 0; i < sortedWeatherData.GetLength(0);   i)
    {
        for (int j = 0; j < sortedWeatherData.GetLength(1);   j)
             Console.Write(sortedWeatherData[i,j]   ", ");
            
        Console.WriteLine();
    }
}

Output:

1, 20, 0, 11, none,
3, 20, 0, 12, cumulus,
12, 20, 0, 12, cumulus,
9, 15, 2, 11, none,
23, 15, 0, 8, none,
4, 12, 0, 1, cirrus,

Note that this code does not contain any error checking - it assumes there are no nulls in the data, and that all the parsed data is in fact parsable. You might want to add appropriate error handling.

Try it on .NET Fiddle: https://dotnetfiddle.net/mwXyMs

CodePudding user response:

I would suggest parsing the data into objects that can be sorted by conventional methods. Like using LINQ:

myObjects.OrderBy(obj => obj.Property1)
         .ThenBy(obj=> obj.Property2);

Treating data as a table of strings will just make processing more difficult, since at every step you would need to parse values, handle potential errors since a string may be empty or contain an invalid value etc. It is a much better design to do all this parsing and error handling once when the data is read, and convert it to text-form again when writing it to disk or handing it over to the next system.

If this is a legacy system with lots of parts that handle the data in text-form I would still argue to parse the data first, and do it in a separate module so it can be reused. This should allow the other parts to be rewritten part by part to use the object format.

If this is completely infeasible you either need to convert the data to a jagged array, i.e. string[][]. Or write your own sorting that can swap rows in a multidimensional array.

CodePudding user response:

I had fun trying to make something better than the accepted answer, and I think I did.

Reasons it is better:

  • Which columns it uses to sort and whether in an ascending or descending order is not hardcoded, but passed in as parameters. In the post, I understood that they might change their mind in the future as to how to sort the data.
  • It supports sorting by columns that do not contain numbers, for if they ever want to sort by the name columns.
  • In my testing, for large data, it is much faster and allocates less memory.

Reasons it is faster:

  • It never parses the same index of Data twice. It caches the numbers.
  • When copying, it uses Span.CopyTo instead of indeces.
  • It doesn't create a new Data array, it sorts the rows in place. This also means it won't copy the rows that are already in the correct spots.

Here's the usage:

DataSorter.SortDataWithSortAguments(array, (1, false), (0, true));

And here's the code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

namespace YourNamespace;

public static class DataSorter
{
    public static void SortDataWithSortAguments(string[,] Data, params (int columnIndex, bool ascending)[] sortingParams)
    {
        if (sortingParams.Length == 0)
        {
            return;
            // maybe throw an exception instead? depends on what you want
        }

        if (sortingParams.Length > 1)
        {
            var duplicateColumns =
                from sortingParam in sortingParams
                group false by sortingParam.columnIndex
                into sortingGroup
                where sortingGroup.Skip(1).Any()
                select sortingGroup.Key;

            var duplicateColumnsArray = duplicateColumns.ToArray();
            if (duplicateColumnsArray.Length > 0)
            {
                throw new ArgumentException($"Cannot sort by the same column twice. Duplicate columns are: {string.Join(", ", duplicateColumnsArray)}");
            }
        }

        for (int i = 0; i < sortingParams.Length; i  )
        {
            int col = sortingParams[i].columnIndex;
            if (col < 0 || col >= Data.GetLength(1))
            {
                throw new ArgumentOutOfRangeException($"Column index {col} is not within range 0 to {Data.GetLength(1)}");
            }
        }

        int[] linearRowIndeces = new int[Data.GetLength(0)];
        for (int i = 0; i < linearRowIndeces.Length; i  )
        {
            linearRowIndeces[i] = i;
        }

        Span<int> sortedRows = SortIndecesByParams(Data, sortingParams, linearRowIndeces);

        SortDataRowsByIndecesInPlace(Data, sortedRows);
    }


    private static float[]? GetColumnAsNumbersOrNull(string[,] Data, int columnIndex)
    {
        if (!float.TryParse(Data[0, columnIndex], out float firstNumber))
        {
            return null;
        }

        // if the first row of the given column is a number, assume all rows of the column should be numbers as well

        float[] column = new float[Data.GetLength(0)];
        column[0] = firstNumber;

        for (int row = 1; row < column.Length; row  )
        {
            if (!float.TryParse(Data[row, columnIndex], out column[row]))
            {
                throw new ArgumentException(
                    $"Rows 0 to {row - 1} of column {columnIndex} contained numbers, but row {row} doesn't");
            }
        }

        return column;
    }

    private static Span<int> SortIndecesByParams(
        string[,] Data,
        ReadOnlySpan<(int columnIndex, bool ascending)> sortingParams,
        IEnumerable<int> linearRowIndeces)
    {
        var (firstColumnIndex, firstAscending) = sortingParams[0];

        var firstColumn = GetColumnAsNumbersOrNull(Data, firstColumnIndex);

        IOrderedEnumerable<int> sortedRowIndeces = (firstColumn, firstAscending) switch
        {
            (null, true) => linearRowIndeces.OrderBy(row => Data[row, firstColumnIndex]),
            (null, false) => linearRowIndeces.OrderByDescending(row => Data[row, firstColumnIndex]),
            (not null, true) => linearRowIndeces.OrderBy(row => firstColumn[row]),
            (not null, false) => linearRowIndeces.OrderByDescending(row => firstColumn[row])
        };

        for (int i = 1; i < sortingParams.Length; i  )
        {
            var (columnIndex, ascending) = sortingParams[i];

            var column = GetColumnAsNumbersOrNull(Data, columnIndex);

            sortedRowIndeces = (column, ascending) switch
            {
                (null, true) => sortedRowIndeces.ThenBy(row => Data[row, columnIndex]),
                (null, false) => sortedRowIndeces.ThenByDescending(row => Data[row, columnIndex]),
                (not null, true) => sortedRowIndeces.ThenBy(row => column[row]),
                (not null, false) => sortedRowIndeces.ThenByDescending(row => column[row])
            };
        }

        return sortedRowIndeces.ToArray();
    }

    private static void SortDataRowsByIndecesInPlace(string[,] Data, Span<int> sortedRows)
    {
        Span<string> tempRow = new string[Data.GetLength(1)];

        for (int i = 0; i < sortedRows.Length; i  )
        {
            while (i != sortedRows[i])
            {
                Span<string> firstRow = MemoryMarshal.CreateSpan(ref Data[i, 0], tempRow.Length);
                Span<string> secondRow = MemoryMarshal.CreateSpan(ref Data[sortedRows[i], 0], tempRow.Length);

                firstRow.CopyTo(tempRow);
                secondRow.CopyTo(firstRow);
                tempRow.CopyTo(secondRow);

                (sortedRows[i], sortedRows[sortedRows[i]]) = (sortedRows[sortedRows[i]], sortedRows[i]);
            }
        }
    }
}

PS: I should not have spent so much time working on this considering my responsibilities, but it was fun.

  • Related