Shortest list from a two-dimensional array

This question is more about an algorithm than actual code, but example code would be appreciated.

Let's say I have a two-dimensional array such as this:

    A  B  C  D  E
   --------------
1 | 0  2  3  4  5
2 | 1  2  4  5  6
3 | 1  3  4  5  6
4 | 2  3  4  5  6
5 | 1  2  3  4  5

I am trying to find the shortest list that would include a value from each row. Currently, I am going row by row and column by column, adding each value to a SortedSet and then checking the length of the set against the shortest set found so far. For example:

Adding cells {1A, 2A, 3A, 4A, 5A} would add the values {0, 1, 1, 2, 1} which would result in a sorted set {0, 1, 2}. {1B, 2A, 3A, 4A, 5A} would add the values {2, 1, 1, 2, 1} which would result in a sorted set {1, 2}, which is shorter than the previous set.

Obviously, adding {1D, 2C, 3C, 4C, 5D} or {1E, 2D, 3D, 4D, 5E} would be the shortest sets, having only one item each, and I could use either one.

I don't have to include every number in the array. I just need to find the shortest set while including at least one number from every row.

Keep in mind that this is just an example array, and the arrays that I'm using are much, much larger. The smallest is 495x28. Brute force will take a VERY long time (28^495 passes). Is there a shortcut that someone knows, to find this in the least number of passes? I have C# code, but it's kind of long.

Edit:

Posting current code, as per request:

// Set an array of counters, Add enough to create largest initial array
int ListsCount = MatrixResults.Count();
int[] Counters = new int[ListsCount];
SortedSet<long> CurrentSet = new SortedSet<long>();
for (long X = 0; X < ListsCount; X++)
{
    Counters[X] = 0;
    CurrentSet.Add(X);
}

while (true)
{
    // Compile sequence list from MatrixResults[]
    SortedSet<long> ThisSet = new SortedSet<long>();
    for (int X = 0; X < Count4; X++)
    {
        ThisSet.Add(MatrixResults[X][Counters[X]]);
    }

    // if Sequence Length less than current low, set ThisSet as Current
    if (ThisSet.Count() < CurrentSet.Count())
    {
        CurrentSet.Clear();
        long[] TSI = ThisSet.ToArray();
        for (int Y = 0; Y < ThisSet.Count(); Y++)
        {
            CurrentSet.Add(TSI[Y]);
        }
    }

    // Increment Counters
    int Index = 0;
    bool EndReached = false;
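    while (true)
    {
        Counters[Index]++;
        if (Counters[Index] < MatrixResults[Index].Count()) break;
        Counters[Index] = 0;
        Index++;
        if (Index >= ListsCount)
        {
            EndReached = true;
            break;
        }
        // The carry into the next counter happens at the top of the loop;
        // incrementing here as well would skip combinations.
    }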

    // If all counters are fully incremented, then break
    if (EndReached) break;
}

CodePudding user response：

This problem is NP hard.

To show that, we have to take a known NP hard problem, and reduce it to this one. Let's do that with the Set Cover Problem.

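We start with a universe U of things, and a collection S of sets that covers the universe. Assign each thing a row, and each set a number. In each thing's row, write the numbers of the sets that contain it. Rows will generally end up with different numbers of entries, so pad the shorter rows with new numbers, each used only once, until the array is a rectangle.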

Now solve your problem.

For each new number in your solution that didn't come from a set in the original problem, we can replace it with another number in the same row that did come from a set.

And now we turn numbers back into sets and we have a solution to the Set Cover Problem.

The transformations from set cover to your problem and back again are both O(number_of_elements * number_of_sets) which is polynomial in the input. And therefore your problem is NP hard.
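
The construction described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `SetCoverReduction` and `ToMatrix` are made up for this sketch, the universe is assumed to be the integers 0 to universeSize - 1, and every element is assumed to appear in at least one set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetCoverReduction
{
    // Hypothetical helper (not from the answer): turns a set cover instance
    // into a rectangular matrix with one row per element of the universe.
    // sets[i] lists the elements contained in set number i.
    public static int[][] ToMatrix(int universeSize, IReadOnlyList<int[]> sets)
    {
        var rows = new List<int>[universeSize];
        for (int element = 0; element < universeSize; element++)
            rows[element] = new List<int>();

        // Each row receives the numbers of the sets that contain its element.
        for (int setNumber = 0; setNumber < sets.Count; setNumber++)
            foreach (int element in sets[setNumber].Distinct())
                rows[element].Add(setNumber);

        // Pad shorter rows with fresh numbers that occur nowhere else.
        int width = universeSize == 0 ? 0 : rows.Max(r => r.Count);
        int filler = sets.Count;
        foreach (var row in rows)
            while (row.Count < width)
                row.Add(filler++);

        return rows.Select(r => r.ToArray()).ToArray();
    }
}
```

For the universe {0, 1, 2} and the sets {0, 1}, {1, 2}, {2}, the rows come out as [0, 3], [0, 1], [1, 2], where 3 is a filler number that belongs to no set.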

Conversely if you replace each number in the matrix with the set of rows covered, your problem turns into the Set Cover Problem. Using any existing solver for set cover then gives a reasonable approach for your problem as well.
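
As a rough illustration of that last paragraph, here is a minimal sketch of the classic greedy set cover heuristic applied to the matrix. It inverts the matrix into "value -> rows containing it" and repeatedly picks the value that covers the most still-uncovered rows. The names `GreedyCover` and `Solve` are made up for this sketch, the matrix is assumed to be a jagged `int[][]`, and greedy is only an approximation: it is not guaranteed to find the smallest set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedyCover
{
    // Hypothetical helper: returns a small (not necessarily minimal) list of
    // values such that every row of the matrix contains at least one of them.
    public static List<int> Solve(int[][] matrix)
    {
        // Invert the matrix: value -> set of row indexes the value appears in.
        var rowsByValue = new Dictionary<int, HashSet<int>>();
        for (int row = 0; row < matrix.Length; row++)
        {
            foreach (int value in matrix[row])
            {
                if (!rowsByValue.TryGetValue(value, out var rows))
                    rowsByValue[value] = rows = new HashSet<int>();
                rows.Add(row);
            }
        }

        var uncovered = new HashSet<int>(Enumerable.Range(0, matrix.Length));
        var chosen = new List<int>();
        while (uncovered.Count > 0)
        {
            // Greedy step: pick the value that covers the most uncovered rows.
            int bestValue = 0;
            int bestCount = 0;
            foreach (var pair in rowsByValue)
            {
                int count = pair.Value.Count(r => uncovered.Contains(r));
                if (count > bestCount)
                {
                    bestCount = count;
                    bestValue = pair.Key;
                }
            }
            if (bestCount == 0)
                throw new ArgumentException("An empty row can never be covered.");

            chosen.Add(bestValue);
            uncovered.ExceptWith(rowsByValue[bestValue]);
        }
        return chosen;
    }
}
```

On the example array from the question, this returns a single value (4 or 5, depending on the dictionary's enumeration order), since both appear in every row. On large inputs the result may be bigger than the true minimum, so an exact set cover solver is still needed when optimality matters.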

CodePudding user response：

The code is not particularly tidy or optimised, but illustrates the approach I think @btilly is suggesting in his answer (E&OE) using a bit of recursion (I was going for intuitive rather than ideal for scaling, so you may have to work an iterative equivalent).

From the rows with their values make a "values with the rows that they appear in" counterpart. Now pick a value, eliminate all rows in which it appears and solve again for the reduced set of rows. Repeat recursively, keeping only the shortest solutions.

I know this is not terribly readable (or well explained) and may come back to tidy up in the morning, so let me know if it does what you want (is worth a bit more of my time;-).

//  Setup
var rowValues = new Dictionary<int, HashSet<int>>
{
  [0] = new() { 0, 2, 3, 4, 5 },
  [1] = new() { 1, 2, 4, 5, 6 },
  [2] = new() { 1, 3, 4, 5, 6 },
  [3] = new() { 2, 3, 4, 5, 6 },
  [4] = new() { 1, 2, 3, 4, 5 }
};

Dictionary<int, HashSet<int>> ValueRows(Dictionary<int, HashSet<int>> rv)
{
  var vr  = new Dictionary<int, HashSet<int>>();
  foreach (var row in rv.Keys)
  {
    foreach (var value in rv[row])
    {
      if (vr.ContainsKey(value))
      {
        if (!vr[value].Contains(row))
          vr[value].Add(row);
      }
      else
      {
        vr.Add(value, new HashSet<int> { row });
      }
    }
  }
  return vr;
}

List<int> FindSolution(Dictionary<int, HashSet<int>> rAndV)
{
  if (rAndV.Count == 0) return new List<int>();
  var bestSolutionSoFar = new List<int>();
  var vAndR = ValueRows(rAndV);
  foreach (var v in vAndR.Keys)
  {
    var copyRemove = new Dictionary<int, HashSet<int>>(rAndV);
    foreach (var r in vAndR[v])
      copyRemove.Remove(r);
    var solution = new List<int>{ v };
    solution.AddRange(FindSolution(copyRemove));
    if (bestSolutionSoFar.Count == 0 || solution.Count > 0 && solution.Count < bestSolutionSoFar.Count)
      bestSolutionSoFar = solution;
  }
  return bestSolutionSoFar;
}

var solution = FindSolution(rowValues);
Console.WriteLine($"Optimal solution has values {{ {string.Join(',', solution)} }}");

Output: Optimal solution has values { 4 }

CodePudding user response：

With any computation there is a tradeoff, and several factors are in play, such as whether you get paid for getting it perfect (in my case, no). This is a case of the best being the enemy of the good. The questions are how long we can spend on solving a problem and whether getting close enough is sufficient to fulfil the use case (IMO). When we can get the idea across without hand-painting pixels in UHD resolution, let's!

So my choice is an approach that produces a covering set which is small and, ahem, sometimes the smallest :) To be spot on, you would have to iterate over several strategies and compare the lengths of the sets they produce. For this evening of fun I chose to give one strategy, which I find defensible as being close to or equal to the minimal set.

The strategy is to treat the multidimensional array as a sequence of lists, each holding the distinct values of one row. In each iteration, take the shortest remaining lists, remove every list that is covered by one of their values, and weed out any values from those shortest lists that were not actually used to cover anything. Repeating this on the remainder gives a path that is close enough to the ideal to be effective, and it completes in milliseconds.

An up-front critique of this approach is that the order in which you walk the minimal list really ought to be varied to pick the best one (left to right, right to left, position sequences X, Y, Z, ...), because different orders do not reduce the remainder equally. To get close to the ideal you would have to try those orders in every iteration until all combinations were covered, choosing the one that reduces the most. Right, but I chose left to right only!

I chose not to compare execution against your code, because your MatrixResults is an array of int arrays rather than a multidimensional array like your drawing. I went by your drawing, so I couldn't share a data source with your code. No matter, you can make that conversion if you wish. Onwards to generating sample data:

private int[,] CreateSampleArray(int xDimension, int yDimensions, Random rnd)
{
    Debug.WriteLine($"Created sample array of dimensions ({xDimension}, {yDimensions})");
    var array = new int[xDimension, yDimensions];
    for (int x = 0; x < array.GetLength(0); x++)
    {
        for (int y = 0; y < array.GetLength(1); y++)
        {
            array[x, y] = rnd.Next(0, 4000);
        }
    }
    return array;
}
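
The conversion mentioned above, from an array of int arrays to a multidimensional array, might look like the sketch below. It is only an assumption about the asker's data: `MatrixConversion` and `ToRectangular` are made-up names, the jagged array is assumed to be indexed [row][column] with rows of equal length, and the result is indexed [x, y] with y as the row, which matches how the code in this answer reads the array.

```csharp
using System;

public static class MatrixConversion
{
    // Hypothetical helper (not from the original answer): copies a jagged
    // array indexed [row][column] into a rectangular array indexed [x, y],
    // where x is the column and y is the row.
    public static int[,] ToRectangular(int[][] jagged)
    {
        int rows = jagged.Length;
        int columns = rows == 0 ? 0 : jagged[0].Length;
        var rectangular = new int[columns, rows];
        for (int y = 0; y < rows; y++)
        {
            // A rectangular array cannot represent rows of different lengths.
            if (jagged[y].Length != columns)
                throw new ArgumentException(
                    $"Row {y} has {jagged[y].Length} values, expected {columns}.");
            for (int x = 0; x < columns; x++)
                rectangular[x, y] = jagged[y][x];
        }
        return rectangular;
    }
}
```

If the real data has rows of different lengths, it has to be padded first or kept as a jagged array.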

The overall structure, with some logging. I'm using xUnit to run the code:

[Fact]
public void SetCoverExperimentTest()
{
    var rnd = new Random((int)DateTime.Now.Ticks);
    var sw = Stopwatch.StartNew();

    int[,] matrixResults = CreateSampleArray(rnd.Next(100, 500), rnd.Next(100, 500), rnd);

    //So first requirement is that you must have one element per row, so lets get our unique rows
    var listOfAll = new List<List<int>>();
    List<int> listOfRow;
    for (int y = 0; y < matrixResults.GetLength(1); y++)
    {
        listOfRow = new List<int>();
        for (int x = 0; x < matrixResults.GetLength(0); x++)
        {
            listOfRow.Add(matrixResults[x, y]);
        }
        listOfAll.Add(listOfRow.Distinct().ToList());
    }
    var setFound = new HashSet<int>();
    List<List<int>> allUniquelyRequired = GetDistinctSmallestList(listOfAll, setFound);

    // This set now has all rows that are either distinctly different
    // Or have a reordering of distinct values of that length value lists
    // our HashSet has the unique value range

    //Meaning any combination of sets with those values,
    //grabbing any one for each set, prefering already chosen ones should give a covering total set

    var leastSet = new LeastSetData
    {
        LeastSet = setFound,
        MatrixResults = matrixResults,
    };

    List<Coordinate>? minSet = leastSet.GenerateResultsSet();
    sw.Stop();
    Debug.WriteLine($"Completed in {sw.Elapsed.TotalMilliseconds:0.00} ms");
    Assert.NotNull(minSet);

    //There is one for each row
    Assert.False(minSet.Select(s => s.y).Distinct().Count() < minSet.Count());

    //We took less than 25 milliseconds
    var timespan = new TimeSpan(0, 0, 0, 0, 25);
    Assert.True(sw.Elapsed < timespan);

    //Outputting to debugger for the fun of it
    var sb = new StringBuilder();
    foreach (var coordinate in minSet)
    {
        sb.Append($"({coordinate.x}, {coordinate.y}) {matrixResults[coordinate.x, coordinate.y]},");
    }
    var debugLine = sb.ToString();
    debugLine = debugLine.Substring(0, debugLine.Length - 1);
    Debug.WriteLine("Resulting set: " + debugLine);
}

Now the more meaty iterative bits

private List<List<int>> GetDistinctSmallestList(List<List<int>> listOfAll, HashSet<int> setFound)
{
    // Our smallest set must be a subset of the distinct union of all our smallest lists (the value range),
    // plus unknown
    var listOfShortest = new List<List<int>>();
    int shortest = int.MaxValue;
    foreach (var list in listOfAll)
    {
        if (list.Count < shortest)
        {
            listOfShortest.Clear();
            shortest = list.Count;
            listOfShortest.Add(list);
        }
        else if (list.Count == shortest)
        {
            if (listOfShortest.Contains(list))
                continue;
            listOfShortest.Add(list);
        }
    }

    var setFoundAddition = new HashSet<int>(setFound);

    foreach (var list in listOfShortest)
    {
        foreach (var item in list)
        {
            if (setFound.Contains(item))
                continue;
            if (setFoundAddition.Contains(item))
                continue;
            setFoundAddition.Add(item);
        }
    }

    //Now we can remove all rows with those found, we'll add the smallest later
    var listOfAllRemainder = new List<List<int>>();
    bool foundInList;
    List<int> consumedWhenReducing = new List<int>();
    foreach (var list in listOfAll)
    {
        foundInList = false;
        foreach (int item in list)
        {
            if (setFound.Contains(item))
            {
                //Covered by data from last iteration(s)
                foundInList = true;
                break;
            }
            else if (setFoundAddition.Contains(item))
            {
                consumedWhenReducing.Add(item);
                foundInList = true;
                break;
            }
        }
        if (!foundInList)
        {
            listOfAllRemainder.Add(list); //adding what lists did not have elements found
        }
    }

    //Remove any values from these smallest lists that did not get consumed in the covering pass above
    if (consumedWhenReducing.Count == 0)
    {
        throw new Exception($"Shouldn't be possible to remove the row itself without using one of its values, please investigate");
    }
    var removeArray = setFoundAddition.Where(a => !consumedWhenReducing.Contains(a)).ToArray();
    setFoundAddition.RemoveWhere(x => removeArray.Contains(x));

    foreach (var value in setFoundAddition)
    {
        setFound.Add(value);
    }


    if (listOfAllRemainder.Count != 0)
    {
        //Do the whole thing again until there is no list left
        listOfShortest.AddRange(GetDistinctSmallestList(listOfAllRemainder, setFound));
    }
    return listOfShortest; //Here we will ultimately have the sum of shortest lists per iteration
}

To conclude: I hope to have inspired you. At least I had fun coming up with a best approximation, and should you feel like completing the code, you're very welcome to grab what you like.

Obviously we should really track the order in which we go through the shortest lists. After all, it matters whether we start reducing the distinct lists by the element at position 0 or at position 0 + N, and which one we reduce with next. We must have one of those values, but since consuming each value removes most of the remaining lists, all the pass really produces is a value range, and the order in which that range is consumed matters to the later iterations. A position we never reached, because no lists were left by then, could potentially have removed more lists than some of the values that were used. You get the picture, I'm sure.

And this is just one strategy. One may as well have chosen the largest distinct list, even within the same framework, and if you do not iteratively cover enough strategies, there is only brute force left.

Anyway, you'd want an AI to act, just like a human, and not to contemplate the existence of the universe first. After all, we can reconsider pretty often with silicon brains, as long as we can do so fast.

With any moving object at least, I'd much rather be 90% on target, correcting every second while taking 14 ms to get there, than spend 2 seconds reaching 99% or the elusive 100%. That means we should stop the vehicle before the concrete pillar or the pram, or conversely buy the equity when it is a good time to do so, rather than figuring out that we should have stopped when we are already on the other side of the obstacle, or that we should have bought 5 seconds ago when the spot price has already jumped again...

Thus the defense rests on the notion that it is a matter of opinion whether this solution is good enough or simply incomplete at best :D

I realize it's pretty random, but just to say that although this sketch is not indisputably correct, it is easy to read and maintain, and anyway the question is wrong B-] We will very rarely need the absolute minimal set, and when we do, the answer will be much longer :D

... whoopsie, forgot the support classes:

public struct Coordinate
{
    public int x;
    public int y;

    public override string ToString()
    {
        return $"({x},{y})";
    }
}
public struct CoordinateValue
{
    public int Value { get; set; }
    public Coordinate Coordinate { get; set; }
    public override string ToString()
    {
        return string.Concat(Coordinate.ToString(), " ", Value.ToString());
    }
}

public class LeastSetData
{
    public HashSet<int> LeastSet { get; set; }
    public int[,] MatrixResults { get; set; }
    public List<Coordinate> GenerateResultsSet()
    {
        HashSet<int> chosenValueRange = new HashSet<int>();
        var chosenSet = new List<Coordinate>();
        for (int y = 0; y < MatrixResults.GetLength(1); y++)
        {
            var candidates = new List<CoordinateValue>();
            for (int x = 0; x < MatrixResults.GetLength(0); x++)
            {
                if (LeastSet.Contains(MatrixResults[x, y]))
                {
                    candidates.Add(new CoordinateValue
                    {
                        Value = MatrixResults[x, y],
                        Coordinate = new Coordinate { x = x, y = y }
                    }
                    );
                    continue;
                }
            }
            if (candidates.Count == 0)
                throw new Exception($"OMG Something's wrong! (this row did not have any of derived range [y: {y}])");
            var done = false;
            foreach (var c in candidates)
            {
                if (chosenValueRange.Contains(c.Value))
                {
                    chosenSet.Add(c.Coordinate);
                    done = true;
                    break;
                }
            }
            if (!done)
            {
                var firstCandidate = candidates.First();
                chosenSet.Add(firstCandidate.Coordinate);
                chosenValueRange.Add(firstCandidate.Value);
            }
        }
        return chosenSet;
    }
}