Fast algorithm to check equality between list of lists

Time:12-21

I am currently dealing with a problem in which I have to check whether two data structures contain the same data. Both are lists of lists of the following type:

List<List<T>>

T is always a primitive data type whose equality can be tested directly with the == operator, such as int or float.

  1. T represents a single transformation that can be performed on the system
  2. List<T> represents an operation, which is a sequence of transformations, so the order of elements in it matters
  3. List<List<T>> represents the full set of operations that can be performed on the system, so the order of its elements does not matter. Only their presence matters.

For example:

List<int> l1 = new() {1, 2, 3};
List<int> l2 = new() {5, 6, 7, 8};
List<int> l3 = new() {3, 1, 2}; // same elements as l1 but different order

List<List<int>> LL1 = new() {l1, l2};
List<List<int>> LL2 = new() {l2, l1};
List<List<int>> LL3 = new() {l2, l3};

Expectation: LL1 = LL2, but LL1 != LL3 and also LL2 != LL3

I need a fast way to check equality between LL1, LL2, LL3, etc. using the conditions explained above. My current solution is (slightly modified for readability):

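bool CheckEqual<T>(List<List<T>> l1, List<List<T>> l2)
{
    if (l1.Count != l2.Count) return false;
    // Copy l2 so that removing matched operations does not modify the caller's list
    var unmatched = new List<List<T>>(l2);
    for (int i = 0; i < l1.Count; i++)
    {
        List<T> operation1 = l1[i];
        bool found = false;
        for (int j = 0; j < unmatched.Count; j++)
        {
            List<T> operation2 = unmatched[j];
            if (operation1.SequenceEqual(operation2))
            {
                unmatched.RemoveAt(j);
                found = true;
                break;
            }
        }
        // operation1 has no counterpart in l2, so the two are not equal
        if (!found) return false;
    }
    // Every operation in l1 was matched; l2 must have nothing left over
    return unmatched.Count == 0;
}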

The size of the List<T> can be large (up to ~10000 elements), but the number of elements in List<List<T>> will be at most 20-30.

CodePudding user response:

It is easier to formulate when using a class for the inner list, with an appropriate order-sensitive Equals implementation, like:

class Operation : IEquatable<Operation>
{
    public List<int> Transformations;
    
    public Operation(params int[] transformations)
    {
        Transformations = transformations.ToList();
    }
    
    public bool Equals(Operation other) => Transformations.SequenceEqual(other.Transformations);
    public override bool Equals(object obj) => obj is Operation op && Equals(op);
    public override int GetHashCode()
    {
        var result = new HashCode();
        foreach (var e in Transformations)
            result.Add(e);
        return result.ToHashCode();
    }
}

Then we can write an efficient comparison of two List<Operation> objects disregarding the order:

bool EqualListList(List<Operation> ll1, List<Operation> ll2)
{
   var hs = new HashSet<Operation>(ll1);
   // the following two lines can be replaced by return hs.SetEquals(ll2);
   hs.SymmetricExceptWith(ll2);
   return !hs.Any();
}

Test

Operation l1 = new(1, 2, 3);
Operation l2 = new(5, 6, 7, 8);
Operation l3 = new(3, 1, 2); // same elements as l1 but different order

List<Operation> LL1 = new() {l1, l2};
List<Operation> LL2 = new() {l2, l1};
List<Operation> LL3 = new() {l2, l3};

Console.WriteLine(EqualListList(LL1, LL2));
Console.WriteLine(EqualListList(LL1, LL3));
Console.WriteLine(EqualListList(LL2, LL3));

=> produces True, False, False

CodePudding user response:

Find or implement a collection whose GetHashCode and Equals take every element into account, i.e. deep equality. See HashCode.Combine for combining hashes; it may also be worth caching the hash code. Use this collection for the inner list.

Add all of the collections from one outer list to a HashSet, then use SetEquals to compare it with the other outer list. This ignores the order.
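
A minimal sketch of this idea, assuming a hypothetical wrapper named SequenceKey<T> and a helper class OperationSets (neither is a framework type). The wrapper caches its hash code, so it assumes the wrapped list is not modified afterwards:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper (not part of .NET): value equality over an ordered sequence.
public sealed class SequenceKey<T> : IEquatable<SequenceKey<T>>
{
    private readonly IReadOnlyList<T> _items;
    private readonly int _hash; // computed once; assumes the list is not mutated later

    public SequenceKey(IReadOnlyList<T> items)
    {
        _items = items;
        var hc = new HashCode();
        foreach (var item in items) hc.Add(item);
        _hash = hc.ToHashCode();
    }

    // Compare the cached hashes first; walk the elements only if the hashes agree.
    public bool Equals(SequenceKey<T> other) =>
        other is not null && _hash == other._hash && _items.SequenceEqual(other._items);

    public override bool Equals(object obj) => Equals(obj as SequenceKey<T>);
    public override int GetHashCode() => _hash;
}

public static class OperationSets
{
    // Order of the outer lists is ignored; order inside each inner list matters.
    public static bool SetEqual<T>(List<List<T>> a, List<List<T>> b)
    {
        var set = new HashSet<SequenceKey<T>>(a.Select(l => new SequenceKey<T>(l)));
        return set.SetEquals(b.Select(l => new SequenceKey<T>(l)));
    }
}
```

Because a HashSet drops duplicates, this treats the outer lists as sets: an operation that appears twice in one list and once in the other still counts as a match.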

CodePudding user response:

Each individual list of integers can be sorted and then mapped to a string of the form "x1_x2_x3_...". For example, {1, 2, 3} and {2, 3, 1} can both be mapped to "1_2_3". That way, we can compare any two lists of integers. Note that sorting discards the order of the elements; if the order within an inner list matters, as in the question, build the string without sorting.

Now for a list of integer lists: first map it to a list of strings using the operation above. Comparing two lists of strings is then straightforward: simply sort them and compare the strings one by one.

The code for that should be something similar to:

bool CheckEqual<T>(List<List<T>> l1, List<List<T>> l2)
{
    if (l1.Count != l2.Count) return false;

    // Map every inner list to a key such as "1_2_3_", then sort the keys
    string[] ToSortedKeys(List<List<T>> lists) => lists.Select(list =>
    {
        var sb = new StringBuilder();
        // This OrderBy ignores element order; drop it if order within a list matters
        foreach (var x in list.OrderBy(v => v))
            sb.Append(x).Append("_");
        return sb.ToString();
    }).OrderBy(s => s).ToArray();

    var sbl1 = ToSortedKeys(l1);
    var sbl2 = ToSortedKeys(l2);

    for (int i = 0; i < sbl1.Length; i++)
    {
        if (sbl1[i] != sbl2[i])
            return false;
    }

    return true;
}

CodePudding user response:

You can use Except:

var l1notl2 = l1.Except(l2).ToList();
var l2notl1 = l2.Except(l1).ToList();

You can wrap the two lines above in a method and then return:

return !l1notl2.Any() && !l2notl1.Any();

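Except uses hashing, so the time complexity should be close to O(N). Note that with the default comparer the inner lists are compared by reference, so for List<List<T>> the elements need value-based Equals and GetHashCode, or a custom IEqualityComparer must be passed to Except.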
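
A possible way to make Except compare the inner lists by content is to pass it an equality comparer. The sketch below assumes a hypothetical SequenceComparer<T> and a helper class ExceptComparison, neither of which comes from the framework:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two lists are equal when they hold the same elements in the same order.
public sealed class SequenceComparer<T> : IEqualityComparer<List<T>>
{
    public bool Equals(List<T> x, List<T> y) =>
        ReferenceEquals(x, y) || (x is not null && y is not null && x.SequenceEqual(y));

    public int GetHashCode(List<T> list)
    {
        var hc = new HashCode();
        foreach (var item in list) hc.Add(item);
        return hc.ToHashCode();
    }
}

public static class ExceptComparison
{
    public static bool SameOperations<T>(List<List<T>> l1, List<List<T>> l2)
    {
        var comparer = new SequenceComparer<T>();
        // Except yields the distinct elements of the first sequence that are absent from the second.
        return !l1.Except(l2, comparer).Any() && !l2.Except(l1, comparer).Any();
    }
}
```

Like any set difference, this ignores how many times an operation occurs in each outer list.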

CodePudding user response:

Generate a hash code for every inner list, then sort the hash codes within each list of lists and hash the sorted sequence to get a single hash per list of lists.

If two list-of-lists hashes are equal, compare the inner hashes of both lists (collisions are possible). Where the inner hashes are also equal, compare the lists themselves.

Alternatively, you can trust the hashes and treat equal hashes as a match. (Use 64-bit hashes to make collisions very unlikely.)
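
The idea can be sketched roughly as follows. The class and method names (HashedComparison, Fingerprint, Equal) are made up for illustration, the hashes are 32-bit HashCode values rather than 64-bit ones, and the verification step goes straight to comparing the lists instead of comparing inner hashes first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashedComparison
{
    // Hash of one inner list; element order matters.
    private static int HashOfList<T>(List<T> list)
    {
        var hc = new HashCode();
        foreach (var item in list) hc.Add(item);
        return hc.ToHashCode();
    }

    // Hash of a list of lists: inner hashes are sorted first, so outer order is irrelevant.
    public static int Fingerprint<T>(List<List<T>> lists)
    {
        var hc = new HashCode();
        foreach (int h in lists.Select(l => HashOfList(l)).OrderBy(h => h))
            hc.Add(h);
        return hc.ToHashCode();
    }

    public static bool Equal<T>(List<List<T>> a, List<List<T>> b)
    {
        if (a.Count != b.Count) return false;

        // Cheap rejection: different fingerprints mean the contents differ.
        if (Fingerprint(a) != Fingerprint(b)) return false;

        // Equal fingerprints may still be a collision, so confirm by matching the lists.
        var remaining = new List<List<T>>(b);
        foreach (var operation in a)
        {
            int j = remaining.FindIndex(other => operation.SequenceEqual(other));
            if (j < 0) return false;
            remaining.RemoveAt(j);
        }
        return true;
    }
}
```

When the same list of lists takes part in many comparisons, its fingerprint could be computed once and stored, which is where this approach is likely to pay off.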
