C# Filter duplicates from list according to two criteria

Time:12-28

I want to filter duplicates from a list of objects of type A based on a function IsDuplicate.

class A
{
    public int p1;
    public List<int> p2;
    public bool IsDuplicate(A other) => p2.Sum() == other.p2.Sum();

    public List<A> GetDistinctA(List<A> objects)
    {
        List<A> results = new();
        for (int i = 0; i < objects.Count; i++)
        {
            bool duplicated = false;
            for (int j = i + 1; j < objects.Count; j++)
            {
                if (objects[i].IsDuplicate(objects[j]))
                {
                    duplicated = true;
                    break;
                }
            }

            if (duplicated)
            {
                continue;
            }

            results.Add(objects[i]);
        }
        return results;
    }
}

I can accomplish this with a nested for-loop.

The new requirement is to deduplicate the list using the same IsDuplicate function, but when duplicates are found, the element with the higher value of p1 should win. Sorting the objects by p1 first would work, but I suspect that is not the most performant solution.

I am aware of LINQ's GroupBy and Distinct, but I don't know how to use a function as the comparison criterion rather than a property.
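
For reference, both GroupBy and Distinct have overloads that accept an IEqualityComparer<T>, which is one way to plug a function in as the criterion. The following is a minimal sketch rather than a definitive implementation: DuplicateComparer and ComparerDedup are hypothetical names, class A is repeated (with p2 initialized) only to keep the snippet self-contained, and it assumes IsDuplicate behaves as an equivalence relation, as the sum comparison does, so that a matching hash code can be derived from p2.Sum().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class A
{
    public int p1;
    public List<int> p2 = new(); // initialized here only to avoid null in the sketch
    public bool IsDuplicate(A other) => p2.Sum() == other.p2.Sum();
}

// Hypothetical adapter: exposes IsDuplicate through IEqualityComparer<A>.
sealed class DuplicateComparer : IEqualityComparer<A>
{
    public bool Equals(A x, A y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.IsDuplicate(y);
    }

    // Must agree with Equals: objects that are duplicates need equal hash codes.
    public int GetHashCode(A obj) => obj.p2.Sum();
}

static class ComparerDedup
{
    // Groups by the object itself using the comparer, then keeps the
    // element with the highest p1 from each group (earlier element on ties).
    public static List<A> DistinctKeepingHighestP1(IEnumerable<A> objects)
    {
        return objects
            .GroupBy(a => a, new DuplicateComparer())
            .Select(g => g.Aggregate((best, next) => next.p1 > best.p1 ? next : best))
            .ToList();
    }
}
```

Aggregate scans each group once, so no sort is needed inside the groups.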

CodePudding user response:

You only need to collect the losing duplicates in a separate list "duplicates" and then remove them from the original list. Note that List<T>.RemoveAll takes a predicate rather than a list:

objects.RemoveAll(duplicates.Contains);

The new method for getting a distinct list AND removing all duplicates:

public List<A> GetDistinctAndRemoveDuplicatesFromA(List<A> objects)
{
    List<A> duplicates = new();

    for (int i = 0; i < objects.Count; i++)
    {
        for (int j = i + 1; j < objects.Count; j++)
        {
            if (objects[i].IsDuplicate(objects[j]))
            {
                // Higher p1 survives; on a tie the earlier element is kept.
                duplicates.Add(objects[i].p1 < objects[j].p1 ? objects[i] : objects[j]);
            }
        }
    }

    // Contains compares by reference here because A does not override Equals.
    objects.RemoveAll(duplicates.Contains);
    return new List<A>(objects);
}

CodePudding user response:

You can group the items by the sum of p2, order the items within each group by p1 in descending order, and then take the first item from each group. That keeps the element with the highest p1 from every set of duplicates.

You can also make the method static, since it doesn't need instance data:

public static List<A> GetDistinctA(List<A> objects)
{
    return objects?
        .GroupBy(a => a.p2.Sum())
        .Select(a => a.OrderByDescending(x => x.p1).First())
        .ToList();
}

Example usage:

var items = new List<A>
{
    new A { p1 = 0, p2 = new List<int> { 1, 2, 3, 4, 5 } }, // 15
    new A { p1 = 1, p2 = new List<int> { 1, 2, 3 } },       // 6
    new A { p1 = 2, p2 = new List<int> { 1, 2, 5 } },       // 8
    new A { p1 = 3, p2 = new List<int> { 14 } },
    new A { p1 = 4, p2 = new List<int> { 3 } },
    new A { p1 = 5, p2 = new List<int> { 1, 2 } },          // 3
    new A { p1 = 6, p2 = new List<int> { 2, 3, 4, 5 } },    // 14
    new A { p1 = 7, p2 = new List<int> { 6 } },
    new A { p1 = 8, p2 = new List<int> { 15 } },
    new A { p1 = 9, p2 = new List<int> { 1, 3, 4} },        // 8
    new A { p1 = 10, p2 = new List<int> { 15 } },
};

var results = A.GetDistinctA(items);

Console.WriteLine(string.Join(", ", results.Select(a => a.p1)));
// Output: 10, 7, 9, 6, 5
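
If sorting inside each group is a concern, a single pass with a dictionary keyed by the sum is another option. This is a sketch under assumptions: Deduplicator and KeepHighestP1 are hypothetical names, class A is repeated (with p2 initialized) only to keep the snippet self-contained, the key is p2.Sum() to mirror IsDuplicate, and ties on p1 keep the earlier element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class A
{
    public int p1;
    public List<int> p2 = new(); // initialized here only to avoid null in the sketch
    public bool IsDuplicate(A other) => p2.Sum() == other.p2.Sum();
}

static class Deduplicator
{
    // One pass over the input: remember the best element seen so far per sum.
    public static List<A> KeepHighestP1(IEnumerable<A> objects)
    {
        var bestBySum = new Dictionary<int, A>();
        var keyOrder = new List<int>(); // sums in order of first appearance

        foreach (var a in objects)
        {
            int key = a.p2.Sum();
            if (!bestBySum.TryGetValue(key, out var best))
            {
                bestBySum[key] = a;
                keyOrder.Add(key);
            }
            else if (a.p1 > best.p1)
            {
                bestBySum[key] = a; // strictly higher p1 replaces the current winner
            }
        }

        return keyOrder.Select(k => bestBySum[k]).ToList();
    }
}
```

Called on the items list above, Deduplicator.KeepHighestP1(items) should yield the same p1 values in the same order as the GroupBy version, since both emit results in order of first key appearance.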