How to remove duplicates from a list of nested objects?

Time:02-23

I know there are many answers out there suggesting overriding Equals and GetHashCode, but in my case that is not possible because the objects used are imported from DLLs.

First, I have a list of objects called DeploymentData.

These objects, along with other properties, contain the following two: Location(double x, double y, double z) and Duct(int id).

The goal is to remove those that have the same Location parameters.

First, I grouped them by Duct, as a Location cannot be the same if it's on another duct.

var groupingByDuct = deploymentDataList.GroupBy(x => x.Duct.Id).ToList();

Then the actual algorithm:

List<DeploymentData> uniqueDeploymentData = new List<DeploymentData>();
foreach (var group in groupingByDuct) {
   uniqueDeploymentData 
      .AddRange(group 
      .Select(x => x)
      .GroupBy(d => new { d.Location.X, d.Location.Y, d.Location.Z })
      .Select(x => x.First()).ToList());
}

This does the work, but in order to properly check that they are indeed duplicates, the entire location should be compared. For this, I've made the following method:

private bool CompareXYZ(XYZ point1, XYZ point2, double tolerance = 10)
{
   if (System.Math.Abs(point1.X - point2.X) < tolerance &&
       System.Math.Abs(point1.Y - point2.Y) < tolerance &&
       System.Math.Abs(point1.Z - point2.Z) < tolerance) {
      return true;
   }
   return false;
}

However, I have no idea how to apply that to the code written above. To sum up:

  • How can I write the algorithm above without all those method calls?
  • How can I adjust the algorithm above to use the CompareXYZ method for better precision?
  • How efficient is this approach?

CodePudding user response:

An easy way to filter duplicates is to use a HashSet with a custom equality comparer, that is, a class that implements IEqualityComparer<T>, e.g.:

using System;
using System.Collections.Generic;

public class DeploymentDataEqualityComparer : IEqualityComparer<DeploymentData>
{
  private readonly double _tolerance;

  public DeploymentDataEqualityComparer(double tolerance)
  {
    _tolerance = tolerance;
  }

  public bool Equals(DeploymentData a, DeploymentData b)
  {
    if (a.Duct.Id != b.Duct.Id)
      return false; // Different duct, therefore not equal
    // Same duct: equal when every coordinate differs by less than the tolerance
    return Math.Abs(a.Location.X - b.Location.X) < _tolerance &&
           Math.Abs(a.Location.Y - b.Location.Y) < _tolerance &&
           Math.Abs(a.Location.Z - b.Location.Z) < _tolerance;
  }

  public int GetHashCode(DeploymentData dd)
  {
    // Items that Equals treats as equal must return the same hash code.
    // Locations within the tolerance can still hash differently, so only the duct id is hashed.
    return dd.Duct.Id.GetHashCode();
  }
}
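
One thing to watch with any tolerance-based comparer: HashSet only calls Equals on items whose hash codes match, so GetHashCode must not depend on values that may differ between "equal" items. The sketch below illustrates this on plain doubles; ToleranceComparer, HashingDemo and the hashExactValue switch are hypothetical names made up for this illustration, not part of the library in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer over plain doubles, used only to illustrate the hashing rule.
public class ToleranceComparer : IEqualityComparer<double>
{
    private readonly double _tolerance;
    private readonly bool _hashExactValue;

    public ToleranceComparer(double tolerance, bool hashExactValue)
    {
        _tolerance = tolerance;
        _hashExactValue = hashExactValue;
    }

    // Two values are "equal" when they differ by less than the tolerance.
    public bool Equals(double a, double b) => Math.Abs(a - b) < _tolerance;

    // Hashing the exact value gives near-equal inputs different hash codes;
    // a constant hash forces every lookup to fall back to Equals.
    public int GetHashCode(double value) => _hashExactValue ? value.GetHashCode() : 0;
}

public static class HashingDemo
{
    // Number of values that survive when added to a HashSet with the comparer above.
    public static int CountDistinct(IEnumerable<double> values, bool hashExactValue) =>
        new HashSet<double>(values, new ToleranceComparer(10, hashExactValue)).Count;
}
```

With the input { 100.0, 103.0, 500.0 }, CountDistinct should come out as 3 when hashExactValue is true (the near-duplicate 103.0 slips through) and as 2 when it is false. The price of the coarse hash is that lookups degrade towards a linear scan.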

In order to filter duplicates, you can then add them to a HashSet:

var hashSet = new HashSet<DeploymentData>(new DeploymentDataEqualityComparer(10));
foreach (var deploymentData in deploymentDataList)
  hashSet.Add(deploymentData);

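This way, you do not need to group by duct yourself, because the comparer already treats items on different ducts as different. Keep in mind that a tolerance-based comparison is not transitive, so which of several near-equal items is kept depends on the order in which they are added.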
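
If you would rather reuse CompareXYZ directly, a plain loop also works. The following is only a sketch: the XYZ, Duct and DeploymentData records are hypothetical stand-ins for the types from the DLLs (records need C# 9 or later), and DeploymentDeduplicator and RemoveDuplicates are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the types that come from the external DLLs.
public record XYZ(double X, double Y, double Z);
public record Duct(int Id);
public record DeploymentData(XYZ Location, Duct Duct);

public static class DeploymentDeduplicator
{
    // Same check as CompareXYZ in the question.
    public static bool CompareXYZ(XYZ p1, XYZ p2, double tolerance = 10) =>
        Math.Abs(p1.X - p2.X) < tolerance &&
        Math.Abs(p1.Y - p2.Y) < tolerance &&
        Math.Abs(p1.Z - p2.Z) < tolerance;

    // For each duct, keeps an item only if no already-kept item lies within the tolerance.
    public static List<DeploymentData> RemoveDuplicates(
        IEnumerable<DeploymentData> items, double tolerance = 10)
    {
        var unique = new List<DeploymentData>();
        foreach (var group in items.GroupBy(d => d.Duct.Id))
        {
            var kept = new List<DeploymentData>();
            foreach (var candidate in group)
            {
                if (!kept.Any(k => CompareXYZ(k.Location, candidate.Location, tolerance)))
                    kept.Add(candidate);
            }
            unique.AddRange(kept);
        }
        return unique;
    }
}
```

Each candidate is compared against every item already kept for its duct, so the cost grows quadratically with the number of distinct locations per duct. That is usually fine for modest list sizes; for very large inputs, a spatial grid or similar index would be the next thing to look at.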
