How can I create a new List<T> based on two other List<T> and account for duplicates?

Time:09-26

My first post. Humbled by this community. Thank you.

The goal: Create a new List<PropertyB> based on two other lists: List<PropertyA> and another List<PropertyB>.

For each PropertyA in the List<PropertyA>, create a new PropertyB() and assign the PropertyA's DisplayName to the new PropertyB's Name property. Then, for each PropertyB in the List<PropertyB>, if the PropertyA's Name matches the PropertyB's Name, assign that PropertyB's Value to the new PropertyB's Value property.

The problem: Accounting for Duplicate values. No data loss can occur between the lists.

The new list should include every PropertyA, plus every Value from the PropertyB list where there is a Name match.

My thoughts: My gut says the inner loop should check whether something has already been added to the collection. Or perhaps keep an accounting of duplicate values (i.e., the index of each duplicate?).

Any assistance is appreciated!

The types:

// Properties are public so the object initializers below compile.
public class PropertyA {
      public string DisplayName { get; set; }
      public string Name { get; set; }
      public string Value { get; set; }
}
public class PropertyB {
      public string Name { get; set; }
      public string Value { get; set; }
}

Initialization:

List<PropertyA> listA = new List<PropertyA>()
{
      new PropertyA(){ DisplayName="LOB", Name="lineofbusiness", Value="test"},
      new PropertyA(){ DisplayName="ABC", Name="alpha", Value="test2"},
      new PropertyA(){ DisplayName="DEF", Name="beta", Value="test3"},
      new PropertyA(){ DisplayName="GHI", Name="zeta", Value="test4"},
      new PropertyA(){ DisplayName="Line of Business", Name="lineofbusiness", Value="test5"}
};
List<PropertyB> listB = new List<PropertyB>()
{
      new PropertyB(){ Name="lineofbusiness", Value="test789"},
      new PropertyB(){ Name="alpha", Value="test234"},
      new PropertyB(){ Name="lineofbusiness", Value="test456"},
      new PropertyB(){ Name="beta", Value="test123"},
};

In Main:

List<PropertyB> newList = new List<PropertyB>();

foreach(PropertyA propA in listA){
      PropertyB newProp = new PropertyB();
      newProp.Name = propA.DisplayName;
     
     foreach(PropertyB propB in listB){
     
          if(propA.Name == propB.Name){
               newProp.Value = propB.Value;
               break; 
          }
     }
     newList.Add(newProp);
}

UPDATE: The console output (if you choose) should be as follows:

LOB test789
ABC test234 
DEF test123 
GHI null 
Line of Business test456 

If you simply remove the break; you end up with:

LOB test456
ABC test234 
DEF test123 
GHI null 
Line of Business test456 

Without the break, the inner loop always assigns the value of the LAST name match. That's a problem.

CodePudding user response:

You can fix your code by adding a check for duplicates:

List<PropertyB> newList = new List<PropertyB>();

foreach (PropertyA propA in listA)
{
    PropertyB newProp = new PropertyB();
    newProp.Name = propA.DisplayName;

    foreach (var propB in listB)
    {
        if (propA.Name == propB.Name)
        {
            // Skip a value that has already been handed out (Any needs using System.Linq).
            if (newList.Any(l => l.Value == propB.Value)) continue;
            newProp.Value = propB.Value;
            break;
        }
    }
    newList.Add(newProp);
}

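However, that check compares only Value, so it would also skip a value that was already used under a different name. To make it more reliable, I would suggest this: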

    List<PropertyA> newList = new List<PropertyA>();

    foreach (var propA in listA)
    {
        var newProp = new PropertyA();
        newProp.Name = propA.DisplayName;

        // Temporarily keep the original Name here so duplicates are checked per name.
        newProp.DisplayName = propA.Name;

        foreach (var propB in listB)
        {
            if (propA.Name == propB.Name)
            {
                if (newList.Any(l => l.Value == propB.Value
                                  && l.DisplayName == propA.Name)) continue;
                newProp.Value = propB.Value;
                break;
            }
        }

        newList.Add(newProp);
    }

    // Project the temporary PropertyA items into the final PropertyB shape.
    var result = newList.Select(l => new PropertyB { Name = l.Name, Value = l.Value });

Both algorithms produce the same result with the test data:

LOB test789
ABC test234 
DEF test123 
GHI null 
Line of Business test456 

CodePudding user response:

As I understand the process (writing A, B, aList and bList as shorthand for your PropertyA, PropertyB, listA and listB):

  1. The list of A needs turning into a list of B.
  2. Some items in the new list of B might have a Value copied from another list of B.

    var d = bList.ToDictionary(b => b.Name, b => b.Value);
    var newB = aList.Select(a => new B { Name = a.DisplayName, Value = d.GetValueOrDefault(a.Name) } ).ToList();

You said no data shall be lost, but I think you inherently have to throw something away, because B has fewer properties than A and some properties from B "overwrite" (take the place of) those in A.

I also note that your sample list B contains a duplicated Name, which ToDictionary won't tolerate (it throws on a duplicate key). You didn't specify how to resolve this, but if it truly does occur you'll have to choose which value to pick, or whether to take several. This, for example, would tolerate duplicate names:

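    var d = bList.ToLookup(b => b.Name, b => b.Value);
    // A lookup returns an empty sequence (never null) for a missing key, so use FirstOrDefault.
    var newB = aList.Select(a => new B { Name = a.DisplayName, Value = d[a.Name].FirstOrDefault() } ).ToList();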

Again, this throws stuff away. If you want to keep all the values, you'll have to encode them into Value somehow:

Value = string.Join(",", d[a.Name])

for example
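
To make the lookup behaviour concrete, here is a small sketch of what it returns for a duplicated name and for a name that isn't present. The sample pairs are taken from the question's list B; the LookupDemo class and its Build and Joined methods are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Builds a Name -> Values lookup; duplicate names are allowed.
    public static ILookup<string, string> Build(IEnumerable<(string Name, string Value)> pairs)
        => pairs.ToLookup(p => p.Name, p => p.Value);

    // Joins every value stored under a name; a missing name gives an empty string.
    public static string Joined(ILookup<string, string> lookup, string name)
        => string.Join(",", lookup[name]);

    public static void Main()
    {
        var lookup = Build(new[]
        {
            ("lineofbusiness", "test789"),
            ("alpha", "test234"),
            ("lineofbusiness", "test456"),
            ("beta", "test123"),
        });

        Console.WriteLine(Joined(lookup, "lineofbusiness"));          // test789,test456
        Console.WriteLine(lookup["zeta"].Any());                      // False: empty sequence, not null
        Console.WriteLine(lookup["zeta"].FirstOrDefault() ?? "null"); // null
    }
}
```

Joining keeps every value, but pushes the job of splitting them apart again onto whoever reads Value.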


So, it looks like you want to keep all the duplicates and dispense them in order. We could do that by grouping the values into a per-name list that we pull items out of as we enumerate:

    var d = bList.GroupBy(b => b.Name, b => b.Value).ToDictionary(g => g.Key, g => g.ToList());

    var newB = new List<B>();
    foreach(var a in aList){
      var b = new B { Name = a.DisplayName };

      // Take the next unused value for this name, if any remain.
      if(d.TryGetValue(a.Name, out var lst) && lst.Count > 0){
        b.Value = lst[0];
        lst.RemoveAt(0);
      }

      newB.Add(b); // add every item, matched or not
    }
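
As a self-contained check of this approach, here is a minimal sketch that runs it against the sample data from the question. It assumes the properties are public so the initializers compile, and it swaps the per-name list and RemoveAt(0) for a Queue<string>, which is my substitution rather than something the answer prescribes. The Demo class and its Merge method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PropertyA
{
    public string DisplayName { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
}

public class PropertyB
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public static class Demo
{
    // Pairs each PropertyA with the next unused listB value of the same Name, in listB order.
    public static List<PropertyB> Merge(List<PropertyA> listA, List<PropertyB> listB)
    {
        // One FIFO queue of values per Name; GroupBy keeps source order within each group.
        var pending = listB
            .GroupBy(b => b.Name, b => b.Value)
            .ToDictionary(g => g.Key, g => new Queue<string>(g));

        var result = new List<PropertyB>();
        foreach (var a in listA)
        {
            var b = new PropertyB { Name = a.DisplayName };
            if (pending.TryGetValue(a.Name, out var queue) && queue.Count > 0)
            {
                b.Value = queue.Dequeue(); // each value is handed out at most once
            }
            result.Add(b); // unmatched items are kept with a null Value
        }
        return result;
    }

    public static void Main()
    {
        var listA = new List<PropertyA>
        {
            new PropertyA { DisplayName = "LOB", Name = "lineofbusiness", Value = "test" },
            new PropertyA { DisplayName = "ABC", Name = "alpha", Value = "test2" },
            new PropertyA { DisplayName = "DEF", Name = "beta", Value = "test3" },
            new PropertyA { DisplayName = "GHI", Name = "zeta", Value = "test4" },
            new PropertyA { DisplayName = "Line of Business", Name = "lineofbusiness", Value = "test5" },
        };
        var listB = new List<PropertyB>
        {
            new PropertyB { Name = "lineofbusiness", Value = "test789" },
            new PropertyB { Name = "alpha", Value = "test234" },
            new PropertyB { Name = "lineofbusiness", Value = "test456" },
            new PropertyB { Name = "beta", Value = "test123" },
        };

        foreach (var b in Merge(listA, listB))
        {
            var shown = b.Value ?? "null";
            Console.WriteLine(b.Name + " " + shown);
        }
    }
}
```

With the question's data this prints the five lines the asker expects, ending in "Line of Business test456". If list A has more items for a name than list B has values, the extra items simply keep a null Value.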