Home > OS >  c# linq - aggregate values to a list property
c# linq - aggregate values to a list property

Time:05-08

Consider a hypothetical scenario where a huge used car dealer (e.g. CarMax) has many dealerships in different states.

public class Inventory
{
    public string Make { get; set; }
    public string Model { get; set; }
    public string Year { get; set; }
    public string State { get; set; }
}

List<Inventory> cars = new List<Inventory>();
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "MT" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "AR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "OH" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "AZ" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "OR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "MN" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "KY" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "FL" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "GA" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "TN" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "WY" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2012", State = "WY" } );

From cars I would like to create a list of objects with every combination of Make, Model and Year properties, each with a States property that contains a list of all states that have vehicles matching those first three properties:

var vehicles[0]: Make = "Ford", Model = "F-150", Year = "2014", States = { "MT", "AR", "OH" }
var vehicles[1]: Make = "Ford", Model = "F-150", Year = "2015", States = { "AZ", "OR", "MN", "KY" }
var vehicles[2]: Make = "Ford", Model = "F-150", Year = "2020", States = { "FL", "GA" }
var vehicles[3]: Make = "Ford", Model = "Ranger", Year = "2010", States = { "TN", "WY" }
var vehicles[4]: Make = "Ford", Model = "Ranger", Year = "2012", States = { "WY" }

I've looked into using a self-join GroupJoin linq method approach as well as a nested linq query, but I'm stumbling on trying to create the States child collection in the process.

One catch: the dataset that I'm working with contains upwards of 100-million rows (in this example Make/Model/Year/State combinations) with 6 grouping properties rather than 3. As such I'm trying to avoid, if possible, any multi-step processes that involve looping through the data.

CodePudding user response:

Consider a Dictionary where the key is a Make, Model, Year combination and the vales are a Set of strings (states)

var result = new Dictionary<(string make, string model, string year), HashSet<string>>();

The important thing here is that the dictionary will not allow you to have more than 1 item per make/model/year, and the hash set will not allow you to have more than 1 state. Once you have the right collection types, the population code basically writes itself

foreach (var item in items)
{
    var key = (item.Make, item.Model, item.Year);
    if (!dictionary.TryGetValue(key, out var states))
    {
        states = new HashSet<string>();
        dictionary.Add(key, states);
    }

    states.Add(item.State);
}

CodePudding user response:

I think we can use lambda GroupBy and string.Join to iterator collection, which might not need to use GroupJoin

cars.GroupBy(x=> new {x.Make,x.Model,x.Year})
    .Select(x=> new {
        x.Key.Make,
        x.Key.Year,
        x.Key.Model,
        State = string.Join(",",x.Select(z=>z.State))
    });

if you want to make it faster we can try to use PLINQ to instead

  • Related