Consider a hypothetical scenario where a huge used car dealer (e.g. CarMax) has many dealerships in different states.
public class Inventory
{
public string Make { get; set; }
public string Model { get; set; }
public string Year { get; set; }
public string State { get; set; }
}
List<Inventory> cars = new List<Inventory>();
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "MT" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "AR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "OH" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "AZ" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "OR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "MN" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "KY" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "FL" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "GA" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "TN" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "WY" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2012", State = "WY" } );
From cars I would like to create a list of objects with every combination of Make, Model and Year properties, each with a States property that contains a list of all states that have vehicles matching those first three properties:
var vehicles[0]: Make = "Ford", Model = "F-150", Year = "2014", States = { "MT", "AR", "OH" }
var vehicles[1]: Make = "Ford", Model = "F-150", Year = "2015", States = { "AZ", "OR", "MN", "KY" }
var vehicles[2]: Make = "Ford", Model = "F-150", Year = "2020", States = { "FL", "GA" }
var vehicles[3]: Make = "Ford", Model = "Ranger", Year = "2010", States = { "TN", "WY" }
var vehicles[4]: Make = "Ford", Model = "Ranger", Year = "2012", States = { "WY" }
I've looked into using a self-join GroupJoin linq method approach as well as a nested linq query, but I'm stumbling on trying to create the States child collection in the process.
One catch: the dataset that I'm working with contains upwards of 100-million rows (in this example Make/Model/Year/State combinations) with 6 grouping properties rather than 3. As such I'm trying to avoid, if possible, any multi-step processes that involve looping through the data.
CodePudding user response:
Consider a Dictionary where the key is a Make, Model, Year combination and the vales are a Set of strings (states)
var result = new Dictionary<(string make, string model, string year), HashSet<string>>();
The important thing here is that the dictionary will not allow you to have more than 1 item per make/model/year, and the hash set will not allow you to have more than 1 state. Once you have the right collection types, the population code basically writes itself
foreach (var item in items)
{
var key = (item.Make, item.Model, item.Year);
if (!dictionary.TryGetValue(key, out var states))
{
states = new HashSet<string>();
dictionary.Add(key, states);
}
states.Add(item.State);
}
CodePudding user response:
I think we can use lambda GroupBy
and string.Join
to iterator collection, which might not need to use GroupJoin
cars.GroupBy(x=> new {x.Make,x.Model,x.Year})
.Select(x=> new {
x.Key.Make,
x.Key.Year,
x.Key.Model,
State = string.Join(",",x.Select(z=>z.State))
});
if you want to make it faster we can try to use PLINQ to instead