Preferably using System.linq
So in my program I have a list of Purchases which contain a list of Products. I am trying to determine which two products are more often purchased together in a single sale. For example:
Products - banana, apple, teddy bear, beer, diaper, video game, car
Purchases1 - banana, teddy bear, diaper, beer
Purchases2 - diaper, car, beer
Purchases3 - banana, video game, car
Most frequently bought together couple of products = diaper and beer.
Does anyone know the most optimal way of doing this? In practice my dictionary of Purchases has around 2.4 million elements in it and there 8013 unique products in the Products dictionary.
CodePudding user response:
I believe you need to create a Dictionary
to keep track of your purchase pairs. Perhaps you could sort your purchase lists so that the entries are always in alphabetical order, then you can just iterate over the lists you have.
Below is a sample project I made that does what you are requesting, although the products are simply char
. It may not be the most efficient, but it will at least show you a way to accomplish what you are asking.
class Program
{
class Purchase{
public List<char> Products = new List<char>();
}
static void Main(string[] args)
{
do
{
// here is our list of products
char[] products = new char[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };
// randomly populate a list of purchases
Random random = new Random();
List<Purchase> purchases = new List<Purchase>();
for (int i = 0; i < 10000; i )
{
var currPurchase = new Purchase();
foreach (var p in products)
if (random.Next(0, 2) > 0)
currPurchase.Products.Add(p);
purchases.Add(currPurchase);
}
// for all of our products, we need to make a dictionary of all the possible pairs
Dictionary<(char, char), int> count = new Dictionary<(char, char), int>();
// now for each purchase, we go through and see all the pairs that happened, and update those dictionary entries
foreach (var purchase in purchases)
{
// get the pairs, update the pairs
for (int i = 0; i < purchase.Products.Count; i )
{
for (int j = i 1; j < purchase.Products.Count; j )
{
if (count.ContainsKey((purchase.Products[i], purchase.Products[j])))
count[(purchase.Products[i], purchase.Products[j])] ;
else
count[(purchase.Products[i], purchase.Products[j])] = 1;
}
}
}
// then get the pair that had the highest frequency
var highest = count.Max(kvp => kvp.Value);
// and then get all the keys that occurred that many times (assuming we could have two pairs of equal frequency)
var mostFrequent = count.Where(kvp => kvp.Value == highest).Select(x => x.Key);
Console.WriteLine($"Most Frequent Pairs (Occurred {highest} times): ");
foreach (var pair in mostFrequent)
Console.Write(pair "; ");
//type quit and hit enter to quit
} while (Console.ReadLine() != "quit");
}
}
CodePudding user response:
here is my attempt using LINQ:
var purchases = new List<List<string>>
{
new List<string>() { "banana", "teddy bear", "beer", "diaper" },
new List<string>() { "beer", "diaper", "car" },
new List<string>() { "banana", "video game", "car" }
};
var itemsPairsOccurences = purchases
// cross join every purchase with itself and produce items pairs, i.e.: correlate every item included in a purchase with every item from the same purchase
.SelectMany(purchase1 => purchase1.SelectMany(_ => purchase1, (item1, item2) => Tuple.Create(item1, item2)))
// filter out pairs containing the same items (e.g. banana-banana) and duplicate pairs (banana-car remains, car-banana is skipped)
.Where(itemsPair => string.CompareOrdinal(itemsPair.Item1, itemsPair.Item2) < 0)
// group all pairs from all purchases by unique pairs and count occurrences
.GroupBy(itemsPair => itemsPair, (itemsPair, allItemsPairs) => KeyValuePair.Create(itemsPair, allItemsPairs.Count()));
// choose a pair with the most occurrences
var mostFrequent1 = itemsPairsOccurences.Aggregate((itemsPair1, itemsPair2) => itemsPair1.Value > itemsPair2.Value ? itemsPair1 : itemsPair2);
// OR order pairs by occurencces and select the first one
var mostFrequent2 = itemsPairsOccurences.OrderByDescending(itemsPair => itemsPair.Value).FirstOrDefault();