Home > OS >  Get highest frequency of combination of two different values in a list in C#
Get highest frequency of combination of two different values in a list in C#

Time:05-24

Preferably using System.linq

So in my program I have a list of Purchases which contain a list of Products. I am trying to determine which two products are more often purchased together in a single sale. For example:

Products - banana, apple, teddy bear, beer, diaper, video game, car

Purchases1 - banana, teddy bear, diaper, beer

Purchases2 - diaper, car, beer

Purchases3 - banana, video game, car

Most frequently bought together couple of products = diaper and beer.

Does anyone know the most optimal way of doing this? In practice my dictionary of Purchases has around 2.4 million elements in it and there 8013 unique products in the Products dictionary.

CodePudding user response:

I believe you need to create a Dictionary to keep track of your purchase pairs. Perhaps you could sort your purchase lists so that the entries are always in alphabetical order, then you can just iterate over the lists you have.

Below is a sample project I made that does what you are requesting, although the products are simply char. It may not be the most efficient, but it will at least show you a way to accomplish what you are asking.

    class Program
    {
        class Purchase{
            public List<char> Products = new List<char>();
        }

        static void Main(string[] args)
        {

            do
            {
                // here is our list of products
                char[] products = new char[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };

                // randomly populate a list of purchases
                Random random = new Random();
                List<Purchase> purchases = new List<Purchase>();

                for (int i = 0; i < 10000; i  )
                {
                    var currPurchase = new Purchase();

                    foreach (var p in products)
                        if (random.Next(0, 2) > 0)
                            currPurchase.Products.Add(p);
                    purchases.Add(currPurchase);
                }


                // for all of our products, we need to make a dictionary of all the possible pairs
                Dictionary<(char, char), int> count = new Dictionary<(char, char), int>();

                // now for each purchase, we go through and see all the pairs that happened, and update those dictionary entries
                foreach (var purchase in purchases)
                {
                    // get the pairs, update the pairs
                    for (int i = 0; i < purchase.Products.Count; i  )
                    {
                        for (int j = i   1; j < purchase.Products.Count; j  )
                        {
                            if (count.ContainsKey((purchase.Products[i], purchase.Products[j])))
                                count[(purchase.Products[i], purchase.Products[j])]  ;
                            else
                                count[(purchase.Products[i], purchase.Products[j])] = 1;
                        }
                    }
                }

                // then get the pair that had the highest frequency
                var highest = count.Max(kvp => kvp.Value);

                // and then get all the keys that occurred that many times (assuming we could have two pairs of equal frequency)
                var mostFrequent = count.Where(kvp => kvp.Value == highest).Select(x => x.Key);

                Console.WriteLine($"Most Frequent Pairs (Occurred {highest} times): ");
                foreach (var pair in mostFrequent)
                    Console.Write(pair   "; ");

                //type quit and hit enter to quit
            } while (Console.ReadLine() != "quit");

        }
    }

CodePudding user response:

here is my attempt using LINQ:

var purchases = new List<List<string>>
{
    new List<string>() { "banana", "teddy bear", "beer", "diaper" },
    new List<string>() { "beer", "diaper", "car" },
    new List<string>() { "banana", "video game", "car" }
};

var itemsPairsOccurences = purchases
    // cross join every purchase with itself and produce items pairs, i.e.: correlate every item included in a purchase with every item from the same purchase
    .SelectMany(purchase1 => purchase1.SelectMany(_ => purchase1, (item1, item2) => Tuple.Create(item1, item2)))
    // filter out pairs containing the same items (e.g. banana-banana) and duplicate pairs (banana-car remains, car-banana is skipped)
    .Where(itemsPair => string.CompareOrdinal(itemsPair.Item1, itemsPair.Item2) < 0)
    // group all pairs from all purchases by unique pairs and count occurrences
    .GroupBy(itemsPair => itemsPair, (itemsPair, allItemsPairs) => KeyValuePair.Create(itemsPair, allItemsPairs.Count()));

// choose a pair with the most occurrences
var mostFrequent1 = itemsPairsOccurences.Aggregate((itemsPair1, itemsPair2) => itemsPair1.Value > itemsPair2.Value ? itemsPair1 : itemsPair2);
// OR order pairs by occurencces and select the first one
var mostFrequent2 = itemsPairsOccurences.OrderByDescending(itemsPair => itemsPair.Value).FirstOrDefault();
  • Related