Home > database >  c# How split duplicate items into different lists
c# How split duplicate items into different lists

Time:11-02

I have a list of items not in order. This list is a big list(more than 10,000). Consider example :

{1,4,1,2,6,7,8,8,9,10,11,12,13,14 }

How to split the list into lists of size n (say n=4 for this case) that have no duplicates. For duplicate items, we will put it into different list.

Output:

list1 = {1,4,2,6}
list2 = {1,7,8,9 }
list3 = {8,10,11,12}
list4 = {13, 14}

CodePudding user response:

According to Dirichlet's box principle you are only able to do so if the maximum occurrence of any number is <= n.

As about the actual solution. Provided that we fit into the box principle's boundaries, a simple solution is to loop the numbers and put each number into the first box where it was not yet present.

If we need to try to ensure equity, then we can move numbers from one box to another as long as it is not already present there. So, these would be in principle.

Algorithm

for each item of number do
    index <- 0
    while index < n do
        if not box[n].contains(item) then
            put item into box[n]
            index <- n
        else
            index <- index   1
    end while
end for

CodePudding user response:

This can be done efficiently with fairly low allocations using a Queue, HashSet and iterator method. The premise is the HashSet uses the IEqualityComparer to determine equality. The Queue stores duplicates for later, the iterator... well iterates (and as such can be streamed).

Add pepper and salt to taste

public static IEnumerable<T[]> SomeFunkySplit<T>(IEnumerable<T> source, int size)
{
   var set = new HashSet<T>(size);
   var queue = new Queue<T>(source);
   while (queue.Any())
   {
      var item = queue.Dequeue();
      if (!set.Add(item)) queue.Enqueue(item);
      if (set.Count < size) continue;
      yield return set.ToArray();
      set.Clear();
   }

   if(set.Any()) yield return set.ToArray();
}

Usage

var array = new[] {1, 4, 1, 2, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14};
var results = SomeFunkySplit(array,4);

foreach (var result in results)
   Console.WriteLine(string.Join(", ",result));

Output

1, 4, 2, 6
7, 8, 9, 10
11, 12, 13, 14
1, 8

Update

How to do it just with Queue and HashSet?

If for some reason you want to return a vanilla list

public static List<T[]> SomeFunkySplit<T>(IEnumerable<T> source, int size)
{
   var list = new List<T[]>();
   var set = new HashSet<T>(size);
   var queue = new Queue<T>(source);
   while (queue.Any())
   {
      var item = queue.Dequeue();
      if (!set.Add(item)) queue.Enqueue(item);
      if (set.Count < size) continue;
      list.Add(set.ToArray());
      set.Clear();
   }

   if(set.Any()) list.Add(set.ToArray());

   return list;
}

CodePudding user response:

What you are looking for is what is call Chunking. First what you can do is remove duplicates:

// Import Linq
using System.Linq;

// Create the list
list = new List() {1,4,1,2,6,7,8,8,9,10,11,12,13,14 };

// Remove Duplicates
list = list.Distinct();

Now to do the Chunking with .NET6 you can use:

var chunkedLists = lists.Chunk(4);

But prior to .NET6 here is an implementation you can use:

public static class ListExtensions
{
    public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize) 
    {
        return source
            .Select((x, i) => new { Index = i, Value = x })
            .GroupBy(x => x.Index / chunkSize)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();
    }
}
  •  Tags:  
  • c#
  • Related