Home > database >  Get all split options to k chunks
Get all split options to k chunks

Time:01-05

I need to split an array arr into k chunks where the union of all the chucks is arr and there in no same element in two chunks.

For example for

int[] arr = new int[] {1, 2, 3, 4, 5}; 
int k = 3;

I need to return all the possible splits:

[[1], [2], [3,4,5]]
[[1], [2,3], [4,5]]
[[1], [2,3,4], [5]]
[[1,2], [3], [4,5]]
[[1,2], [3,4], [5]]
[[1,2,3], [4], [5]]

How can I do that efficiently in C#?

CodePudding user response:

You have a combinatoric problem: given an array of n item you should sample k subarrays or, put it differently, k - 1 splits from n - 1:

  [A, B, C, D, E]    n items, n - 1 possible splits 
     ^     ^ 
     |     |         k - 1 splits from n - 1 avaialable
     |     |
  [A] [B, C] [D, E]  k chunks   

Note that we have standard combinatoric problem

k - 1 from n - 1 unordered without repetitions

Code for such sampling can be

private static IEnumerable<int[]> Samples(int take, int from) {
  if (take > from || take <= 0 || from <= 0)
    yield break;

  int[] array = Enumerable.Range(0, take).ToArray();

  for (bool agenda = true; agenda; ) {
    agenda = false;

    yield return array.ToArray();

    for (int i = array.Length - 1; i >= 0; --i) 
      if (array[i] < from - take   i) {
        agenda = true;

        array[i]  = 1;

        for (int j = i   1; j < array.Length;   j)
          array[j] = array[i]   j - i;

        break;
      }
  }
}

Having this sampling routine, we can implement splitting into chunks:

private static IEnumerable<T[][]> MyChunks<T>(T[] array, int take) {
  if (take > array.Length)
    yield break;

  foreach (var sample in Samples(take - 1, array.Length - 1)) {
    T[][] result = new T[take][];

    for (int i = 0, from = 0, to; i <= sample.Length;   i, from = to) {
      to = i < sample.Length ? sample[i]   1 : array.Length;

      result[i] = array
        .Skip(from)
        .Take(to - from)
        .ToArray();
    }

    yield return result;
  }
}

Demo:

var arr = new int[] { 1, 2, 3, 4, 5};
int k = 3;

string report = string.Join(Environment.NewLine, MyChunks(arr, k)
  .Select(chunk => "["   string.Join(", ", chunk
     .Select(item => $"[{string.Join(",", item)}]"))   "]"));

Console.Write(report);

Output:

[[1], [2], [3,4,5]]
[[1], [2,3], [4,5]]
[[1], [2,3,4], [5]]
[[1,2], [3], [4,5]]
[[1,2], [3,4], [5]]
[[1,2,3], [4], [5]]

CodePudding user response:

On way to solve such a problem is to divide and conquer. We could first solve how we compute the possible splits of an array if we only ever wanted to split into two sub arrays (k = 2).

I.e. our function, when given 1,2,3,4, should return 1|2,3,4, 1,2|3,4, and 1,2,3|4 where | marks the border between the left and right subarrays.

Note how the | starts at the left-most position (that still produces a non-empty split on the left) and gradually moves to the right, until no more non-empty split can be produced for the right.

A C# function that does this is shown below:

IEnumerable<(Memory<int>, Memory<int>)> PossibleSplits(
    Memory<int> xs) {
    for (var i = 1; i < xs.Length; i  )
        yield return (xs[..i], xs[i..]);
}

var splits = PossibleSplits(new[] { 1, 2, 3, 4, 5 }.AsMemory());

As you can see it returns a sequence of left/right tuples.

I'm using Memory so no new arrays are allocated when splitting the input data.

One nice property of this function is that it returns an empty sequence when the input array's length is smaller than 2.

So how do we split to an abritrary number of splits, not only 2? One trick is to recursively split the left side of a split again until the left side becomes to small to be split.

Below is a recursive version of the function above that does just that:

public static class Ext {
    public static IEnumerable<IEnumerable<Memory<int>>> PossibleSplits(
        this Memory<int> xs,
        int k) {
        if (k < 2) {
            yield return Enumerable.Repeat(xs, 1);
            yield break;
        }
        
        for (var leftMinLen = k - 1; leftMinLen < xs.Length; leftMinLen  ) {
            var (left, right) = (xs[..leftMinLen], xs[leftMinLen..]);
            foreach (var leftSplit in left.PossibleSplits(k - 1))
                yield return leftSplit.Append(right);
        }
    }
}

var splits = new[] { 1, 2, 3, 4, 5 }
    .AsMemory()
    .PossibleSplits(k: 3);

It's an extension method, just because I like them.

You asked for an efficient solution, but efficiency is relative. This solution is efficient in terms of...

  • ...memory because now new arrays are allocated (hush, we don't talk about all the enumerators created by this code);
  • ...laziness, because only requested values will be computed;
  • ...code size.

It's not the most efficient in terms of runtime speed, because of all the overhead enumerators introduce.

  • Related