Split array in the best possible way with multiple functions-CodePudding

I am looking for an algorithm name, preferably a library that can take an array and split it between functions in the best way possible. I don't care about the complexity (it's for a very low data set). Recursive check would suffice.

A nice example would be array:

const list = [{gender: "female"}, {gender: "female"}, {gender: "female"}, {gender: "male"}]

So if we run it with the special function

specialFunc(list, [(val) => val === 'female', (val) => val === 'male']);

We would be getting this

[
 [{gender: "female"}, {gender: "female"}, {gender: "female"}],
 [{gender: "male"}]
]

Because this is the best possible split we can get.

However, if we run it by this function:

specialFunc(list, [(val) => !!val, (val) => val === 'male']);

I would be getting this:

[
 [{gender: "female"}, {gender: "female"}, {gender: "female"}],
 [{gender: "male"}]
]

"the best way possible" means that the number distance (of array length) between each array should be the lowest, and the number of records in each array should be the maximum possible.

I have searched npmjs and github a lot but couldn't find anything.

Thank you very very much!

CodePudding user response：

I think I understand these requirements. You have a number of predicate function which you want to use to group your items. Multiple predicates might return true for the same item, so there are various groupings available. You want to find a grouping that minimizes the variation in sizes of the results.

I don't find your examples very compelling. I will try my own. If your items are 8, 6, 7, 5, 3, 0, 9] and you have three predicates: (n) => n < 7, (n) => n > 3, and (n) => n % 2 == 1, then the 8 can only go in the second group (it's greater than 3, but not less than 7 and not odd.) The 6 can go in either of the first two groups, the 5 could be in any of them, and so on, like this:

  8      6       7        5         3     0      9
[[1], [0, 1], [1, 2], [0, 1, 2], [0, 2], [0], [1, 2]]
  ^    ^  ^    ^  ^    ^  ^  ^    ^  ^    ^    ^  ^  
  |    |  |    |  |    |  |  |    |  |    |    |  |
  |     --|----|--|---- --|--|---- --|---- ----|--|------> Group 0 (n => n < 7)
  |       |    |  |       |  |       |         |  |
   ------- ---- --|------- --|-------|--------- --|------> Group 1 (n => n > 3)
                  |          |       |            |
                   ---------- ------- ------------ ------> Group 2 (n => n % 2 == 1)

Since there is one choice for the first, two for the second, two for the third, and so on, the number of possible partitions is 1 * 2 * 2 * 3 * 2 * 1 * 2, or 48. They might look like this:

[//     < 7          > 3      odd
  [ [6, 5, 3, 0], [8, 7, 9], [] ], 
  [ [6, 5, 3, 0], [8, 7],    [9] ], 
  [ [6, 5, 0],    [8, 7, 9], [3] ], 
  [ [6, 5, 0],    [8, 7],    [3, 9] ], 
  // ... (42 rows elided)  
  [ [0],          [8, 6, 9], [7, 5, 3] ], 
  [ [0],          [8, 6],    [7, 5, 3, 9] ]
]

Then, from these, we need to pick the ones with the smallest variation in partition size. We can use statistical variance for this¹, the sum of the squares of the distances of the values from their mean, so [[6, 5, 3, 0], [8, 7, 9], []], with lengths 4, 3, and 0; this has a variance of 8.667. The second one has lengths 4, 2, and 1 with a variance of 4.667. Our best possibility is 3, 2 and 2, with variance of 0.667. So an answer like [[6, 5, 0], [8, 7], [3, 9]] would be reasonable. There are probably quite a few with similar behavior; the implementation below simply chooses the first one.

If this correctly describes the problem, then here is some code that I think will handle it:

const range = (lo, hi) => Array .from ({length: hi - lo}, (_, i) => i   lo)

const sum = (ns) => ns .reduce ((a, b) => a   b, 0)

const filterMap = (f, m) => (xs) => 
  xs .flatMap ((x, i, a) => f (x, i, a) ? [m (x, i, a)] : [])

const cartesian = ([xs, ...xss]) =>
  xs == undefined
    ? [[]]
  : xs .flatMap (x => cartesian (xss) .map (ys => [x, ...ys]))

const variance = (ns, avg = sum (ns) / (ns .length || 1)) =>
  sum (ns .map (n => (n - avg) * (n - avg)))

const groupIndices = (count) => (xs) =>
  Object .values (xs .reduce (
    (a, x, i) => ((a [x] .push (i)), a), 
    Object .fromEntries (range (0, count) .map (n => [n, []]))
  ))

const specialFunc = (xs, preds) =>
  cartesian (xs .map ((x) => filterMap ((pred, i) => pred (x), (_, i) => i) (preds)))
    .map (groupIndices (preds .length))
    .reduce (
      ({min, v}, xs, _, __, v2 = variance (xs .map (x => x .length))) => 
          v2 < v ? {min: xs, v: v2} : {min, v}, 
      {min: [], v: Infinity}
    ) .min .map (ys => ys .map (i => xs [i]))

console .log (specialFunc (
  [8, 6, 7, 5, 3, 0, 9], 
  [n => n < 7, n => n > 3, n => n % 2 == 1]
)) //=> [[6, 5, 0], [8, 7], [3, 9]]

.as-console-wrapper {max-height: 100% !important; top: 0}

We start with some fairly standard utility functions. range calculates an integer range inclusive at the bottom, exclusive at the top, so that, for instance, range (3, 12) returns [3, 4, 5, 6, 7, 8, 9 ,10, 11]. sum simply totals an array of numbers, filterMap combines filtering with mapping, first testing whether an input matches a filter, and if so, transforming the result before putting it in the result. This implementation is unusual, in that the filter and mapping functions take more than just the value, but also the index and array properties found in things like map and filter. We need that as we'll use it to collect indices that match. (There are plenty of other ways to do that bit, but filterMap is a useful, reusable function.) cartesian returns the cartesian product of an array of arrays. For instance, cartesian ([1, 2, 3], [true], ['a', 'b']]) will return [[1, true, 'a'], [1, true, 'b'], [2, true, 'a'], [2, true, 'b'], [3, true, 'a'], [3, true, 'b']]. And finally variance calculate the statistical variance of a list of numbers.

Then we have a helper function, groupIndices. This might be easiest to show with an example. One of the 48 results from our cartesian product will be [1, 0, 1, 0, 2, 0, 1], which means that our original numbers (8, 6, 7, 5, 3, 0, 9], recall) are in groups 1, 0, 1, 0, 2, 0, and 1, respectively. groupIndices takes the number of groups and then takes that cartesian combination, and transforms it into [[1, 3, 5], [0, 2, 6], [4]], giving the indices of the values that are mapped to each group. (If I wasn't out of time, I'm sure we could skip this working with the indices and go directly against the values, but this works.)

Now we hit the main function, that I haven't tried to find a good name for, so is still called specialFunc. That uses filterMap to turn our list into [[1], [0, 1], [1, 2], [0, 1, 2], [0, 2], [0], [1, 2]], calls cartesian on the result, maps groupIndices over these values, then uses reduce to find (the first) one that is minimal in its variance. Finally it maps the resulting indices back to the actual values.

Again, we can probably clean this up and work with values not indices, but I would first want to know if this is the sort of behavior you're looking for.

^{¹The standard deviation has a clearer meaning, but as it's just the square root of the variance, it will be ordered the same way as the variance, and won't involve calculating square roots.}

CodePudding user response：

function splitGroups<T>(items: T[], filters: ((x: T) => boolean)[]) {
    let options = filters.map(f => items.filter(f));
    let groups = filters.map((_, index) => ({ data: [] as T[], index }));
    let res: T[][] = [];
    while (options.reduce((partial_sum, a) => partial_sum   a.length, 0) > 0) {
        groups.sort((a, b) => a.data.length - b.data.length);
        let smallGroup = groups[0];
        const smallGroups = groups.filter(g => g.data.length === smallGroup.data.length);
        if (smallGroups.length > 1) {
            smallGroup = smallGroups[Math.floor(Math.random() * (smallGroups.length - 1))];
        }
        if (options[smallGroup.index].length === 0) {
            res.push(smallGroup.data);
            groups = groups.filter(x => x !== smallGroup);
            continue;
        }
        const item = options[smallGroup.index][0];
        options = options.map(x => x.filter(y => y !== item));
        smallGroup.data.push(item);
    }
    res = [...res, ...groups.map(x => x.data)];
    return res;
}

function accurateSplitGroups<T>(items: T[], filters: ((x: T) => boolean)[], times: number) {
    const options: { data: T[][]; diff: number }[] = [];
    for (let i = 0; i < times; i  ) {
        const res = splitGroups(items, filters);
        let diffBetweenGroups = 0;
        const groupsLens = res.map(x => x.length);
        for (let i = 0; i < groupsLens.length; i  ) {
            for (let j = 0; j < groupsLens.length; j  ) {
                diffBetweenGroups  = Math.abs(groupsLens[i] - groupsLens[j]);
            }
        }
        options.push({ data: res, diff: diffBetweenGroups });
    }
    return options.sort((a, b) => a.diff - b.diff)[0].data;
}

const items = [{ gender: 'female' }, { gender: 'female' }, { gender: 'female' }, { gender: 'male' }];
const filters = [(x: any) => !!x.gender, (x: any) => !!x.gender];
const res = accurateSplitGroups(items, filters, 100);
const b = res;