Column A contains texts. Each text has values in the neighboring cells. Now I want to check for each row whether the other rows also contain these values. Restriction: At least three values must match and regardless of the order in the row. The rows, which belong together because of at least 3 common values, should be grouped. A group name should be assigned to them dynamically. An example table might look like this: INPUT
Word | Value01 | Value02 | Value03 | Value04 |
---|---|---|---|---|
banana | A | B | C | D |
sweet banana | L | N | O | J |
cheap banana | B | D | A | F |
best banana | D | C | A | H |
banana banana | N | L | I | O |
how to make banana bread | O | A | I | K |
I love banana | B | C | A | D |
banana bread | I | O | A | C |
EXPECTED OUTPUT
Word | Value01 | Value02 | Value03 | Value04 | group name |
---|---|---|---|---|---|
banana | A | B | C | D | banana group 1 |
sweet banana | L | N | O | J | banana group 2 |
cheap banana | B | D | A | F | banana group 1 |
best banana | D | C | A | H | banana group 1 |
banana banana | N | L | I | O | banana group 2 |
how to make banana bread | O | A | I | K | banana group 3 |
I love banana | B | C | A | D | banana group 1 |
banana bread | I | O | A | C | banana group 3 |
I tried the following formula:
=ArrayFormula(IF(B1:E1=B1:E8,"banana group 1","banana group 2"))
But the problems are, only the exact match is recognized and I am limited to two group names. Here is a link to the example table: Example Table
CodePudding user response:
I thought I would take a look at all similarities. But you could get only three if you change just one line. I will point it out below:
function sortgroups() {
const ss = SpreadsheetApp.getActive();
const sh = ss.getSheetByName("Sheet0");
const vs = sh.getRange("A2:E9").getValues();
let obj = { pA: [] };
vs.forEach((r, i) => {
let a = [r[1], r[2], r[3], r[4]];
obj[r[0]] = { chars: a, sub: [] };
obj.pA.push(r[0]);
vs.forEach((s, j) => {
if (j != i) {
let b = [s[1], s[2], s[3], s[4]];
let m = [];//matches
b.forEach((c, k) => {
let idx = a.indexOf(c);
if (~idx) {
m.push(c);
}
})
if (m.length == 3) {//this is the line to change to limit similarites
obj[r[0]].sub.push({ p: s[0], chars: m.join(',') });
}
}
});
});
let r = [];
obj.pA.forEach((p, i) => {
r.push([p, obj[p].chars[0], obj[p].chars[1], obj[p].chars[2], obj[p].chars[3], '', '']);
obj[p].sub.forEach((o, j) => {
r.push(['', '', '', '', '', o.p, o.chars]);
})
})
r.unshift(["Words","Value1","Value2","Value3","Value4","Similar Words","Similarities"]);
sh.getRange(sh.getLastRow() 2, 1, r.length, r[0].length).setValues(r);
}
Output:
Words | Value1 | Value2 | Value3 | Value4 | Similar Words | Similarities |
---|---|---|---|---|---|---|
banana | A | B | C | D | ||
cheap banana | B,D,A | |||||
best banana | D,C,A | |||||
how to make banana bread | A | |||||
I love banana | B,C,A,D | |||||
banana bread | A,C | |||||
sweet banana | L | N | O | J | ||
banana banana | N,L,O | |||||
how to make banana bread | O | |||||
banana bread | O | |||||
cheap banana | B | D | A | F | ||
banana | A,B,D | |||||
best banana | D,A | |||||
how to make banana bread | A | |||||
I love banana | B,A,D | |||||
banana bread | A | |||||
best banana | D | C | A | H | ||
banana | A,C,D | |||||
cheap banana | D,A | |||||
how to make banana bread | A | |||||
I love banana | C,A,D | |||||
banana bread | A,C | |||||
banana banana | N | L | I | O | ||
sweet banana | L,N,O | |||||
how to make banana bread | O,I | |||||
banana bread | I,O | |||||
how to make banana bread | O | A | I | K | ||
banana | A | |||||
sweet banana | O | |||||
cheap banana | A | |||||
best banana | A | |||||
banana banana | I,O | |||||
I love banana | A | |||||
banana bread | I,O,A | |||||
I love banana | B | C | A | D | ||
banana | A,B,C,D | |||||
cheap banana | B,D,A | |||||
best banana | D,C,A | |||||
how to make banana bread | A | |||||
banana bread | A,C | |||||
banana bread | I | O | A | C | ||
banana | A,C | |||||
sweet banana | O | |||||
cheap banana | A | |||||
best banana | C,A | |||||
banana banana | I,O | |||||
how to make banana bread | O,A,I | |||||
I love banana | C,A |
Only Three:
Words | Value1 | Value2 | Value3 | Value4 | Similar Words | Similarities |
---|---|---|---|---|---|---|
banana | A | B | C | D | ||
cheap banana | B,D,A | |||||
best banana | D,C,A | |||||
sweet banana | L | N | O | J | ||
banana banana | N,L,O | |||||
cheap banana | B | D | A | F | ||
banana | A,B,D | |||||
I love banana | B,A,D | |||||
best banana | D | C | A | H | ||
banana | A,C,D | |||||
I love banana | C,A,D | |||||
banana banana | N | L | I | O | ||
sweet banana | L,N,O | |||||
how to make banana bread | O | A | I | K | ||
banana bread | I,O,A | |||||
I love banana | B | C | A | D | ||
cheap banana | B,D,A | |||||
best banana | D,C,A | |||||
banana bread | I | O | A | C | ||
how to make banana bread | O,A,I |
CodePudding user response:
There must be a easier way to do this, but this is one (expensive) way to do this. SORT
each of row in ascending order, then get all the permutations of each row(for 4 items, that is 24 permutations per row). Check if any permutation match any of the previous row. If it matches, take the group it belongs to(F:F
). If not, create a new group based on row number.
F2:
=ARRAYFORMULA(IFNA(INDEX(F$1:F1, MATCH( true,BYROW(B$1:E1,LAMBDA(r, OR(BYROW(REDUCE(TRANSPOSE(SORT(TRANSPOSE(B2:E2))),SEQUENCE(PERMUT(4,4)-1),LAMBDA(a,c,{a;GET_NEXT_LEX(INDEX(a,c,0))})),LAMBDA(perm, SUMPRODUCT(perm=r)>=3 ))))),0)),"group#"&ROW()))
Drag fill down. It gets more expensive based on number of rows.
For this to work, you need to install GET_NEXT_LEX
function and it's dependencies from here.
Word | Value01 | Value02 | Value03 | Value04 | output |
---|---|---|---|---|---|
banana | A | B | C | D | group#2 |
sweet banana | L | N | O | J | group#3 |
cheap banana | B | D | A | F | group#2 |
best banana | D | C | A | H | group#2 |
banana banana | N | L | I | O | group#3 |
how to make banana bread | O | A | I | K | group#7 |
I love banana | B | C | A | D | group#2 |
banana bread | I | O | A | C | group#7 |