I am trying to format the output of a report to allow me to import that data into another program. I have not found a way to manipulate the generated report into a more usable format.
The report is formatted with rows containing all the information for a certain product. If that product has additional codes/numbers that are associated with that product, they are listed in rows following the corresponding product. The problem is that those following rows are formatted completely different than the product rows they correspond with.
My goal is to concatenate all corresponding codes into Column 9, separated with commas.
A bonus would have all of the duplicates removed, but it would work even with the duplicates.
I honestly don't know if excel has the ability to achieve this, so any alternate ways to handle this would be welcome.
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |
---|---|---|---|---|---|---|---|---|
53465 | 517 | Brand1 | 083758314483 | 08375831448 | 044800 | 044800 | Supp1 | |
88565 | 517 | Brand2 | 08375801599 | 08375801599 | 1599 | Supp2 | ||
083758015991 | 5032501599 | Supp2 | ||||||
88566 | 517 | Brand2 | 08375801799 | 08375801799 | 1799 | Supp2 | ||
083758017995 | 83758317996 | Supp2 | ||||||
88567 | 517 | Brand2 | 08375801999 | 08375801999 | 1999 | Supp2 | ||
083758019999 | 5032501999 | Supp2 | ||||||
75239 | 517 | Brand2 | 83758420009 | 083758420009 | 322200 | Supp2 | ||
083758432163 | 83758432163 | Supp2 | ||||||
083758432187 | 83758432187 | Supp2 | ||||||
083758432279 | 83758432279 | Supp2 | ||||||
083758420009 | 83758432262 | Supp2 | ||||||
53478 | 517 | Brand3 | 083758298547 | 08375829854 | 085400 | 085400 | Supp2 | |
53479 | 517 | Brand3 | 083758298554 | 08375829855 | 085500 | 085500 | Supp2 |
Into this..
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |
---|---|---|---|---|---|---|---|---|
53465 | 517 | Brand1 | 083758314483 | 08375831448 | 044800 | 044800 | Supp1 | |
88565 | 517 | Brand2 | 08375801599 | 08375801599 | 1599 | Supp2 | 083758015991,5032501599 | |
88566 | 517 | Brand2 | 08375801799 | 08375801799 | 1799 | Supp2 | 083758017995,83758317996 | |
88567 | 517 | Brand2 | 08375801999 | 08375801999 | 1999 | Supp2 | 083758019999,5032501999 | |
75239 | 517 | Brand2 | 83758420009 | 083758420009 | 322200 | Supp2 | 083758432163,83758432163,083758432187,83758432187,083758432279,083758432279,083758420009,83758432262 | |
53478 | 517 | Brand3 | 083758298547 | 08375829854 | 085400 | 085400 | Supp2 | |
53479 | 517 | Brand3 | 083758298554 | 08375829855 | 085500 | 085500 | Supp2 |
CodePudding user response:
You can try the following formula in cell J2
:
=LET(input, A2:H15, colA, INDEX(input,,1), colDE, CHOOSECOLS(input, 4,5), colAA,
SCAN("",colA, LAMBDA(ac,item, IF(item<>"", item, ac))),
map, MAP(colA,LAMBDA(a, IF(a="", "",
LET(blankRows, FILTER(colDE, (colAA=a)*(colA="")),join,TEXTJOIN(",",,blankRows),
IFERROR(join,""))))),
FILTER(HSTACK(input, map), colA<>"")
)
Notes:
- We use several
LET
calls to avoid repetition the same element in the formula. - If you don't have
CHOOSECOLS
available yet ([Office Insider Beta only], Windows: 2203 (Build 15104), Mac: 16.60 (220304). You can use instead:FILTER(input, {0,0,0,1,1,0,0,0})
or just to specify the range:D2:E15
. I prefer to have less dependency on range specific information so the formulation is more robust. That is why I deduce it from a general variable (input
) that contains the range.
Explanation
The main idea is to fill the blanks of any of the columns so we can search by a given column value. I am assuming Col1 has unique values, so it is good candidate. The name: colAA
, has the empty cells filled with previous value from Col1:
SCAN("",colA, LAMBDA(ac,item, IF(item<>"", item, ac)))
Note: It is assumed the first value of Col1 is never empty.
Now we use MAP
(but also BYROW
can be used) to find the content of columns: Col5, Col6 (columns D
, E
from the screenshot), that correspond to a given value of Col1. If the value is empty, then we return an empty string, otherwise we find the elements of the name colDE
that corresponds to a given code (remember colAA
has all the values filled) and the original colA
has empty values. Once we have target subset (blankrows
), we just need to concatenate the result delimited by ,
via TEXTJOIN
function. The name map
contains this result:
MAP(colA,LAMBDA(a, IF(a="", "", LET(blankRows,
FILTER(colDE, (colAA=a)*(colA="")),
join,TEXTJOIN(",",,blankRows),
IFERROR(join,"")))))
In the case colA
value is not empty but doesn't have additional blank rows, the FILTER
function returns #CALC!
(empty set, i.e. error) so we need to treat this special case. According to the sample data Col9 for this case is empty, so we return empty string.
The rest is just to build the final result via HSTACK
and remove empty rows to avoid duplicates.