I have rows of data like the following:
Cat A>Subcat A|Cat A>Subcat B
Cat A>Subcat C|Cat B>Subcat A|Cat B>Subcat C|Cat C
You'll notice that it's basically a list of parent categories and subcategories separated by a pipe |
I am needing to extract data from each row in two following ways:
- Get all parent categories and separate them by a pipe
|
(removing duplicates). - Get all subcategory names and separate them by a pipe
|
(removing duplicates).
From those first two provided rows, the result should look like:
String | Parents (Result 1) | Children (Result 2) |
---|---|---|
Cat A>Subcat A|Cat A>Subcat B | Cat A | Subcat A|Subcat B |
Cat A>Subcat C|Cat B>Subcat A|Cat B>Subcat C|Cat C | Cat A|Cat B|Cat C | Subcat C|Subcat A |
I've been able to achieve partial results using REGEXEXTRACT
and JOIN
but it either matches only once, or returns multiple. Example:
# Returns the first instance of "Cat A" only
=REGEXEXTRACT(H2,"(.*?)>.*?\|")
I'm looking to get help creating two regex patterns that can get the desired "Result 1" and "Result 2"
CodePudding user response:
try:
=ARRAYFORMULA(REGEXREPLACE(TRIM(SUBSTITUTE(TRANSPOSE(QUERY(QUERY(QUERY(
IFNA(SPLIT(UNIQUE(FLATTEN(ROW(A1:A2)&"×"&
REGEXREPLACE(REGEXEXTRACT(SPLIT(A1:A2&">", "|", 1), "(.*)>"), "(>.*)", )))&"|", "×")),
"select max(Col2)
where Col1 is not null
group by Col2
pivot Col1"),
"offset 1", 0),,9^9)), "| ", "|")), "\|$", ))
and:
=ARRAYFORMULA(REGEXREPLACE(TRIM(SUBSTITUTE(TRANSPOSE(TRIM(QUERY(QUERY(QUERY(
IFNA(SPLIT(UNIQUE(FLATTEN(ROW(A1:A2)&"×"&
REGEXREPLACE(REGEXEXTRACT(SPLIT(A1:A2, "|", 1), ">(.*)"), "(>.*)", )))&"|", "×")),
"select max(Col2)
where Col1 is not null
group by Col2
pivot Col1"),
"offset 1", 0),,9^9))), "| ", "|")), "\|$", ))