Home > Blockchain >  Aggregating rows with query in google sheets
Aggregating rows with query in google sheets

Time:12-26

I have a data set that looks something like this:

Column A Column B
category 1 Team 1
1.category 1 Team 1
2.category 2 Team 1
category 2 Team 1
category 3 Team 1
3.category 3 Team 1

I am trying to use query function with a pivot statement to calculate the occurrence of each category for team 1 (I have several other teams in the data set, but for simplicity I just wrote out my example with team 1). Unfortunately the naming of the categories are not consistent in the original data, and I cannot change them.

So I need a way to combine the results of the sum of category 1 and 1.category1, and so on.

How could I handle rewrite this to get the type of result as listed below?

Category Team 1
category 1 2
category 2 2
category 3 2

The formula I have now is as following:

query('sheet1!A:B,"Select A, count(B) where B='Team 1' group by A pivot B label B 'Team 1'",1)

CodePudding user response:

If the category names all have a similar format to those in your example (with extraneous data only at the beginning, followed by 'category N', and you don't care if zero counts per category are left blank then a more compact approach then the previous answer is (for any number of teams/categories):

=arrayformula(query({regexextract(A2:A,"category. "),B2:B},"select Col1,count(Col1) where Col2 is not null group by Col1 pivot Col2 label Col1 'Category'",0))

CodePudding user response:

image

formula:

=ArrayFormula(
LAMBDA(DATA,CATEGORY,
 LAMBDA(RESULT,
  LAMBDA(RESULT,
   IF(RESULT="",0,RESULT)
  )(QUERY(SPLIT(TRANSPOSE(SPLIT(RESULT,"&")),"|"),"SELECT Col1,SUM(Col3) GROUP BY Col1 PIVOT Col2 LABEL Col1'Category'",0))
 )(
  JOIN("&",
   BYROW(CATEGORY,LAMBDA(CAT,
    JOIN("&",CAT&"|"&BYROW(TRANSPOSE(QUERY(DATA,"SELECT COUNT(Col1) WHERE lower(Col1) CONTAINS'"&CAT&"' PIVOT Col2",0)),LAMBDA(ROW,JOIN("|",ROW))))
   ))
  )
 )
)({ASC($A$2:$B$7)},{"category 1";"category 2";"category 3"})
)
  1. use ASC() to format all numbers-like values into number,
  2. use {} to create the match conditions,
  3. iterate the conditions with BYROW() and...
  4. use QUERY() with CONTAINS to COUNT matches of the given conditions,
  5. use TRANSPOSE() to turn the match results of each row sideway,
  6. change the results into string with JOIN(), this helps to modify the row and column arrangment,
  7. SPLIT() the data to create the correct array format we can use,
  8. use QUERY() to PIVOT the SUM of the COUNT result as our final output.

Another approch works in a slightly different concept:

=ArrayFormula(
LAMBDA(DATA,CAT,
 LAMBDA(DATA,
  LAMBDA(COLA,COLB,
   LAMBDA(COLA,
    LAMBDA(RESULT,
     IF(RESULT="",0,RESULT)
    )(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
   )(REGEXEXTRACT(COLA,JOIN("|",CAT)))
  )(INDEX(DATA,,1),INDEX(DATA,,2))
 )(ASC(DATA))
)($A$2:$B$7,{"category 1","category 2","category 3"})
)

We can modify the Category column of the input data with REGEXEXTRACT() before sending it into query, which in this case, do make the formula looks a bit cleaner.


Inspired by @The God of Biscuits 's answer, we can now get rid of the CAT variable, which makes the formula more elastic to fit into your condition.

This REGEXEXTRACT() will extract Category value from the 1st 'category' match found to the end of the 1st 'number' after it, with any spacing in between the two value.

=ArrayFormula(
LAMBDA(DATA,
 LAMBDA(COLA,COLB,
  LAMBDA(RESULT,
   IF(RESULT="",0,RESULT)
  )(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
 )(REGEXEXTRACT(LOWER(INDEX(DATA,,1)),"((?:category)(?:  ?)(?:[0-9]|[0-9]) )"),INDEX(DATA,,2))
)($A$2:$B)
)
  • Related