Home > Enterprise >  Using Google Sheets, identify a grouping of rows that have a unique column value
Using Google Sheets, identify a grouping of rows that have a unique column value

Time:12-07

I'm trying to find a way to solve a problem like this: Given the following data in a Google Sheet:

Recipe       Ingredient   Sequence.   Waste
Chicken Soup Celery.      1.          5%
Chicken Soup Carrots.     2.          3%
Chicken Soup Stock.       3.          0%
Chicken Soup Chicken.     4.          0%
Beef Stew.   Beef.        1.          0%
Beef Stew.   Stock.       2.          0%
Beef Stew.   Spices.      3.          0%

This data set contains two recipes, with ingredients in a sequence, where each ingredient has a waste percentage.

I'd like to write a formula (or whatever) that will work across hundreds or more of recipes. It should:

  • Evaluate waste as the most important factor. If the waste values for a recipe are all the same (all ingredients have the same waste number), I don't need to look at it.
  • Identify (via highlight, separate sheet, separate table, whatever) the recipe lines that have different waste values from the other ingredients in that recipe.

So my final output would show recipe 1 (Chicken Soup) and all of its ingredients, but ignore recipe 2 (Beef Stew) because the ingredients for chicken soup have different waste values, as opposed to all the same values in Beef stew.

I've been looking at various UNIQUE functions, but can't quite get the range to operate across multiple rows when there are hundreds of recipes.

I've worked out logic:

  • Given a Recipe Column with the same values,
  • Evaluate each row of this recipe grouping
  • Evaluate the Waste column of this row grouping
  • If all the waste columns for all the recipe rows are the same - no need to see it
  • If any of the waste columns for all of the recipe rows are different... output each row of the recipe so it can be analyzed separately as a "checklist" to-do sheet.

CodePudding user response:

try:

=LAMBDA(a, d, ad, FILTER(ad, REGEXMATCH(a, TEXTJOIN("|", 1, 
 QUERY(QUERY(TRIM(SPLIT(FLATTEN(QUERY(QUERY({a, d, d&"×"}, 
 "select max(Col1) where Col2 >0 group by Col1 pivot Col3"),,9^9)), "×")), 
 "select Col2,count(Col2) group by Col2"), 
 "select Col1 where Col2>1", )))))(A41:A50, D41:D50, A41:D50)

enter image description here

CodePudding user response:

In case you need an option without REGEXMATCH, because of the possibility of having similar recipes with matching names (say for example "Chicken Soup" and "Spicy Chicken Soup" would be both gotten if Chicken Soup has differing values in percentages in only "Chicken Soup"), you could apply a filter like this:

=FILTER(A3:D,
  BYROW(A3:A,LAMBDA(each,
    COUNTIF(INDEX(REGEXEXTRACT(INDEX(UNIQUE(A3:A&"╬"&D3:D)),"(. )╬")),each)))>1
)

If you have a lot of data, you can try this that first filters the empty values and should be faster:

=LAMBDA(filtered,FILTER(filtered,
  BYROW(filtered,LAMBDA(each,
    COUNTIF(INDEX(REGEXEXTRACT(INDEX(UNIQUE(A3:A&"╬"&D3:D)),"(. )╬")),index(each,1,1))))>1
))(FILTER(A3:D,A3:A<>""))

enter image description here

  • Related