I have a dataset with the following structure
email category
[email protected] action
[email protected] horror
[email protected] romance
[email protected] sci-fi
[email protected] action
[email protected] action
[email protected]
[email protected]
[email protected] sci-fi
[email protected] horror
[email protected]
and am trying to produce a column 'sum_category' that would group by email to get the sum of unique category values for each unique email like this
email category sum_category
[email protected] action 3
[email protected] horror 1
[email protected] romance 2
[email protected] sci-fi 3
[email protected] action 2
[email protected] action 1
[email protected] 1
[email protected] 1
[email protected] sci-fi 1
[email protected] horror 3
[email protected] 3
Assuming Column A is 'Email' and Column C is 'category', I have tried using the following code but it is producing #DIV/0! error
SUMPRODUCT((A$2:A$100=A2)/COUNTIFS(C$2:C$100,C$2:C$100,A$2:A$100,A$2:A$100))
I have also tried using the following code but it is counting the blank values into the sum
SUMPRODUCT((A$2:A$100=A2)/COUNTIFS(C$2:C$100,C$2:C$100&"",A$2:A$100,A$2:A$100))
Appreciate any help I could get to resolve this, thank you.
CodePudding user response:
I presume you don't have Office 365, else this would be straightforward, for example using the following spill formula:
=LET(
α, A2:A12,
β, C2:C12,
BYROW(α, LAMBDA(ζ, COUNTA(UNIQUE(FILTER(β, (α = ζ) * (β <> ""))))))
)
Otherwise, don't use the reciprocal COUNTIF
/SUMPRODUCT
set-up: it's incredibly slow compared to the following FREQUENCY
/MATCH
construction, in D2
:
=SUM(
IF(FREQUENCY(
IF(A$2:A$12 = A2,
IF(C$2:C$12 <> "", MATCH(C$2:C$12, C$2:C$12, 0))
),
ROW(A$2:A$12) - MIN(ROW(A$2:A$12)) 1
),
1
)
)
and then copied down to D12
.