this might be a stupid question but I'm blanking at the moment and can't find the answer in google.
I have the following example table
customer | bought |
---|---|
A | food |
B | food,drink |
C | drink |
D | drink |
now how do I calculate the percentage of customers that bought food/drink over total customers? what would be the best way to calculate this?
solution
% food = #customers who bought food / #total unique customers = 2 / 4 = 50%
% drink = #customers who bought drink / #total unique customers = 3 / 4 = 75%
the problem here is the total % exceeds 100%solution - count customer B twice, once for drink and once for food
% food = #customers who bought food / #total customers = 2 / 5 = 40%
% drink = #customers who bought drink / #total customers = 3 / 5 = 60%
the total is 100% but is this the correct way to calc % in this case?
The issue is obviously that one customer can buy both food and drink and I'm not sure how to handle this case. Any help would be appreciated. Thank you!
UPDATE:
thanks for the answer. It makes sense to remove altogether the customers who bought both products. But now I'm wondering what happens if there are more than 2 categories?
Example as follows, now we have an additional product (ice cream) in the mix
customer | bought |
---|---|
A | food |
B | food,drink |
C | drink |
D | drink |
E | drink,ice cream |
F | ice cream |
G | food,drink,ice cream |
I guess with the same logic we can remove customer G since they bought all the products? And how should we handle customer E and B?
CodePudding user response:
I think that what you are looking for is simply to eliminate the number of customers that purchased both products.
If you are looking for unique combinations of customers who purchased only a specific product then:
3 unique customers bought a single unique item
- 1/3 food => 33%
- 2/3 drink => 66%
The ones who bought both products, don't matter in these calculations since they add the same percentage to both cases.
EDIT:
I used excel to help me out. I hope that is fine
I added your data into an excel sheet and created a crosstable pivot.
I've set the Customers as columns and the Products as rows, in order to see what each individual customer has purchased via the values field Count of Product
I've changed the count of Product into the formula for % of Total
in order to show me for each product, the percentage split between the customers, so for example if customer B bought both drink and food he will be shown as 50% in both corresponding rows. If he bought all 3 then 33%
The grand total column, will contains the final percentage for each product row item. Excel calculates it the same way as states in some comments, it counts the products total instead of the customers total, which makes sense since that is your data set being consumed, and not the customers themselves.
If we swap the rows and columns around and do the same calculations, we see individual percentages for each customer (how much % of product they each individually purchased) and we can sum them up for each product. The result is the same