I have a table where each row represents a user and each column represents a service that the customer may have hired. I need to count how many customers hired 1 service, how many hired 2 and so on. It doesn't matter what service you hire. And there is no identifier column.
In Python I was able to do this with
result = services.count(axis = 1).value_counts()
result = pd.DataFrame(result, columns = ['n_clients'])
where 'result' is the csv with the database.
The result, in this case, are like:
n_client | |
---|---|
1 | 928459 |
2 | 280235 |
3 | 53731 |
4 | 16042 |
Edit: an example of the database to clarify:
product1 | product2 | product3 | product4 |
---|---|---|---|
True | |||
True | True | True | |
True | |||
True | True | ||
True | |||
True | |||
True | True |
In this case, the result should look like:
n_client | |
---|---|
1 | 4 |
2 | 2 |
3 | 1 |
4 | 0 |
CodePudding user response:
It looks like you want to just calculate how many products per row, then group by that number
SELECT
v.CountProducts
n_client = COUNT(*)
FROM YourTable
CROSS APPLY (VALUES (
CASE WHEN product1 = 'True' THEN 1 ELSE 0 END
CASE WHEN product2 = 'True' THEN 1 ELSE 0 END
CASE WHEN product3 = 'True' THEN 1 ELSE 0 END
CASE WHEN product4 = 'True' THEN 1 ELSE 0 END
) ) v(CountProducts)
GROUP BY
CountProducts;